[Cria][Llama runner] Use caching temp allocator

kimishpatel · kimishpatel · commit 9e914121d3f7 · 2025-12-04T07:46:11.000-08:00
Using the caching allocator improves TITO model performance by more than 6%. Repro instructions will be added here, but the next diff is required to see the impact.

Differential Revision: [D85532078](https://our.internmc.facebook.com/intern/diff/D85532078/)
ghstack-source-id: 327095993
Pull Request resolved: #16080
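
For context, the general idea behind a caching temp allocator can be sketched as below. This is a minimal illustration under stated assumptions, not the allocator used by this change: `CachingTempAllocator`, `max_cached_bytes`, and all method names are hypothetical. It assumes temp buffers of the same sizes are requested repeatedly and released together through `reset()`, so reusing freed blocks avoids repeated `malloc`/`free` calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a size-keyed caching allocator (names are assumptions).
class CachingTempAllocator {
 public:
  explicit CachingTempAllocator(std::size_t max_cached_bytes)
      : max_cached_bytes_(max_cached_bytes) {}

  CachingTempAllocator(const CachingTempAllocator&) = delete;
  CachingTempAllocator& operator=(const CachingTempAllocator&) = delete;

  ~CachingTempAllocator() {
    reset();  // Move or free any blocks that are still outstanding.
    for (auto& entry : free_blocks_) {
      for (void* p : entry.second) {
        std::free(p);
      }
    }
  }

  // Returns a cached block of exactly `size` bytes if one exists,
  // otherwise falls back to malloc. Returns nullptr for size 0 or on failure.
  void* allocate(std::size_t size) {
    if (size == 0) {
      return nullptr;
    }
    void* p = nullptr;
    auto it = free_blocks_.find(size);
    if (it != free_blocks_.end() && !it->second.empty()) {
      p = it->second.back();
      it->second.pop_back();
      assert(cached_bytes_ >= size);
      cached_bytes_ -= size;
      ++cache_hits_;
    } else {
      p = std::malloc(size);
      if (p == nullptr) {
        return nullptr;
      }
    }
    live_blocks_.push_back({p, size});
    return p;
  }

  // Releases every outstanding block at once. Blocks go back into the cache
  // while the byte budget allows; the rest are freed.
  void reset() {
    for (const Block& b : live_blocks_) {
      if (cached_bytes_ + b.size <= max_cached_bytes_) {
        free_blocks_[b.size].push_back(b.ptr);
        cached_bytes_ += b.size;
      } else {
        std::free(b.ptr);
      }
    }
    live_blocks_.clear();
  }

  std::size_t cache_hits() const { return cache_hits_; }
  std::size_t cached_bytes() const { return cached_bytes_; }

 private:
  struct Block {
    void* ptr;
    std::size_t size;
  };

  std::size_t max_cached_bytes_;
  std::size_t cached_bytes_ = 0;
  std::size_t cache_hits_ = 0;
  std::vector<Block> live_blocks_;
  std::unordered_map<std::size_t, std::vector<void*>> free_blocks_;
};
```

In this sketch the byte budget plays the role that a cap such as `max_cached_memory_size_bytes_` would play: it bounds how much memory is held for reuse between resets.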
diff --git a/extension/llm/runner/llm_runner_helper.cpp b/extension/llm/runner/llm_runner_helper.cpp
@@ -225,7 +225,6 @@ std::unique_ptr<TextLLMRunner> create_text_llm_runner(
             max_cached_memory_size_bytes_));
   } else {
     module = std::make_unique<Module>(
-        model_path,
         model_path,
         Module::LoadMode::File,
         std::move(event_tracer), // event tracer