Commit 0bfcb4b
Update base for Update on "Reduce allocation overhead in quantized sdpa"
For small models, dequantizing portions of the v cache causes extra allocation overhead.
Probably a better way to handle this is to dequantize the entire v cache outside the model.
There isn't a significant perf advantage from this yet, but subsequent diffs will use a caching allocator, where this refactor helps.
Differential Revision: [D85532077](https://our.internmc.facebook.com/intern/diff/D85532077/)
[ghstack-poisoned]
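The idea in the message above — dequantize the whole v cache once into a preallocated buffer, rather than dequantizing portions per attention call — can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: the class name, numpy stand-in, and per-tensor scale/zero-point scheme are all assumptions.

```python
import numpy as np

class QuantizedVCache:
    """Hypothetical int8 v cache with a single preallocated fp32 buffer.

    Dequantizing the entire cache into one reused buffer avoids the
    per-portion allocations the commit message describes.
    """

    def __init__(self, max_seq_len: int, head_dim: int,
                 scale: float, zero_point: int):
        self.q_cache = np.zeros((max_seq_len, head_dim), dtype=np.int8)
        self.scale = scale
        self.zero_point = zero_point
        # Allocate the dequantization output once, outside the hot path.
        self.fp_cache = np.empty((max_seq_len, head_dim), dtype=np.float32)

    def dequantize_all(self) -> np.ndarray:
        # Affine dequantization (fp = (q - zero_point) * scale), written
        # into the reused buffer so no new output array is allocated per call.
        np.multiply(self.q_cache.astype(np.float32) - self.zero_point,
                    self.scale, out=self.fp_cache)
        return self.fp_cache
```

Repeated calls to `dequantize_all` return the same buffer object, which is the property a caching allocator can exploit.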
16 files changed: +86 −1361 lines

Changed directories:
- .github/workflows
- backends/qualcomm/scripts
- examples/models/llama
- exir/program
- extension
- llm/export
- pybindings
- tools/cmake/preset