Commit 6b3cb43
Update base for Update on "Reduce allocation overhead in quantized sdpa"
For small models, dequantizing portions of the V cache causes extra allocation overhead.
A better way to handle this is probably to dequantize the entire V cache outside the model.
There isn't a significant perf advantage from this yet, but subsequent diffs will use a caching allocator, where this refactor helps.
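As a rough illustration of the allocation pattern described above (not the code from this diff, which is not shown here), the sketch below contrasts dequantizing a slice per call, which allocates a fresh buffer each time, with dequantizing the whole V cache once into a buffer that is allocated a single time and reused. It assumes an int8 V cache laid out as `[tokens, head_dim]` with per-token affine quantization, `x = (q - zero_point) * scale`; `dequantize_rows_alloc` and `DequantizedVCache` are hypothetical names, and plain Python `array` stands in for real tensors.

```python
from array import array


def dequantize_rows_alloc(q, scales, zero_points, head_dim, start, end):
    """Hypothetical per-call approach: allocates a new float buffer on every call."""
    out = array("f", [0.0]) * ((end - start) * head_dim)
    for i, t in enumerate(range(start, end)):
        s, zp = scales[t], zero_points[t]
        for d in range(head_dim):
            out[i * head_dim + d] = (q[t * head_dim + d] - zp) * s
    return out


class DequantizedVCache:
    """Hypothetical helper: one float32 buffer sized for the whole V cache.

    The buffer is allocated once in __init__ and refreshed in place outside
    the attention call, so reading rows from it allocates no new storage.
    Assumes per-token affine quantization: x = (q - zero_point) * scale.
    """

    def __init__(self, max_tokens: int, head_dim: int) -> None:
        self.max_tokens = max_tokens
        self.head_dim = head_dim
        # Single allocation for the lifetime of the cache.
        self.buf = array("f", [0.0]) * (max_tokens * head_dim)

    def refresh(self, q, scales, zero_points, num_tokens: int) -> None:
        """Dequantize the first `num_tokens` rows of `q` into the buffer in place."""
        if num_tokens > self.max_tokens:
            raise ValueError("num_tokens exceeds cache capacity")
        hd = self.head_dim
        for t in range(num_tokens):
            s, zp = scales[t], zero_points[t]
            base = t * hd
            for d in range(hd):
                self.buf[base + d] = (q[base + d] - zp) * s

    def rows(self, start: int, end: int) -> memoryview:
        """Zero-copy view of rows [start, end) of the dequantized cache."""
        return memoryview(self.buf)[start * self.head_dim : end * self.head_dim]


# Usage: 3 cached tokens, head_dim = 2, capacity for 4 tokens.
q = array("b", [2, 4, -2, 0, 6, 8])
scales = [0.5, 0.25, 1.0]
zero_points = [0, 0, 2]

cache = DequantizedVCache(max_tokens=4, head_dim=2)
buf_before = cache.buf
cache.refresh(q, scales, zero_points, num_tokens=3)
print(cache.rows(0, 3).tolist())
print(dequantize_rows_alloc(q, scales, zero_points, 2, 0, 3).tolist())
```

The design choice mirrors the message: once dequantization happens outside the model into a long-lived buffer, the attention op only reads views, which is also the kind of reuse a caching allocator can take advantage of.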
Differential Revision: [D85532077](https://our.internmc.facebook.com/intern/diff/D85532077/)
[ghstack-poisoned]
1 parent b11ca01 commit 6b3cb43
0 files changed, +0 -0 lines changed