Update base for Update on "Reduce allocation overhead in quantized sdpa"
For small models, dequantizing portions of the V cache causes extra allocation overhead.
A better way to handle this is probably to dequantize the entire V cache outside the model.
There isn't a significant perf advantage from this yet, but subsequent diffs will use a caching allocator, where this refactor helps.
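
To illustrate the allocation pattern described above, here is a minimal pure-Python sketch. It is not the actual kernel code from this diff: the function names, the affine dequantization formula, and the use of an allocation counter as a proxy for allocator overhead are all assumptions made for illustration.

```python
from typing import List, Tuple


def dequantize_row(q_row: List[int], scale: float, zero_point: int) -> List[float]:
    """Affine dequantization (assumed scheme): real = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in q_row]


def weighted_sum_per_block(
    weights: List[float], v_quant: List[List[int]],
    scale: float, zero_point: int, block: int,
) -> Tuple[List[float], int]:
    """Dequantize V block by block inside the attention loop.

    Each block gets its own temporary buffer, so the number of
    temporary allocations grows with the number of blocks.
    """
    dim = len(v_quant[0])
    out = [0.0] * dim
    allocations = 0
    for start in range(0, len(v_quant), block):
        # One temporary buffer per block (hypothetical stand-in for a scratch alloc).
        v_block = [dequantize_row(r, scale, zero_point)
                   for r in v_quant[start:start + block]]
        allocations += 1
        for w, row in zip(weights[start:start + block], v_block):
            for j in range(dim):
                out[j] += w * row[j]
    return out, allocations


def weighted_sum_prededequantized(
    weights: List[float], v_quant: List[List[int]],
    scale: float, zero_point: int,
) -> Tuple[List[float], int]:
    """Dequantize the whole V cache once, up front, then run the same loop."""
    # Single buffer for the entire cache.
    v_full = [dequantize_row(r, scale, zero_point) for r in v_quant]
    allocations = 1
    dim = len(v_full[0])
    out = [0.0] * dim
    for w, row in zip(weights, v_full):
        for j in range(dim):
            out[j] += w * row[j]
    return out, allocations
```

Both functions compute the same attention-weighted sum over V; only the number of temporary buffers differs. In a real runtime the single up-front buffer is also the kind of allocation a caching allocator could reuse across calls.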
Differential Revision: [D85532077](https://our.internmc.facebook.com/intern/diff/D85532077/)
[ghstack-poisoned]