Commit 6b3cb43

Update base for Update on "Reduce allocation overhead in quantized sdpa"

For small models, dequantizing portions of the v cache causes extra allocation overhead. A better way to handle this is probably to dequantize the entire v cache outside the model. There isn't a significant perf advantage from this yet, but subsequent diffs will use a caching allocator, where this refactor helps.

Differential Revision: [D85532077](https://our.internmc.facebook.com/intern/diff/D85532077/)

[ghstack-poisoned]
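To make the allocation argument concrete, here is a minimal sketch in plain Python. It is not the code from this stack: the function names, the per-row affine scheme (`(q - zero_point) * scale`), and the use of nested lists in place of tensors are all assumptions made for illustration. It contrasts dequantizing the cache chunk by chunk, which needs a fresh buffer on every call, with dequantizing the whole cache once into a buffer that the caller owns and can reuse.

```python
def dequantize_rows(q_rows, scales, zero_points, out):
    """Per-row affine dequantization written in place into `out` (hypothetical helper)."""
    for i, row in enumerate(q_rows):
        scale, zp = scales[i], zero_points[i]
        dst = out[i]
        for j, q in enumerate(row):
            dst[j] = (q - zp) * scale
    return out


def dequantize_per_chunk(q_cache, scales, zero_points, chunk):
    """Dequantize `chunk` rows at a time, allocating a new buffer for each chunk."""
    allocations = 0
    result = []
    for start in range(0, len(q_cache), chunk):
        rows = q_cache[start:start + chunk]
        buf = [[0.0] * len(r) for r in rows]  # fresh buffer per chunk
        allocations += 1
        dequantize_rows(rows, scales[start:start + chunk],
                        zero_points[start:start + chunk], buf)
        result.extend(buf)
    return result, allocations


def dequantize_whole(q_cache, scales, zero_points, buf):
    """Dequantize the entire cache in one pass into a caller-provided buffer."""
    return dequantize_rows(q_cache, scales, zero_points, buf)


# Toy quantized cache: 4 rows of 3 values, one scale and zero point per row.
q_cache = [[0, 2, 4], [1, 3, 5], [10, 20, 30], [7, 7, 7]]
scales = [0.5, 0.5, 0.25, 1.0]
zero_points = [0, 1, 10, 7]

chunked, n_alloc = dequantize_per_chunk(q_cache, scales, zero_points, chunk=2)

# Allocated once outside the hot path; later calls can reuse the same buffer.
reusable = [[0.0] * len(r) for r in q_cache]
whole = dequantize_whole(q_cache, scales, zero_points, reusable)
```

Both paths produce the same values; the difference is that the chunked path allocates once per chunk on every call, while the whole-cache path leaves buffer ownership to the caller, which is the kind of pattern a caching allocator can exploit.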

1 parent b11ca01 commit 6b3cb43

0 files changed, 0 lines changed

Comments

(0)
