Commit 0bfcb4b

Update base for Update on "Reduce allocation overhead in quantized sdpa"
For small models, dequantizing portions of the v cache causes extra allocation overhead. A better way to handle this is probably to dequantize the entire v cache outside the model. There isn't a significant perf advantage from this yet, but subsequent diffs will use a caching allocator, where this refactor helps.

Differential Revision: [D85532077](https://our.internmc.facebook.com/intern/diff/D85532077/)

[ghstack-poisoned]
2 parents 128ee29 + 350ea3c commit 0bfcb4b

16 files changed: +86 -1361 lines changed

.github/workflows/android-perf-private-device-experiment.yml

Lines changed: 0 additions & 62 deletions
This file was deleted.

.github/workflows/android-perf.yml

Lines changed: 0 additions & 562 deletions
This file was deleted.

.github/workflows/apple-perf-private-device-experiment.yml

Lines changed: 0 additions & 62 deletions
This file was deleted.
