Commit 79bcbc5

Update base for Update on "[Executorch] Use temp allocator for allocating scratch memory"
This allows us to leverage the temp memory allocator; if that allocator is a caching allocator, it reduces the allocation overhead. Differential Revision: [D85532076](https://our.internmc.facebook.com/intern/diff/D85532076/) [ghstack-poisoned]
1 parent 3327260 commit 79bcbc5

1 file changed: +2 -2 lines changed

extension/llm/custom_ops/op_sdpa_impl.h

Lines changed: 2 additions & 2 deletions
@@ -775,10 +775,10 @@ void cpu_flash_attention(
 // at::Tensor buf_reduced = at::empty(
 // {num_thread, qSplitSize, is_reduced_type ? kvSplitSize : 0},
 // query.options());
-int64_t size_per_thread_qdq_vec = qSplitSize * kvSplitSize * headSize;
+int64_t size_per_thread_qdq_vec = kvSplitSize * headSize;
 // Lets align size_per_thread_qdq_vec to 64 bytes, for coalesced cache reads,
 // by padding with right number of per thread elements
-constexpr int64_t kAlignment = 32;
+constexpr int64_t kAlignment = 64;
 size_per_thread_qdq_vec =
 (size_per_thread_qdq_vec + kAlignment - 1) & (-(kAlignment - 1));
 int64_t size_per_thread_qdq_bytes = size_per_thread_qdq_vec * sizeof(accum_t);
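
The padding step in this hunk rounds a per-thread element count up to a multiple of `kAlignment`. As a minimal sketch of that idea, and not the code in `op_sdpa_impl.h`, the conventional power-of-two round-up can be written as below. The helper names `align_up`, `padded_scratch_bytes`, and `round_with_negated_mask` are hypothetical, as are the example sizes in the usage note. One observation worth checking against the unchanged context lines: the mask there is written as `-(kAlignment - 1)`, which is not the same value as `~(kAlignment - 1)`, so the sketch uses the conventional form and includes the other only for comparison.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from op_sdpa_impl.h): round `n` up to the next
// multiple of `alignment`. `alignment` must be a power of two.
constexpr int64_t align_up(int64_t n, int64_t alignment) {
  return (n + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: per-thread scratch size in bytes, following the shape
// of the post-commit computation (kvSplitSize * headSize elements, padded to
// a multiple of `alignment` elements, then scaled by the element size).
constexpr int64_t padded_scratch_bytes(
    int64_t kv_split_size,
    int64_t head_size,
    int64_t elem_size,
    int64_t alignment) {
  return align_up(kv_split_size * head_size, alignment) * elem_size;
}

// For comparison only: the same expression with a negated mask,
// -(alignment - 1), as it appears in the diff context. For alignment = 64
// this mask is -63, which keeps bit 0 and clears bits 1 through 5.
constexpr int64_t round_with_negated_mask(int64_t n, int64_t alignment) {
  return (n + alignment - 1) & (-(alignment - 1));
}
```

With assumed sizes `kvSplitSize = 100`, `headSize = 3`, and a 4-byte accumulator type, `align_up(300, 64)` gives 320 elements and `padded_scratch_bytes(100, 3, 4, 64)` gives 1280 bytes. Note that this pads the element count to a multiple of 64, which is a multiple of 64 bytes only as a consequence of the element size. For an already aligned input, `align_up(64, 64)` returns 64, while `round_with_negated_mask(64, 64)` returns 65.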
