[BugFix]: avoid closure-captured mutable shapes in flash_attn and sparse_mla kernel #71
base: refactor
Conversation
Summary of Changes
Hello @RMLYC, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request implements a critical bug fix to enhance the stability and reliability of the AutoTuner: previously, mutable shape objects were inadvertently captured by closures within the kernels, which could lead to crashes during tuning.
Code Review
This pull request aims to fix an AutoTuner crash by moving mutable shape definitions into the JIT scope and converting them to immutable tuples. The changes across flash_attn/fwd.py and flash_attn/bwd.py correctly implement this fix. However, in top/kernels/deepseek_mla/sparse_mla.py, while the shape definitions have been moved, they are still defined as lists instead of tuples. I've added a comment with a suggestion to change them to tuples to ensure the fix is consistently applied and effective.
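For context, a minimal sketch of the failure mode, using hypothetical names (make_kernel_buggy, shape) rather than the actual kernel code:

# Hypothetical illustration, not the actual flash_attn/sparse_mla code.
def make_kernel_buggy(batch, heads, dim):
    shape = [batch, heads, dim]   # mutable list defined outside the kernel

    def kernel():
        # The closure captures the list object by reference, so any later
        # mutation of `shape` changes what this (possibly cached) kernel sees.
        return tuple(shape)

    shape[0] *= 2                 # e.g. a tuning sweep mutates the shape in place
    return kernel                 # kernel() now observes the mutated shape

def make_kernel_fixed(batch, heads, dim):
    def kernel():
        # Defining the shape inside the kernel as an immutable tuple means
        # each kernel owns its own value and nothing can mutate it later.
        shape = (batch, heads, dim)
        return shape

    return kernel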
Lines 26–31: The comment is too long and should be reformatted into multiple lines. We should also add lint / pre-commit checks to enforce line-length limits. Regarding the content, please split this into a concise high-level comment followed by short, focused notes or TODOs. For example:
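(Illustrative only; the comment text below is hypothetical, not the actual lines 26–31.)

# High-level: one or two lines stating what this block does.
# Supporting notes: keep each detail short and on its own line.
# TODO: move longer rationale into the module docstring or design notes.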
Which file? Currently, the code uses pre-commit for formatting checks.
sparse_mla.py
done
| "you should fix the logic involving CP0 (cp_rank == 0) " | ||
| "to make sure q with pos < KV_Stride - 1 is masked. " | ||
| "(Or you may ignore this handling if NaN in these q's outputs " | ||
| "does not affect other results; this was reported by wangding.)") |
The assertion message is overly verbose, mixes multiple concerns, and includes informal wording and personal attribution, which makes it hard to read and unsuitable for production code.
Assertions should clearly state the invariant being checked, not embed long explanations, speculative handling, or historical context. This logic also encodes a non-obvious contract between q_start_index_s, kv_stride, and CP0 masking that should be documented explicitly rather than hidden inside an assert string.
I suggest:
- Make the assertion concise and factual. Keep the assert focused on what must be true, not why;
- Move the explanation into a comment (or docstring). Explain why this invariant exists using structured comments, not an assert message.
# When q_start_index_s != 0, CP0 is expected to mask queries with
# position < kv_stride - 1. Otherwise, short CP lengths may produce
# invalid (NaN) outputs for early queries.
if q_start_index_s != 0:
    assert q_start_index_s > kv_stride, (
        "q_start_index_s must be greater than kv_stride when non-zero"
    )
assert kv_group == 1, 'here we solve the heads padding automatically, other wise you should handle Q copy and Output copy with your mask (when kv_group == 1, use g_i * padded_H:(g_i+1) * padded_H would be handled automatically)'
# print(f'padded_H = {padded_H}, heads = {heads}')
assert kv_group == 1, ("Automatic heads padding is only supported when kv_group == 1. "
                       "Otherwise handle Q and Output copies with an appropriate mask.")
The assertion message mixes a policy decision (“only supported”) with implementation guidance (“handle Q and Output copies with a mask”). This makes the intent unclear and puts design instructions inside an assert. The invariant should be stated succinctly, while the rationale and future work should be documented separately.
Suggested rewrite:
# Automatic head padding is only valid when kv_group == 1.
# For kv_group > 1, Q and output tensors must be masked explicitly
# to handle padded heads correctly.
# TODO: Support automatic head padding for kv_group > 1
# by applying appropriate masks during Q and output copies.
if padded_H != heads:
    assert kv_group == 1, (
        "Automatic head padding requires kv_group == 1"
    )
The behavior may change if CP0 logic is fixed (there is a compiler-related
bug affecting that logic).
"""
assert dim == tilelang.math.next_power_of_2(
The original docstring mixes background, assumptions, speculative fixes, and historical context in a single paragraph, which makes it hard to scan.
Suggested rewrite:
"""
Implements sparse attention.
Note:
- The first (kv_stride - 1) token outputs may be NaN due to current CP0 handling.
- These outputs are typically unused and assumed to be safe to ignore.
- During the backward pass, care must be taken to prevent NaN propagation
(e.g., in expressions like `delta = out * dout`).
- A common mitigation is to zero out invalid outputs before backward
computation.
This behavior is a known limitation and may change once the CP0 logic
is corrected (currently affected by a compiler-related issue).
"""