
Conversation

@yf225
Contributor

@yf225 yf225 commented Dec 5, 2025

In general, baking specific tensor dim sizes into the kernel is good for perf, but we can't do that in all scenarios: we still need a catch-all dynamic kernel for rare sizes (otherwise we would have to autotune for every novel long-tail size, which is generally prohibitive due to the extra autotuning time).

Right now, to bake in a specific shape one must call hl.specialize(), which makes the kernel unsuitable for the all-dynamic case (unless we copy-paste the kernel and remove the hl.specialize() calls).

By reusing the torch._dynamo.mark_static() API, users can keep a single Helion kernel implementation with no hl.specialize() calls and add specialization outside of the kernel code, which makes their codebase much cleaner.
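
A minimal sketch of the intended usage (the kernel body here is illustrative, written against Helion's public @helion.kernel / hl.tile API; torch._dynamo.mark_static() is the existing PyTorch API this PR hooks into):

```python
import torch
import torch._dynamo
import helion
import helion.language as hl

# Illustrative Helion kernel with no hl.specialize() calls: by default
# it compiles a shape-dynamic variant usable for any input size.
@helion.kernel
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] + y[tile]
    return out

x = torch.randn(1024, 2048, device="cuda")
y = torch.randn(1024, 2048, device="cuda")

# Specialization is requested outside the kernel code: marking dim 0 as
# static asks Helion to bake the concrete size (1024) into the compiled
# kernel for these inputs, while unmarked inputs still take the
# catch-all dynamic kernel.
torch._dynamo.mark_static(x, 0)
torch._dynamo.mark_static(y, 0)
out = add(x, y)
```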

Closes #1046 ("Add hl.mark_dynamic api").

cc @Chillee

@yf225 yf225 requested review from jansel and oulgen December 5, 2025 03:01
@meta-cla meta-cla bot added the CLA Signed label Dec 5, 2025
@yf225 yf225 changed the title Add kernel.specialize_args() API to specify specialization on tensor dims outside of the kernel Add kernel.specialize_args() API to specify tensor dims specialization outside of the kernel Dec 5, 2025
@yf225 yf225 requested review from gmagogsfm and v0i0 December 5, 2025 03:02
@yf225 yf225 force-pushed the specialize_args branch 4 times, most recently from 94ddf8b to ff57470, December 5, 2025 04:15
@yf225 yf225 changed the title Add kernel.specialize_args() API to specify tensor dims specialization outside of the kernel Add kernel.specialize_args() API to allow tensor shape specialization outside of the kernel code Dec 5, 2025
Contributor

@jansel jansel left a comment

I'm a bit confused by the semantics of this. Does it mutate the behavior of the kernel? What if I call the kernel before calling this API? Can you provide some examples?

@yf225 yf225 force-pushed the specialize_args branch 13 times, most recently from 72adccf to dc19935, December 6, 2025 21:01
@yf225
Contributor Author

yf225 commented Dec 6, 2025

> I'm a bit confused by the semantics of this. Does it mutate the behavior of the kernel? What if I call the kernel before calling this API? Can you provide some examples?

@jansel It doesn't mutate the behavior of the original kernel: kernel.specialize_args() creates a new kernel that shares only the config / settings / .fn / ._key_fn with the original kernel, while all other mutable state, like _bound_kernels and _specialize_extra, is kept separate.

I also updated the docs at https://github.com/pytorch/helion/pull/1210/files#diff-9fdadeb7f22ce14b3a9fa419dea27615a1e66a4faf75bc76942d3474eabc86c1R371-R413 and added test_specialize_args_does_not_mutate_original with examples.
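
A hypothetical sketch of those semantics, reusing the add kernel from the sketch in the PR description (the specialize_args() signature is assumed for illustration only; this API was later replaced by torch._dynamo.mark_static() in this PR):

```python
base = add                            # the original kernel, fully dynamic
static0 = add.specialize_args(x=[0])  # assumed signature: specialize dim 0 of arg `x`

# Per the comment above: `static0` shares config / settings / .fn /
# ._key_fn with `base`, but mutable state such as _bound_kernels and
# _specialize_extra is separate, so compiling or calling either kernel
# never changes the other's behavior, regardless of call order.
out_dyn = base(x, y)      # catch-all dynamic compilation
out_spec = static0(x, y)  # compilation with x.size(0) baked in
```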

@yf225 yf225 requested a review from jansel December 6, 2025 21:06
@yf225 yf225 force-pushed the specialize_args branch 2 times, most recently from d1eb101 to 4320adc, December 8, 2025 20:06
Contributor

@jansel jansel left a comment

Could we use the same mark dynamic API as torch.compile for this?
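
For reference, torch.compile's existing per-dimension annotations (both are real functions in torch._dynamo):

```python
import torch
import torch._dynamo

x = torch.randn(1024, 2048)
torch._dynamo.mark_dynamic(x, 0)  # compile dim 0 of x as dynamic (never specialize)
torch._dynamo.mark_static(x, 1)   # specialize compiled code on x.size(1)
```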

@yf225 yf225 force-pushed the specialize_args branch 7 times, most recently from bc2a3c6 to 345a4bf, December 9, 2025 06:40
@yf225 yf225 changed the title Add kernel.specialize_args() API to allow tensor shape specialization outside of the kernel code Use torch._dynamo.mark_static() API to allow tensor shape specialization outside of the kernel code Dec 9, 2025
@yf225 yf225 requested a review from jansel December 9, 2025 06:44
@jansel
Contributor

jansel commented Dec 12, 2025

Should mention this in the docs.

@yf225 yf225 mentioned this pull request Dec 12, 2025
@yf225 yf225 merged commit 531cbdc into main Dec 12, 2025
16 checks passed
@yf225 yf225 deleted the specialize_args branch December 12, 2025 22:26