Use torch._dynamo.mark_static() API to allow tensor shape specialization outside of the kernel code
#1210
Conversation
Title changed from "kernel.specialize_args() API to specify specialization on tensor dims outside of the kernel" to "kernel.specialize_args() API to specify tensor dims specialization outside of the kernel"
Title changed from "kernel.specialize_args() API to specify tensor dims specialization outside of the kernel" to "kernel.specialize_args() API to allow tensor shape specialization outside of the kernel code"
jansel left a comment:
I'm a bit confused by the semantics of this. Does it mutate the behavior of the kernel? What if I call the kernel before calling this API? Can you provide some examples?
@jansel it doesn't mutate the behavior of the original kernel. Also updated the docs at https://github.com/pytorch/helion/pull/1210/files#diff-9fdadeb7f22ce14b3a9fa419dea27615a1e66a4faf75bc76942d3474eabc86c1R371-R413
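A minimal sketch of that point (the `add_one` kernel is hypothetical, and the `helion.kernel`/`hl.tile` usage is assumed from the standard API): the mark is attached to the input tensor, not to the kernel object, so calls with unmarked inputs keep the original dynamic behavior.

```python
import torch
import torch._dynamo
import helion
import helion.language as hl

@helion.kernel
def add_one(x: torch.Tensor) -> torch.Tensor:
    # No hl.specialize() in the body: the kernel stays shape-dynamic by default.
    out = torch.empty_like(x)
    for tile in hl.tile(x.size(0)):
        out[tile] = x[tile] + 1
    return out

a = torch.randn(1024, device="cuda")
add_one(a)                        # unmarked input: dynamic, same as before

b = torch.randn(4096, device="cuda")
torch._dynamo.mark_static(b, 0)   # annotation lives on `b`, not on `add_one`
add_one(b)                        # this call may specialize on 4096; the kernel source is unchanged
```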
jansel left a comment:
Could we use the same mark dynamic API as torch.compile for this?
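For context, a rough sketch of how those markers are already used with plain torch.compile (the function and shapes here are purely illustrative):

```python
import torch
import torch._dynamo

def f(x):
    return x * 2

x = torch.randn(8, 128)
torch._dynamo.mark_dynamic(x, 0)  # never specialize on dim 0
torch._dynamo.mark_static(x, 1)   # bake dim 1's size into the compiled code
compiled_f = torch.compile(f)
compiled_f(x)
```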
Title changed from "kernel.specialize_args() API to allow tensor shape specialization outside of the kernel code" to "torch._dynamo.mark_static() API to allow tensor shape specialization outside of the kernel code"
Should mention this in the docs.
In general, baking specific tensor dim sizes into the kernel is good for performance, but we can't do that in all scenarios because we need a catch-all dynamic kernel for rare sizes (otherwise we would have to autotune for those novel long-tail sizes, which is generally prohibitive due to the extra autotuning time).
Right now, to bake in a specific shape one must use `hl.specialize()`, which makes the kernel unsuitable for the all-dynamic case (unless we copy-paste the kernel and remove the specific `hl.specialize()` calls). By reusing the `torch._dynamo.mark_static()` API, users can achieve the above with a Helion kernel implementation that has no `hl.specialize()` calls, adding the specialization outside of the kernel code instead, which makes their codebase much cleaner.

Closes #1046.
cc. @Chillee
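A hedged before/after sketch of what the description above is getting at (kernel names and bodies are illustrative, not the actual PR test code; `hl.specialize()` and `torch._dynamo.mark_static()` are used as described above):

```python
import torch
import torch._dynamo
import helion
import helion.language as hl

# Before: specialization lives inside the kernel source, so this variant can
# no longer serve as the catch-all dynamic kernel.
@helion.kernel
def scale_specialized(x: torch.Tensor) -> torch.Tensor:
    n = hl.specialize(x.size(0))      # bakes the length in as a compile-time constant
    out = torch.empty_like(x)
    for tile in hl.tile(n):
        out[tile] = x[tile] * 2
    return out

# After: a single dynamic kernel with no hl.specialize() anywhere in its body.
@helion.kernel
def scale(x: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    for tile in hl.tile(x.size(0)):
        out[tile] = x[tile] * 2
    return out

x_common = torch.randn(8192, device="cuda")
torch._dynamo.mark_static(x_common, 0)  # hot shape: request specialization at the call site
scale(x_common)

x_rare = torch.randn(12345, device="cuda")
scale(x_rare)                           # long-tail shape: handled by the same dynamic kernel
```

The same kernel source serves both the specialized hot-path calls and the catch-all dynamic case, so there is no need to maintain two copies of the kernel.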