Use torch._dynamo.mark_static() API to allow tensor shape specialization outside of the kernel code
#1210
Conversation
Title changed from "kernel.specialize_args() API to specify specialization on tensor dims outside of the kernel" to "kernel.specialize_args() API to specify tensor dims specialization outside of the kernel"
Title changed from "kernel.specialize_args() API to specify tensor dims specialization outside of the kernel" to "kernel.specialize_args() API to allow tensor shape specialization outside of the kernel code"
jansel left a comment:
I'm a bit confused by the semantics of this. Does it mutate the behavior of the kernel? What if I call the kernel before calling this API? Can you provide some examples?
@jansel it doesn't mutate the behavior of the original kernel. Also updated the docs at https://github.com/pytorch/helion/pull/1210/files#diff-9fdadeb7f22ce14b3a9fa419dea27615a1e66a4faf75bc76942d3474eabc86c1R371-R413
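A minimal sketch of that point (the `add_one` kernel is hypothetical, and the `helion.kernel`/`hl.tile` usage is assumed from the standard API): the mark is attached to the input tensor, not to the kernel object, so calls with unmarked inputs keep the original dynamic behavior.

```python
import torch
import torch._dynamo
import helion
import helion.language as hl

@helion.kernel
def add_one(x: torch.Tensor) -> torch.Tensor:
    # No hl.specialize() in the body: the kernel stays shape-dynamic by default.
    out = torch.empty_like(x)
    for tile in hl.tile(x.size(0)):
        out[tile] = x[tile] + 1
    return out

a = torch.randn(1024, device="cuda")
add_one(a)                        # unmarked input: dynamic, same as before

b = torch.randn(4096, device="cuda")
torch._dynamo.mark_static(b, 0)   # annotation lives on `b`, not on `add_one`
add_one(b)                        # this call may specialize on 4096; the kernel source is unchanged
```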
jansel left a comment:
Could we use the same mark dynamic API as torch.compile for this?
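For context, a rough sketch of how those markers are already used with plain torch.compile (the function and shapes here are purely illustrative):

```python
import torch
import torch._dynamo

def f(x):
    return x * 2

x = torch.randn(8, 128)
torch._dynamo.mark_dynamic(x, 0)  # never specialize on dim 0
torch._dynamo.mark_static(x, 1)   # bake dim 1's size into the compiled code
compiled_f = torch.compile(f)
compiled_f(x)
```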
Title changed from "kernel.specialize_args() API to allow tensor shape specialization outside of the kernel code" to "torch._dynamo.mark_static() API to allow tensor shape specialization outside of the kernel code"
Should mention this in the docs.
In general, baking specific tensor dim sizes into the kernel is good for performance, but we can't do that in all scenarios because we need a catch-all dynamic kernel for rare sizes (otherwise we would have to autotune for those novel long-tail sizes, which is generally prohibitive due to the extra autotuning time).
Right now, to bake in a specific shape one must use `hl.specialize()`, which makes the kernel unsuitable for the all-dynamic case (unless we copy-paste the kernel and remove the specific `hl.specialize()` calls). By reusing the `torch._dynamo.mark_static()` API, users can achieve the above with a Helion kernel implementation that has no `hl.specialize()` calls, adding the specialization outside of the kernel code instead, which makes their codebase much cleaner.

Closes #1046.
cc. @Chillee
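A hedged before/after sketch of what the description above is getting at (kernel names and bodies are illustrative, not the actual PR test code; `hl.specialize()` and `torch._dynamo.mark_static()` are used as described above):

```python
import torch
import torch._dynamo
import helion
import helion.language as hl

# Before: specialization lives inside the kernel source, so this variant can
# no longer serve as the catch-all dynamic kernel.
@helion.kernel
def scale_specialized(x: torch.Tensor) -> torch.Tensor:
    n = hl.specialize(x.size(0))      # bakes the length in as a compile-time constant
    out = torch.empty_like(x)
    for tile in hl.tile(n):
        out[tile] = x[tile] * 2
    return out

# After: a single dynamic kernel with no hl.specialize() anywhere in its body.
@helion.kernel
def scale(x: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    for tile in hl.tile(x.size(0)):
        out[tile] = x[tile] * 2
    return out

x_common = torch.randn(8192, device="cuda")
torch._dynamo.mark_static(x_common, 0)  # hot shape: request specialization at the call site
scale(x_common)

x_rare = torch.randn(12345, device="cuda")
scale(x_rare)                           # long-tail shape: handled by the same dynamic kernel
```

The same kernel source serves both the specialized hot-path calls and the catch-all dynamic case, so there is no need to maintain two copies of the kernel.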