[BUG] Unspecified launch failure when running persistent kernel #1501

@chengyupku

Description

What version of TileLang are you using?

0.1.7+cuda.git5acaab76

System information

Python: 3.12.12
TileLang: 0.1.7+cuda.git5acaab76
torch: 2.9.0+cu128

Problem description

I ran into a torch.AcceleratorError: CUDA error: unspecified launch failure when running the example_mla_decode_persistent.py script.

After investigating, I found that the root cause is the default value of execution_backend, tvm_ffi, which enables enable_host_codegen. However, the host codegen path has no logic for launching cooperative kernels (i.e., it never calls cudaLaunchCooperativeKernel). As a result, kernels that rely on a cooperative launch fail at runtime with the unspecified launch failure.
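
For context, here is a minimal CUDA sketch (not TileLang's generated host code; the kernel name and arguments are made up for illustration) of why the launch path matters: a kernel that performs a grid-wide sync via cooperative groups must be launched with cudaLaunchCooperativeKernel, and launching it through the ordinary host-side path is exactly the kind of mistake that surfaces as an unspecified launch failure. Build with nvcc -rdc=true on a device that supports cooperative launch.

```cuda
#include <cooperative_groups.h>
#include <cstdio>
namespace cg = cooperative_groups;

// Hypothetical persistent-style kernel: every block participates in a
// grid-wide barrier, so it is only valid under a cooperative launch.
__global__ void persistent_kernel(int *counter) {
    if (threadIdx.x == 0) atomicAdd(counter, 1);
    cg::this_grid().sync();  // grid-wide barrier
    if (blockIdx.x == 0 && threadIdx.x == 0)
        printf("blocks that arrived: %d\n", *counter);
}

int main() {
    int *counter;
    cudaMallocManaged(&counter, sizeof(int));
    *counter = 0;

    dim3 grid(4), block(128);
    void *args[] = {&counter};

    // Correct path for a cooperative kernel:
    cudaLaunchCooperativeKernel((void *)persistent_kernel, grid, block,
                                args, /*sharedMem=*/0, /*stream=*/0);

    // A plain launch of the same kernel, e.g.
    //     persistent_kernel<<<grid, block>>>(counter);
    // is invalid for a kernel that calls grid sync and fails at runtime.

    cudaDeviceSynchronize();
    printf("last CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(counter);
    return 0;
}
```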

Reproducible example code

Command to reproduce:

python examples/deepseek_mla/example_mla_decode_persistent.py

Traceback

Expected behavior

No response

Additional context

No response
