Conversation

@LeiWang1999 (Member)

This pull request updates README.md, docs/Installation.md, and several other files to improve documentation, update repository URLs, and introduce new functionality. The most important changes are a new "Latest News" section in the README, updated repository URLs, and new JIT compilation functionality.

Documentation updates:

  • README.md: Added a "Latest News" section announcing the open-source release of tile-lang.
  • README.md: Added a new benchmark image for Dequantize Matmul Performance on A100.

Repository URL updates:

  • README.md, docker/Dockerfile.cu120, docker/README.md, docs/Installation.md, setup.py: Updated repository URLs from https://github.com/microsoft/TileLang to https://github.com/TileLang/tile-lang. [1] [2] [3] [4] [5]

Code enhancements:

@LeiWang1999 (Member, Author)

1. Overall Flow

def matmul(...):
    # 1) Derive A, B, C shapes from user parameters
    # 2) Define the kernel function "main" with tilelang.language (imported as T)
    
    @tilelang.jit(
        out_idx=-1,  # create the output tensor at runtime
    )
    @T.prim_func
    def main(
        A: T.Buffer(A_shape, in_dtype),
        B: T.Buffer(B_shape, in_dtype),
        C: T.Buffer((M, N), out_dtype),
    ):
        # TileLang DSL: T.Kernel, T.alloc_shared, T.gemm, T.copy, etc.
        ...
    
    return main  # returns the compiled kernel

  • @T.prim_func
    Converts the Python function body written in the TileLang DSL (T.Kernel, T.copy, T.gemm, etc.) into a low-level TVM TIR PrimFunc.

  • @tilelang.jit
    After the function is parsed into TIR, it handles the subsequent lowering and build steps, and also controls how output tensors are created at runtime based on the out_idx setting.

    • In this example, out_idx=-1 means the output buffer is automatically allocated at runtime and returned to the user.

When matmul(...) is called, it produces a compiled kernel (the function main), which can then be called to run on the GPU/CPU.
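
For concreteness, here is a minimal sketch of a complete kernel following this skeleton, modeled on the tile-lang matmul example; the tile sizes (block_M, block_N, block_K), the pipeline depth (num_stages), and the row-major, non-transposed layouts are illustrative assumptions, not part of this PR:

import tilelang
import tilelang.language as T

def matmul(M, N, K, block_M=128, block_N=128, block_K=32,
           in_dtype="float16", out_dtype="float16", accum_dtype="float"):

    @tilelang.jit(
        out_idx=-1,  # the last argument (C) is allocated at runtime
    )
    @T.prim_func
    def main(
        A: T.Buffer((M, K), in_dtype),
        B: T.Buffer((K, N), in_dtype),
        C: T.Buffer((M, N), out_dtype),
    ):
        # One thread block per (block_M x block_N) tile of C
        with T.Kernel(T.ceildiv(N, block_N), T.ceildiv(M, block_M), threads=128) as (bx, by):
            A_shared = T.alloc_shared((block_M, block_K), in_dtype)
            B_shared = T.alloc_shared((block_K, block_N), in_dtype)
            C_local = T.alloc_fragment((block_M, block_N), accum_dtype)
            T.clear(C_local)
            # Software-pipelined loop over the K dimension
            for k in T.Pipelined(T.ceildiv(K, block_K), num_stages=3):
                T.copy(A[by * block_M, k * block_K], A_shared)
                T.copy(B[k * block_K, bx * block_N], B_shared)
                T.gemm(A_shared, B_shared, C_local)
            T.copy(C_local, C[by * block_M, bx * block_N])

    return main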


2. Running the GEMM: run_gemm(...)

def run_gemm(...):
    # 1) Obtain the compiled kernel
    matmul_kernel = matmul(...)

    # 2) Prepare PyTorch tensors (A, B) with optional transpose
    A = torch.randn(M, K, dtype=...).cuda()
    B = torch.randn(K, N, dtype=...).cuda()
    if trans_A: A = A.T
    if trans_B: B = B.T

    # 3) Execute the compiled kernel and get the output C
    C = matmul_kernel(A, B)

    # 4) Compare with reference result (torch.matmul)
    ref_C = torch.matmul(A.float(), B.float()).to(out_dtype)
    tilelang.testing.torch_assert_close(C, ref_C, atol=1e-2, rtol=1e-2)

  • matmul_kernel(A, B) actually invokes the compiled function main(A, B), which runs on the GPU/CPU.
  • Because out_idx=-1, we only supply the input tensors (A, B), and the output is created internally by the TileLang runtime and returned as C.
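
Putting the two pieces together, a hypothetical invocation might look like the following (the sizes and dtypes are illustrative, and matmul refers to the sketch in section 1):

import torch

matmul_kernel = matmul(1024, 1024, 1024)

A = torch.randn(1024, 1024, dtype=torch.float16).cuda()
B = torch.randn(1024, 1024, dtype=torch.float16).cuda()
C = matmul_kernel(A, B)  # out_idx=-1: C is allocated by the runtime and returned

ref_C = (A.float() @ B.float()).half()
torch.testing.assert_close(C, ref_C, atol=1e-2, rtol=1e-2)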

3. The Role of @tilelang.jit Parameters

Within the @tilelang.jit decorator, the key parameter shown is out_idx=-1. Some notes:

  • out_idx
    • Identifies which function argument is the output tensor; -1 refers to the last argument (here, C).
    • When out_idx is set, you do not need to pass that buffer yourself; the TileLang runtime allocates it on your behalf and returns it.
    • If out_idx were omitted, you would instead be required to pass an already-allocated output buffer (here, C) as an explicit argument to the kernel.

You can also configure other arguments in @tilelang.jit(...) (e.g. target='cuda') to control device selection and other compilation or runtime details.
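
To make the contrast concrete, a short sketch (kernel, A, B, M, and N are placeholders; the second variant assumes the kernel was compiled without out_idx):

# Compiled with out_idx=-1: the runtime allocates C and returns it.
C = kernel(A, B)

# Compiled without out_idx: the caller allocates C and passes it explicitly.
C = torch.empty(M, N, dtype=torch.float16, device="cuda")
kernel(A, B, C)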

@LeiWang1999 merged commit 444892f into main on Jan 20, 2025
3 of 4 checks passed
@LeiWang1999 deleted the jit branch on January 20, 2025 at 16:17
vincentccc pushed a commit to vincentccc/tilelang that referenced this pull request Jul 21, 2025
* README.md fixed

* update test ci