added trimat_forward cuda kernel. #156

kohankhaki · 2025-06-06T07:27:14Z

[Compute] Trimat CUDA Kernel

Type of Change

New Pocket Reference
Edit to Existing Pocket Reference
Other (please describe):

Fixes #

Book

Description

Adding trimat CUDA kernel pocket reference.

Checklist

I have included appropriate contributor tags ({{#author}} or {{#authors}})
I have added the reading time preprocessor tag under the title
Content is concise and within 7 minutes reading time
I have included relevant references and further reading links
I have tested locally using mdbook watch books/<book-name> --open
Pre-commit hooks pass without errors
I have linked to related issues

Copilot

Pull Request Overview

This PR adds a new pocket reference documenting CUDA kernels for a triangular matrix multiplication forward pass (Trimat) used in causal self-attention.

Introduces an overview and motivation for computing only the lower triangle of the attention matrix.
Details four CUDA kernel implementations (naive, register tiling, vectorized loads, shared memory tiling) with diagrams.
Provides input/output shapes, computation goal, configuration, and references.
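
For readers skimming this thread, the lower-triangle idea in the overview can be illustrated with a minimal CPU reference. This is a sketch and is not taken from the PR: the function name `trimat_forward_reference`, the single-head list-of-lists layout, the absence of any scaling factor, and the choice to leave the upper triangle at zero are all assumptions made for illustration.

```python
def trimat_forward_reference(q, k):
    """Hypothetical CPU reference: out[i][j] = dot(q[i], k[j]) for j <= i.

    q and k are lists of T rows, each row a list of floats of equal length.
    Entries above the diagonal are never computed and stay at 0.0 here
    (an assumption; a real kernel might leave them untouched instead).
    """
    T = len(q)
    out = [[0.0] * T for _ in range(T)]
    for i in range(T):
        # Causal attention only needs keys at positions j <= i.
        for j in range(i + 1):
            out[i][j] = sum(a * b for a, b in zip(q[i], k[j]))
    return out


def lower_triangle_dot_count(T):
    """Number of dot products evaluated for sequence length T."""
    return T * (T + 1) // 2
```

For a sequence length of `T`, this evaluates `T * (T + 1) / 2` dot products rather than `T * T`, which is the saving the kernels in the pocket reference aim to exploit on the GPU.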

Comments suppressed due to low confidence (2)

books/compute/src/cuda/kernels/trimat_forward.md:6

Missing reading time preprocessor tag below the title. Please add {{#reading_time}} immediately after the header so that the estimated reading time is rendered correctly.

# Kernels for Triangular Matrix Multiplication (Trimat) Forward Pass

books/compute/src/cuda/kernels/trimat_forward.md:92

Using raw HTML <center> tags requires disabling markdownlint rule MD033 and may reduce consistency. Consider using <div align="center"> or an mdbook-compatible markdown approach for centering images to avoid disabling MD033.

<center>

nerdai

Thanks @kohankhaki for the contribution! I think there's a lot to like here. In this first pass, I've noticed there's some styling—in particular, math styling—that we need to clean up.

Also, I know it's a bit annoying, but the line-length rule is set to 79 and for now shouldn't be ignored. This makes .md files easier to read when viewing them as raw files. Happy to show you how I format this to a line length of 79...

books/compute/src/cuda/kernels/trimat_forward.md

emersodb

Overall, I think the write-up is concise and clear, especially considering the complexity of the topic. Most of my suggestions are fairly cosmetic rather than correcting any issues.

As an aside: @kohankhaki, did you make the SVG diagrams? I think they are really nice.

books/compute/src/cuda/kernels/trimat_forward.md

kohankhaki · 2025-06-23T03:14:29Z

@nerdai Thank you for reviewing this. I have fixed all your comments including the length issues.
@emersodb Thanks David for the comments. I have fixed these as well.
I created the figures using Excalidraw.

emersodb

Changes look good to me. There is a single comment that is unresolved from my previous review, just asking about the indexing of the blocks in the diagrams. It's probably a naive question. If so, just ignore it and resolve 🙂. I was just curious.

nerdai · 2025-06-23T15:59:49Z

@kohankhaki Excalidraw for the win!

nerdai

@kohankhaki: Thanks for making the changes. I think the Pocket Ref reads great!

NOTE: I've swapped the local image versions for hosted ones, which is one of the final steps for preparing the Pocket Ref to be released.

kohankhaki · 2025-06-23T19:14:42Z

Changes look good to me. There is a single comment that is unresolved from my previous review, just asking about the indexing of the blocks in the diagrams. It's probably a naive question. If so, just ignore it and resolve 🙂. I was just curious.

Thank you David, just replied. :)

added trimat_forward cuda kernel.

697e7ad

kohankhaki requested review from Copilot and nerdai June 6, 2025 07:28

Copilot AI reviewed Jun 6, 2025

nerdai requested review from Viky397 and emersodb June 16, 2025 16:12

nerdai reviewed Jun 18, 2025

emersodb reviewed Jun 19, 2025

kohankhaki added 3 commits June 22, 2025 20:18

Fixed issues.

ce4567e

enabled MD013

88534e1

fixed trimat spelling.

9d6891b

emersodb self-requested a review June 23, 2025 13:01

emersodb approved these changes Jun 23, 2025

hosted images

6310ff5

nerdai approved these changes Jun 23, 2025

kohankhaki merged commit 716ef03 into main Jun 23, 2025
1 check passed

kohankhaki deleted the cuda_trimat branch June 23, 2025 19:25

4 participants