Skip to content

Conversation

@kohankhaki
Copy link
Contributor

@kohankhaki kohankhaki commented Jun 6, 2025

[Compute] Trimat CUDA Kernel

Type of Change

  • New Pocket Reference
  • Edit to Existing Pocket Reference
  • Other (please describe):

Fixes #

Book

  • fundamentals
  • nlp
  • cv
  • rl
  • fl
  • responsible_ai
  • compute

Description

Adding trimat CUDA kernel pocket reference.

Checklist

  • I have included appropriate contributor tags ({{#author}} or {{#authors}})
  • I have added the reading time preprocessor tag under the title
  • Content is concise and within 7 minutes reading time
  • I have included relevant references and further reading links
  • I have tested locally using mdbook watch books/<book-name> --open
  • Pre-commit hooks pass without errors
  • I have linked to related issues

@kohankhaki kohankhaki requested review from Copilot and nerdai June 6, 2025 07:28
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a new pocket reference documenting CUDA kernels for a triangular matrix multiplication forward pass (Trimat) used in causal self-attention.

  • Introduces an overview and motivation for computing only the lower triangle of the attention matrix.
  • Details four CUDA kernel implementations (naive, register tiling, vectorized loads, shared memory tiling) with diagrams.
  • Provides input/output shapes, computation goal, configuration, and references.
Comments suppressed due to low confidence (2)

books/compute/src/cuda/kernels/trimat_forward.md:6

  • Missing reading time preprocessor tag below the title. Please add {{#reading_time}} immediately after the header so that the estimated reading time is rendered correctly.
# Kernels for Triangular Matrix Multiplication (Trimat) Forward Pass

books/compute/src/cuda/kernels/trimat_forward.md:92

  • Using raw HTML <center> tags disables markdownlint rules and may reduce consistency. Consider using <div align="center"> or a mdbook-compatible markdown approach for centering images to avoid disabling MD033.
<center>

@nerdai nerdai requested review from Viky397 and emersodb June 16, 2025 16:12
Copy link
Collaborator

@nerdai nerdai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kohankhaki for the contribution! I think there's a lot to like here. In this first pass, I've noticed there's some styling—in particular, math styling—that we need to clean up.

Also, I know its a bit annoying but the line-length rule is set to 79 and for now shouldn't be ignored. This makes .md files easier to read when viewing them as raw files. Happy to show you how I make format this to 79 line length...

Copy link
Collaborator

@emersodb emersodb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think the write up is concise and clear, especially considering the complexity of the topic. Most of my suggestions are fairly cosmetic rather than correcting any issues.

As an aside: @kohankhaki, did you make the svg diagrams? I think they are really nice.

@kohankhaki
Copy link
Contributor Author

@nerdai Thank you for reviewing this. I have fixed all your comments including the length issues.
@emersodb Thanks David for the comments. I have fixed these as well.
I created the figures using excalidraw.

@emersodb emersodb self-requested a review June 23, 2025 13:01
Copy link
Collaborator

@emersodb emersodb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. There is a single comment that is unresolved from my previous review, just asking about the indexing of the blocks in the diagrams. It's probably a naive question. If so, just ignore it and resolve 🙂. I was just curious.

@nerdai
Copy link
Collaborator

nerdai commented Jun 23, 2025

@kohankhaki Excalidraw for the win!

Copy link
Collaborator

@nerdai nerdai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kohankhaki: Thanks for making the changes. I think the Pocket Ref reads great!

NOTE: I've swapped the local image versions for hosted ones, which is one of the final steps for preparing the Pocket Ref to be released.

@kohankhaki
Copy link
Contributor Author

Changes look good to me. There is a single comment that is unresolved from my previous review, just asking about the indexing of the blocks in the diagrams. It's probably a naive question. If so, just ignore it and resolve 🙂. I was just curious.

Changes look good to me. There is a single comment that is unresolved from my previous review, just asking about the indexing of the blocks in the diagrams. It's probably a naive question. If so, just ignore it and resolve 🙂. I was just curious.

Thank you David, just replied. :)

@kohankhaki kohankhaki merged commit 716ef03 into main Jun 23, 2025
1 check passed
@kohankhaki kohankhaki deleted the cuda_trimat branch June 23, 2025 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants