-
Notifications
You must be signed in to change notification settings - Fork 8
added trimat_forward cuda kernel. #156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a new pocket reference documenting CUDA kernels for a triangular matrix multiplication forward pass (Trimat) used in causal self-attention.
- Introduces an overview and motivation for computing only the lower triangle of the attention matrix.
- Details four CUDA kernel implementations (naive, register tiling, vectorized loads, shared memory tiling) with diagrams.
- Provides input/output shapes, computation goal, configuration, and references.
Comments suppressed due to low confidence (2)
books/compute/src/cuda/kernels/trimat_forward.md:6
- Missing reading time preprocessor tag below the title. Please add
{{#reading_time}}immediately after the header so that the estimated reading time is rendered correctly.
# Kernels for Triangular Matrix Multiplication (Trimat) Forward Pass
books/compute/src/cuda/kernels/trimat_forward.md:92
- Using raw HTML
<center>tags disables markdownlint rules and may reduce consistency. Consider using<div align="center">or a mdbook-compatible markdown approach for centering images to avoid disabling MD033.
<center>
nerdai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kohankhaki for the contribution! I think there's a lot to like here. In this first pass, I've noticed there's some styling—in particular, math styling—that we need to clean up.
Also, I know its a bit annoying but the line-length rule is set to 79 and for now shouldn't be ignored. This makes .md files easier to read when viewing them as raw files. Happy to show you how I make format this to 79 line length...
emersodb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I think the write up is concise and clear, especially considering the complexity of the topic. Most of my suggestions are fairly cosmetic rather than correcting any issues.
As an aside: @kohankhaki, did you make the svg diagrams? I think they are really nice.
emersodb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me. There is a single comment that is unresolved from my previous review, just asking about the indexing of the blocks in the diagrams. It's probably a naive question. If so, just ignore it and resolve 🙂. I was just curious.
|
@kohankhaki Excalidraw for the win! |
nerdai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kohankhaki: Thanks for making the changes. I think the Pocket Ref reads great!
NOTE: I've swapped the local image versions for hosted ones, which is one of the final steps for preparing the Pocket Ref to be released.
Thank you David, just replied. :) |
[Compute] Trimat CUDA Kernel
Type of Change
Fixes #
Book
Description
Adding trimat CUDA kernel pocket reference.
Checklist
{{#author}}or{{#authors}})mdbook watch books/<book-name> --open