Commit ffdcb72

FlashAttention Benchmark update (#96)

FA4 now automatically picks up nvidia-cutlass-dsl from the project requirements. This fixes the failures from the last few days, where we were installing an outdated package. The test output now clearly states the system power limit, and the Docker image version in the workflow has been updated.

1 parent 4f5ace1 commit ffdcb72
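The dependency fix is visible in the diff below: instead of pinning the DSL by hand inside the container, the workflow now does an editable install of the CuTe package, letting pip resolve nvidia-cutlass-dsl from the project's own requirements. A minimal before/after sketch, assuming flash_attn/cute/ declares nvidia-cutlass-dsl in its packaging metadata:

    # Before (#96): the workflow pinned the DSL by hand inside the container,
    # which went stale and caused the recent failures
    pip install nvidia-cutlass-dsl==4.1.0

    # After: an editable install of the CuTe package lets pip resolve
    # nvidia-cutlass-dsl from the project requirements, so the pin
    # can no longer drift out of date in the workflow itself
    pip install -e flash_attn/cute/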

File tree

1 file changed: +8 −10 lines

.github/workflows/flash_attention.yml

Lines changed: 8 additions & 10 deletions
@@ -34,7 +34,7 @@ jobs:
 
       - name: Run Flash Attention benchmark in Docker
         env:
-          DOCKER_IMAGE: nvcr.io/nvidia/pytorch:25.06-py3
+          DOCKER_IMAGE: nvcr.io/nvidia/pytorch:25.09-py3
         run: |
           set -eux
@@ -52,21 +52,19 @@
             "${DOCKER_IMAGE}"
           )
 
-          # Install CuTe DSL
-          docker exec -t "${container_name}" bash -c "
-            set -x
-            echo 'Installing nvidia-cutlass-dsl'
-            pip install nvidia-cutlass-dsl==4.1.0
-          "
-
           # Build and run FlashAttention CuTe DSL
           docker exec -t "${container_name}" bash -c "
             set -x
             pushd fa4
             python setup.py install
-
-            echo '<h1>B200 1000W</h1>' >> /tmp/workspace/fa4_output.txt
+            pip install -e flash_attn/cute/
+
             nvidia-smi
+
+            echo '<h1>B200' >> /tmp/workspace/fa4_output.txt
+            nvidia-smi -q -d POWER | grep 'Current Power Limit' | head -1 | cut -d : -f 2 >> /tmp/workspace/fa4_output.txt
+            echo '</h1>' >> /tmp/workspace/fa4_output.txt
+
             export PYTHONPATH=\$(pwd)
             python benchmarks/benchmark_attn.py >> /tmp/workspace/fa4_output.txt
             popd
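The added pipeline replaces the hardcoded "1000W" in the report header with the power limit the GPU actually reports. A minimal sketch of that pipeline on its own (the printed value is illustrative, and the exact nvidia-smi field layout can vary by driver version):

    # Query power info, keep the first 'Current Power Limit' line,
    # and strip everything up to the colon, leaving just the wattage
    $ nvidia-smi -q -d POWER | grep 'Current Power Limit' | head -1 | cut -d : -f 2
     1000.00 W

grep keeps only the power-limit lines, head -1 takes the first GPU's entry, and cut drops the field name, so only the wattage lands inside the <h1> header in fa4_output.txt.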
