@drbh
Collaborator

@drbh drbh commented Jan 9, 2026

This is a work-in-progress PR that adds a new benchmark command to the kernels CLI tool. The idea is to provide a standard way to benchmark kernels:

uvx \
--from git+https://github.com/huggingface/kernels.git@initial-benchmark-command \
--with torch \
--with numpy \
kernels benchmark kernels-community/activation # <- the expected command once merged

output

    Updated https://github.com/huggingface/kernels.git (daa75e4edfaccca487b9de9fb2b85b4cd052fd42)
      Built kernels @ git+https://github.com/huggingface/kernels.git@daa75e4edfaccca487b9de9fb2b85b4cd052fd42
Installed 45 packages in 81ms
Downloading kernels-community/activation@main...
Running benchmark.py...

┌────────────────┬──────────┬────────────┬────────────┬────────────┬────────────┬────────────┬──────────┬───────────┐
│ Benchmark      │ Workload │   Mean(ms) │    Std(ms) │    Min(ms) │    Max(ms) │    IQR(ms) │ Outliers │ Ref Match │
├────────────────┼──────────┼────────────┼────────────┼────────────┼────────────┼────────────┼──────────┼───────────┤
│ SiluWorkloads  │ medium   │     0.0109 │     0.0003 │     0.0105 │     0.0130 │     0.0002 │        3 │     ✓     │
│ SiluWorkloads  │ small    │     0.0051 │     0.0005 │     0.0049 │     0.0095 │     0.0002 │        3 │     ✓     │
│ SiluWorkloads2 │ medium   │     0.0298 │     0.0017 │     0.0282 │     0.0390 │     0.0010 │        5 │     ·     │
│ SiluWorkloads2 │ small    │     0.0061 │     0.0025 │     0.0055 │     0.0305 │     0.0001 │        7 │     ✓     │
└────────────────┴──────────┴────────────┴────────────┴────────────┴────────────┴────────────┴──────────┴───────────┘

Dry run - use --upload to submit results
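
For readers unfamiliar with the columns, here is a minimal sketch of how per-workload statistics like these could be derived from raw timing samples. It is not the code in this branch: the `summarize` helper is hypothetical, the sample values are made up for illustration, and the outlier rule (Tukey's 1.5 × IQR fences) is an assumption, since the PR does not state which rule it uses.

```python
import statistics


def summarize(samples_ms):
    """Summarize timing samples (in ms) into the columns shown in the table.

    Hypothetical helper for illustration only.
    """
    # Quartiles of the samples; statistics.quantiles sorts the data internally.
    q1, _, q3 = statistics.quantiles(samples_ms, n=4)
    iqr = q3 - q1
    # Assumption: a sample is an outlier if it falls outside Tukey's fences.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "n": len(samples_ms),
        "mean": statistics.fmean(samples_ms),
        "std": statistics.stdev(samples_ms),
        "min": min(samples_ms),
        "max": max(samples_ms),
        "iqr": iqr,
        "outliers": sum(1 for s in samples_ms if s < low_fence or s > high_fence),
    }


# Illustrative values only, not taken from the run above.
samples = [0.0105, 0.0106, 0.0107, 0.0108, 0.0108,
           0.0109, 0.0109, 0.0110, 0.0111, 0.0130]
stats = summarize(samples)
for name, value in stats.items():
    print(f"{name:>8}: {value}")
```

A single slow iteration (0.0130 here) pulls up the max and the standard deviation while leaving the IQR almost unchanged, which is why both spread measures are worth reporting side by side.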

benchmark file

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@drbh
Collaborator Author

drbh commented Jan 13, 2026

updated output

uvx \
--from git+https://github.com/huggingface/kernels.git@initial-benchmark-command \
--with torch \
--with numpy \
kernels benchmark kernels-community/activation # <- the expected command once merged

output

    Updated https://github.com/huggingface/kernels.git (a78fc45da9c22ac06411fb282570a8d7123fb412)
      Built kernels @ git+https://github.com/huggingface/kernels.git@a78fc45da9c22ac06411fb282570a8d7123fb412
Installed 45 packages in 73ms
Downloading kernels-community/activation@main...
Running benchmark.py...

┌────────────────┬──────────┬───────┬─────────┬────────────┬────────────┬────────────┬────────────┬────────────┬──────────┬────────────┬───────┐
│ Benchmark      │ Workload │     N │ Speedup │   Mean(ms) │    Std(ms) │    Min(ms) │    Max(ms) │    IQR(ms) │ Outliers │    Ref(ms) │ Match │
├────────────────┼──────────┼───────┼─────────┼────────────┼────────────┼────────────┼────────────┼────────────┼──────────┼────────────┼───────┤
│ SiluWorkloads  │ medium   │   100 │   6.93x │     0.0109 │     0.0002 │     0.0107 │     0.0123 │     0.0002 │        4 │     0.0755 │   ✓   │
│ SiluWorkloads  │ small    │   100 │  10.62x │     0.0053 │     0.0011 │     0.0050 │     0.0159 │     0.0002 │        6 │     0.0563 │   ✓   │
│ SiluWorkloads2 │ medium   │   100 │         │     0.0152 │     0.0018 │     0.0138 │     0.0257 │     0.0004 │       14 │            │   ·   │
│ SiluWorkloads2 │ small    │   100 │  10.60x │     0.0103 │     0.0006 │     0.0095 │     0.0136 │     0.0004 │        5 │     0.1092 │   ✓   │
└────────────────┴──────────┴───────┴─────────┴────────────┴────────────┴────────────┴────────────┴────────────┴──────────┴────────────┴───────┘

  medium: 6.93x faster (95% CI: 0.0109-0.0109ms vs ref 0.0755ms) ✓ significant
  small: 10.62x faster (95% CI: 0.0051-0.0055ms vs ref 0.0563ms) ✓ significant
  small: 10.60x faster (95% CI: 0.0102-0.0104ms vs ref 0.1092ms) ✓ significant

Kernel: 6ff8e1a  Benchmark: 9b68fca
Dry run - use --upload to submit results
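
As a rough guide to reading the summary lines, the sketch below reproduces the speedup and 95% CI for the SiluWorkloads small row from the table values. It assumes a normal-approximation interval on the mean (mean ± 1.96 · std / √N) and assumes "significant" means the whole interval lies below the reference mean. Both are guesses that happen to agree with the printed numbers, not a description of the code in this branch, and `compare_to_reference` is a hypothetical name.

```python
import math


def compare_to_reference(mean_ms, std_ms, n, ref_mean_ms, z=1.96):
    """Return (speedup, 95% CI of the mean, significant?) for one workload.

    Hypothetical helper; assumes a normal approximation for the CI.
    """
    half_width = z * std_ms / math.sqrt(n)
    ci = (mean_ms - half_width, mean_ms + half_width)
    speedup = ref_mean_ms / mean_ms
    # Assumption: significant if even the slow end of the CI beats the reference.
    significant = ci[1] < ref_mean_ms
    return speedup, ci, significant


# Values copied from the SiluWorkloads / small row above.
speedup, ci, significant = compare_to_reference(
    mean_ms=0.0053, std_ms=0.0011, n=100, ref_mean_ms=0.0563
)
print(f"{speedup:.2f}x faster (95% CI: {ci[0]:.4f}-{ci[1]:.4f}ms) significant={significant}")
# prints: 10.62x faster (95% CI: 0.0051-0.0055ms) significant=True
```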

@drbh
Collaborator Author

drbh commented Jan 15, 2026

The latest changes add a benchmark class for attention, which can be used with:

uv run kernels benchmark kernels-community/flash-attn2

file https://huggingface.co/kernels-community/flash-attn2/blob/main/benchmark.py

uv run kernels benchmark kernels-community/flash-attn3

file https://huggingface.co/kernels-community/flash-attn3/blob/main/benchmark.py

uv run kernels benchmark kernels-community/vllm-flash-attn3

file https://huggingface.co/kernels-community/vllm-flash-attn3/blob/main/benchmark.py

activation also has a benchmark:

uv run kernels benchmark kernels-community/activation

file https://huggingface.co/kernels-community/activation/blob/main/benchmark.py

@drbh drbh marked this pull request as ready for review January 15, 2026 20:01
@drbh
Collaborator Author

drbh commented Jan 15, 2026

The following benchmarks now run. They use predefined benchmarks whose logic lives in this branch, with pointer files in the kernel repos:

uv run kernels benchmark kernels-community/activation
uv run kernels benchmark kernels-community/flash-attn2
uv run kernels benchmark kernels-community/flash-attn3
uv run kernels benchmark kernels-community/vllm-flash-attn3

https://huggingface.co/kernels-community/activation/blob/main/benchmark.py
https://huggingface.co/kernels-community/flash-attn2/blob/main/benchmark.py
https://huggingface.co/kernels-community/flash-attn3/blob/main/benchmark.py
https://huggingface.co/kernels-community/vllm-flash-attn3/blob/main/benchmark.py

Future related work

This PR is the first step toward more benchmarking features in kernels. Follow-up PRs will:

  • add documentation for writing custom benchmarks
  • add documentation for improving the benchmarks contained in this library
  • explore pushing benchmark results to a shared location to aggregate community runs
  • improve the serialization format and benchmark identifiers to support that aggregation
