Conversation

@vkuzo vkuzo commented Dec 5, 2025

Summary:

Creates a standalone eval script for generating accuracy metrics for the
quantization README.md, based on the HuggingFace model definition of
Llama 3.1 8B.

Why new script?

  1. The current prod script in
    https://github.com/pytorch/ao/blob/main/torchao/_models/llama/eval.py
    uses a custom model definition. It predates the HF integration, so
    it's better to use HF's model definition now.
  2. We have HummingBird scripts in
    https://github.com/pytorch/ao/tree/40c4f44677ae11166c3dcfbb9189cfa78789390c/.github/scripts/torchao_model_releases,
    but they seem pretty verbose and hard to use or modify.
  3. We have
    https://github.com/pytorch/ao/blob/main/benchmarks/_models/eval_hf_models.py,
    which I copy-pasted and modified for the current PR. That script
    didn't work as-is for various reasons and also seemed hard to use or
    modify; for the main README.md it's important to have a very simple
    standalone script.

We should probably do a pass on the naming before landing.
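The script's end product is a table of accuracy numbers for the README.md. A minimal sketch of how collected metrics could be rendered into a markdown table is below; the recipe names, metric keys, and the `format_readme_table` helper are all hypothetical, not the actual output format of this PR's script.

```python
# Hypothetical sketch: turn collected accuracy metrics into a markdown
# table for the quantization README.md. Recipe names and metric keys
# are illustrative only.

def format_readme_table(results: dict[str, dict[str, float]]) -> str:
    """Render {recipe: {task: accuracy}} as a markdown table.

    The baseline row (if present under the key "bfloat16") is listed
    first so quantized recipes can be eyeballed against it.
    """
    tasks = sorted({t for metrics in results.values() for t in metrics})
    header = "| recipe | " + " | ".join(tasks) + " |"
    divider = "|" + "---|" * (len(tasks) + 1)
    rows = [header, divider]
    # sort key: False < True, so "bfloat16" comes before everything else
    ordered = sorted(results, key=lambda r: (r != "bfloat16", r))
    for recipe in ordered:
        cells = [f"{results[recipe].get(t, float('nan')):.4f}" for t in tasks]
        rows.append(f"| {recipe} | " + " | ".join(cells) + " |")
    return "\n".join(rows)
```

With that in place, each eval run only needs to append its numbers to a dict and re-render the table, rather than hand-editing the README.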

Future work:

  1. Add metrics for int4_weight_only_hqq (needs to run on an A100).
  2. Add metrics for 'int4 weight float8 activation' (currently doesn't work with HF accelerate).
  3. Add metrics for mxfp8 and nvfp4 (needs to run on a B200).
  4. Automate the parsing of logs.
  5. Add a similar script for performance benchmarks, using vLLM.
  6. Delete https://github.com/pytorch/ao/blob/main/torchao/_models/llama/
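For the "automate the parsing of logs" item, one stdlib-only approach is a small regex pass over the raw eval output. The log line format assumed here (`<task> <metric>: <value>`) is a guess at an lm-eval-style result line, not what the script actually emits.

```python
# Sketch of automated log parsing for eval results. The line format
# "<task> <metric>: <value>" is a hypothetical assumption about the
# eval script's output, not its confirmed format.
import re

_RESULT_RE = re.compile(r"^(?P<task>\S+)\s+(?P<metric>\S+):\s+(?P<value>[0-9.]+)$")

def parse_eval_log(text: str) -> dict[tuple[str, str], float]:
    """Extract {(task, metric): value} pairs from raw eval log text.

    Non-matching lines (progress messages, warnings, etc.) are skipped.
    """
    results = {}
    for line in text.splitlines():
        m = _RESULT_RE.match(line.strip())
        if m:
            results[(m["task"], m["metric"])] = float(m["value"])
    return results
```

A parser like this would let the README table be regenerated directly from saved log files instead of by manual copy-paste.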

Test Plan:

```
// debug run on small model
with-proxy time ./benchmarks/quantization/eval_accuracy_for_readme.sh facebook/opt-125m

// real run
with-proxy time ./benchmarks/quantization/eval_accuracy_for_readme.sh
```

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]

pytorch-bot bot commented Dec 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3449

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure

As of commit 002ba19 with merge base 69ce0fd:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo added a commit that referenced this pull request Dec 5, 2025
ghstack-source-id: 39c1d72
ghstack-comment-id: 3618394399
Pull-Request: #3449
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 5, 2025