create a new accuracy eval script for official README.md eval accuracy #3449
Summary:
Creates a standalone eval script for generating accuracy metrics for the quantization README.md, based on the HuggingFace model definition of Llama 3.1 8B.
Why a new script?
- The prod script in https://github.com/pytorch/ao/blob/main/torchao/_models/llama/eval.py uses a custom model definition that predates the HF integration; it's better to use HF's model definition now.
- The scripts in https://github.com/pytorch/ao/tree/40c4f44677ae11166c3dcfbb9189cfa78789390c/.github/scripts/torchao_model_releases exist, but they seem pretty verbose and hard to use/modify.
- https://github.com/pytorch/ao/blob/main/benchmarks/_models/eval_hf_models.py is what I copy-pasted and modified for the current PR. That script didn't work as is for various reasons, and also seemed hard to use/modify; for the main README.md it's important to have a very simple standalone script.
We should probably do a pass on the naming before landing.
Future work:
- int4_weight_only_hqq (need to run on A100)
- mxfp8 and nvfp4 (need to run on B200)

Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags: