- [Dec 2, 2025] We release the code and blog!
- Clone this repo:

  ```shell
  git clone git@github.com:Gauss-Math/GAUSS-Eval.git
  ```
- Install the dependencies:

  ```shell
  cd GAUSS-Eval
  uv sync
  source .venv/bin/activate
  export GAUSS_EVAL_ROOT=$(pwd)
  ```
- Prepare your API keys and put each into `api/{FILE_NAME}.key`. Example usage:

  ```shell
  export OPENROUTER_API_KEY=$(cat api/openrouter.key)
  ```
- (Optional) Configure prompts, data, rubrics, etc. using the UI on a specified `{PORT_NAME}`, e.g. `10100`:

  ```shell
  cd ui && python server.py 10100
  ```
- Run the inference either using commands in the UI or in `scripts/`. Example:

  ```shell
  bash scripts/usamo/gpt5.sh
  ```
- If your run does not generate responses for some samples (usually due to unstable providers), please run:

  ```shell
  python src/main.py --resume_from {YOUR_RESULT_DIR}
  ```

  and make sure a temporary summary `summary.json` is in it.
- Generate a brief summary using:

  ```shell
  python src/analyze.py
  ```
- (Optional) Generate reports using:

  ```shell
  python src/analyze.py --generate-reports --figure
  ```
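Before resuming an incomplete run with `--resume_from`, it can help to check which samples actually lack responses. This is a minimal sketch, not part of the repo: it assumes a hypothetical result layout of one `{sample_id}.json` per sample with a `response` field alongside `summary.json` (the real layout may differ).

```python
import json
from pathlib import Path

def find_missing_responses(result_dir):
    """Return the names of per-sample JSON files whose 'response'
    field is absent or empty. Assumes a hypothetical layout of one
    JSON file per sample next to the run-level summary.json."""
    missing = []
    for path in sorted(Path(result_dir).glob("*.json")):
        if path.name == "summary.json":  # skip the run-level summary
            continue
        record = json.loads(path.read_text())
        if not record.get("response"):  # missing, None, or empty string
            missing.append(path.name)
    return missing
```

If this returns a non-empty list, re-running with `--resume_from {YOUR_RESULT_DIR}` should fill in the gaps.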
If you find this code useful, please give us a star and cite us as:

```bibtex
@article{chu2025gausseval,
  author  = {Chu, Tianzhe and Zhang, Jiaxin and Liao, Zhenyu and Ren, Qiuyu and Saffat, Tahsin and Yang, Zitong and Ma, Yi and Zhang, Yue},
  title   = {GAUSS Eval: Human-LLM Judge Consistency Analysis},
  year    = {2025},
  journal = {GAUSS Blogs},
  note    = {https://gaussmath.ai/eval.html}
}
```
