
GAUSS Eval: Human–LLM Judge Consistency Analysis


Misc: Nano Banana generated this figure with a typo; we leave it as is.

Website · Full Report · Run Logs
GAUSS Team

Release

  • [Dec 2, 2025] We released the code and blog.

Usage

  1. Clone this repo:
git clone git@github.com:Gauss-Math/GAUSS-Eval.git
  2. Install the dependencies:
cd GAUSS-Eval
uv sync
source .venv/bin/activate
export GAUSS_EVAL_ROOT=$(pwd)
  3. Prepare your API keys and put them into api/{FILE_NAME}.key. Example usage:
export OPENROUTER_API_KEY=$(cat api/openrouter.key)
  4. (Optional) Configure prompts, data, rubrics, etc. using the UI on a specified {PORT_NAME}, e.g. 10100:
cd ui && python server.py 10100
  5. Run the inference using either the commands shown in the UI or the scripts in scripts/.

    • Example: bash scripts/usamo/gpt5.sh
    • If your run does not generate responses for some samples (usually due to unstable providers), run python src/main.py --resume_from {YOUR_RESULT_DIR} and make sure a temporary summary summary.json is in that directory.
  6. Generate a brief summary using:

python src/analyze.py
  7. (Optional) Generate reports using:
python src/analyze.py --generate-reports --figure
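The per-provider key files in step 3 follow a simple convention: one file per provider under api/, read into an environment variable at run time. A minimal sketch of that pattern (the key value sk-example-key is a placeholder for illustration, not a real key):

```shell
# Create the key file once; its contents are your provider API key.
# "sk-example-key" is a placeholder -- substitute your actual key.
mkdir -p api
printf '%s' "sk-example-key" > api/openrouter.key

# Load the key into the environment before running inference,
# mirroring the export shown in step 3 above.
export OPENROUTER_API_KEY=$(cat api/openrouter.key)
```

Keeping keys in untracked files and exporting them at run time avoids hard-coding secrets into scripts under scripts/.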

Citation

If you find this code useful, please give it a star and cite us as:

@article{chu2025gausseval,
  author = {Chu, Tianzhe and Zhang, Jiaxin and Liao, Zhenyu and Ren, Qiuyu and Saffat, Tahsin and Yang, Zitong and Ma, Yi and Zhang, Yue},
  title = {GAUSS Eval: Human-LLM Judge Consistency Analysis},
  year = {2025},
  journal = {GAUSS Blogs},
  note = {https://gaussmath.ai/eval.html}
}

About

Official repo for the GAUSS-Eval blog.
