- [Dec 2, 2025] We release the code and blog!
- Clone this repo:

  ```shell
  git clone git@github.com:Gauss-Math/GAUSS-Eval.git
  ```
- Install the dependencies:

  ```shell
  cd GAUSS-Eval
  uv sync
  source .venv/bin/activate
  export GAUSS_EVAL_ROOT=$(pwd)
  ```
- Prepare your API keys and put each into `api/{FILE_NAME}.key`. Example usage:

  ```shell
  export OPENROUTER_API_KEY=$(cat api/openrouter.key)
  ```
- (Optional) Configure prompts, data, rubrics, etc. using the UI on a specified `{PORT_NAME}`, e.g. `10100`:

  ```shell
  cd ui && python server.py 10100
  ```
- Run the inference either using commands in the UI or in `scripts/`. Example:

  ```shell
  bash scripts/usamo/gpt5.sh
  ```
- If your run does not generate responses for some samples (usually due to unstable providers), please run:

  ```shell
  python src/main.py --resume_from {YOUR_RESULT_DIR}
  ```

  and make sure a temporary summary `summary.json` is in it.
- Generate a brief summary using:

  ```shell
  python src/analyze.py
  ```
- (Optional) Generate reports using:

  ```shell
  python src/analyze.py --generate-reports --figure
  ```
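Before resuming an incomplete run with `--resume_from`, it can help to check which samples actually lack responses. This is a minimal sketch, not part of the repo: it assumes a hypothetical result layout of one `{sample_id}.json` per sample with a `response` field alongside `summary.json` (the real layout may differ).

```python
import json
from pathlib import Path

def find_missing_responses(result_dir):
    """Return the names of per-sample JSON files whose 'response'
    field is absent or empty. Assumes a hypothetical layout of one
    JSON file per sample next to the run-level summary.json."""
    missing = []
    for path in sorted(Path(result_dir).glob("*.json")):
        if path.name == "summary.json":  # skip the run-level summary
            continue
        record = json.loads(path.read_text())
        if not record.get("response"):  # missing, None, or empty string
            missing.append(path.name)
    return missing
```

If this returns a non-empty list, re-running with `--resume_from {YOUR_RESULT_DIR}` should fill in the gaps.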
If you find this code useful, please give us a star and cite us as:

```bibtex
@article{chu2025gausseval,
  author  = {Chu, Tianzhe and Zhang, Jiaxin and Liao, Zhenyu and Ren, Qiuyu and Saffat, Tahsin and Yang, Zitong and Ma, Yi and Zhang, Yue},
  title   = {GAUSS Eval: Human-LLM Judge Consistency Analysis},
  year    = {2025},
  journal = {GAUSS Blogs},
  note    = {https://gaussmath.ai/eval.html}
}
```
