Quick start • Repository • Cite • Contact
A framework for multiple-choice evaluation that mitigates selection bias by counterbalancing position and semantic preferences in language models.
- Paper: SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models (https://arxiv.org/abs/2507.18182)
- Core idea: use Inverse-Positioning (IP) to offset models’ positional biases and Semantic-Spread (SS) to spatially separate similar distractors, reducing guesswork.
- Position bias: models disproportionately select certain answer slots (e.g., first/last); IP offsets this by placing the true answer in a less-preferred position.
- Semantic bias: models tend to choose semantically similar distractors when uncertain; SS identifies near-miss distractors and spreads them apart to prevent clustering.
- General: jointly applying IP + SS yields a fairer multiple-choice benchmark for large language models; a minimal sketch of both steps follows this list.
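To make the two steps concrete, here is a minimal Python sketch of how IP and SS could compose. It is one plausible reading for exposition, not the repository's implementation: the function names (`inverse_position_weights`, `place_options`), the inverse-weight sampling rule, the farthest-first placement, and the toy word-overlap similarity are all assumptions.

```python
# Illustrative sketch of the two counter-bias steps (Inverse-Positioning + Semantic-Spread).
# Names and placement rules are simplifications, not SCOPE's reference implementation.
import random

def inverse_position_weights(position_prefs):
    """Turn measured per-slot selection rates into inverse sampling weights,
    so the gold answer lands more often in slots the model under-picks (IP)."""
    inv = [1.0 / max(p, 1e-9) for p in position_prefs]
    total = sum(inv)
    return [w / total for w in inv]

def place_options(gold, distractors, position_prefs, similarity, rng=random):
    """Return an ordered option list: the gold slot is drawn from inverse-bias
    weights (IP), then near-miss distractors are pushed away from the gold (SS)."""
    n = 1 + len(distractors)
    gold_slot = rng.choices(range(n), weights=inverse_position_weights(position_prefs))[0]
    # SS: rank distractors by similarity to the gold answer (higher = nearer miss).
    ranked = sorted(distractors, key=lambda d: similarity(gold, d), reverse=True)
    # Fill remaining slots farthest-first so near-miss distractors don't cluster
    # next to the gold answer.
    slots = sorted((s for s in range(n) if s != gold_slot),
                   key=lambda s: abs(s - gold_slot), reverse=True)
    options = [None] * n
    options[gold_slot] = gold
    for slot, d in zip(slots, ranked):
        options[slot] = d
    return options, gold_slot

# Toy usage: a model that over-picks slot A, and a word-overlap similarity stand-in.
prefs = [0.40, 0.25, 0.20, 0.15]  # measured P(model picks slot) for slots A-D
sim = lambda a, b: len(set(a.split()) & set(b.split()))
opts, gold_at = place_options("the Eiffel Tower",
                              ["the Eiffel Bridge", "Big Ben", "the Colosseum"],
                              prefs, sim)
print(opts, "gold at slot", gold_at)
```

In a real pipeline the position preferences would come from measured per-slot selection rates and the similarity from an embedding model; the toy values above just keep the example self-contained.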
All scripts are designed for reproducibility; the quick start below should run end to end in a few minutes.
```bash
# 1) clone
git clone https://github.com/WonjunJeong97/SCOPE.git
cd SCOPE

# 2) Python deps (3.10+)
python -m venv .venv && source .venv/bin/activate  # or: conda create -n scope python=3.10 -y && conda activate scope
pip install -r requirements.txt

# 3) environment variables
cp .env.example .env
# Edit .env with any required API keys/tokens (e.g., OpenAI, HuggingFace) if your model requires them.

# 4) optional: notebooks
python -m pip install jupyter
jupyter lab
# Open notebooks under notebooks/ and run the first cells to verify your setup.
```

Run the built-in test mode to verify your installation end to end:
```bash
bash scripts/run_evaluation.sh -t

# Optionally pin dataset/model (same test mode, just more explicit):
bash scripts/run_evaluation.sh -t -d csqa -m gpt-3.5-turbo
```

If it completes without errors, you’re ready to reproduce the paper.
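To smoke-test several dataset/model combinations in one pass, a small wrapper like the sketch below may help. It assumes only the `-t`, `-d`, and `-m` flags shown above; the dataset and model lists are placeholders to adapt to your configuration.

```python
# Sweep the documented -t/-d/-m flags over several datasets and models.
import subprocess

datasets = ["csqa"]         # extend with other dataset keys you have configured
models = ["gpt-3.5-turbo"]  # extend with other model identifiers

for d in datasets:
    for m in models:
        print(f"== test run: dataset={d}, model={m} ==")
        subprocess.run(["bash", "scripts/run_evaluation.sh", "-t", "-d", d, "-m", m],
                       check=True)  # stop at the first failing combination
```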
Note: This assumes `.env` is set up and the fixed datasets exist at the paths in `configs/default.yaml`.
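A quick pre-flight check along these lines can catch a missing `.env` or dataset path early. The YAML keys used here (a `datasets` mapping whose entries carry a `path`) are hypothetical; match them to whatever structure `configs/default.yaml` actually uses.

```python
# Pre-flight check: confirm .env exists and the dataset paths in
# configs/default.yaml resolve. The "datasets" -> "path" keys are hypothetical.
from pathlib import Path
import yaml  # provided by PyYAML (pip install pyyaml)

assert Path(".env").exists(), "Missing .env - run: cp .env.example .env and fill in keys"

cfg = yaml.safe_load(Path("configs/default.yaml").read_text()) or {}
for name, spec in (cfg.get("datasets") or {}).items():
    p = Path(spec["path"]) if isinstance(spec, dict) else Path(spec)
    print(f"{name}: {p} -> {'OK' if p.exists() else 'MISSING'}")
```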
```
SCOPE/
├─ configs/          # per-table/figure experiment configs (YAML)
├─ figures/          # static images for README/docs (pipeline, schematics)
├─ scripts/          # download / train / eval / run_all helpers
├─ src/              # core implementation (data, models, utils, train.py, etc.)
├─ notebooks/        # demo & reproduction notebooks
├─ requirements.txt
├─ .env.example      # environment variable template
└─ README.md
```
If this repository or the SCOPE framework helps your research, please cite:
```bibtex
@article{jeong2025scope,
  title   = {SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models},
  author  = {Jeong, Wonjun and Kim, Dongseok and Whangbo, Taegkeun},
  journal = {arXiv preprint arXiv:2507.18182},
  year    = {2025}
}
```
You may also cite the code base itself (optional):
```bibtex
@misc{scope_code_2025,
  title        = {SCOPE Codebase},
  author       = {Jeong, Wonjun and Kim, Dongseok and Whangbo, Taegkeun},
  howpublished = {\url{https://github.com/WonjunJeong97/SCOPE}},
  year         = {2025}
}
```
- Maintainer: Wonjun Jeong (tp04045@gachon.ac.kr)
- Questions & issues: please open a GitHub Issue in this repository.
This project is released under the terms of the license in LICENSE.
