You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+2-2Lines changed: 2 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -36,7 +36,7 @@ To facilitate the evaluation of LLMs on BigCodeBench, we provide this Python pac
36
36
37
37
BigCodeBench focuses on the evaluation of LLM4Code with *diverse function calls* and *complex instruction*, with:
38
38
39
-
* ✨ **Precise evaluation & ranking**: See [our leaderboard](https://bigcodebench.github.io/leaderboard.html) for latest LLM rankings before & after rigorous evaluation.
39
+
* ✨ **Precise evaluation & ranking**: See [our leaderboard](https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard) for latest LLM rankings before & after rigorous evaluation.
40
40
* ✨ **Pre-generated samples**: BigCodeBench accelerates code intelligence research by open-sourcing [LLM-generated samples](#-LLM-generated-code) for various models -- no need to re-run the expensive benchmarks!
41
41
42
42
### Main Differences from EvalPlus
@@ -328,7 +328,7 @@ python get_results.py
328
328
329
329
## 💻 LLM-generated Code
330
330
331
-
We share pre-generated code samples from LLMs we have [evaluated](https://bigcodebench.github.io/leaderboard.html):
331
+
We share pre-generated code samples from LLMs we have [evaluated](https://huggingface.co/spaces/bigcode/bigcodebench-leaderboard):
332
332
* See the attachment of our [v0.1.2](https://github.com/bigcode-project/bigcodebench/releases/tag/v0.1.5). We include both `sanitized_samples.zip` and `sanitized_samples_calibrated.zip` for your convenience.
0 commit comments