We use greedy decoding as an example to show how to evaluate the generated code samples via the remote API.

> [!Note]
>
> Remotely executing on `BigCodeBench-Full` typically takes 6-7 minutes, and on `BigCodeBench-Hard` typically takes 4-5 minutes.

```bash
# greedy decoding by default
bigcodebench.evaluate \
  --model meta-llama/Meta-Llama-3.1-8B-Instruct \
  --split [complete|instruct] \
  ...
```

- The evaluation results will be stored in a file named `[model_name]--bigcodebench-[instruct|complete]--[backend]-[temp]-[n_samples]-sanitized_calibrated_eval_results.json`.
- The pass@k results will be stored in a file named `[model_name]--bigcodebench-[instruct|complete]--[backend]-[temp]-[n_samples]-sanitized_calibrated_pass_at_k.json`; a quick way to inspect it is sketched below.
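
Both files are plain JSON, so the scores can be checked from the shell. A minimal sketch, assuming `jq` is available and that the pass@k report exposes top-level keys such as `pass@1` (the file name below is a hypothetical instance of the naming pattern above):

```bash
# Hypothetical output file following the naming pattern above; substitute your own.
RESULTS="meta-llama--Meta-Llama-3.1-8B-Instruct--bigcodebench-instruct--vllm-0-1-sanitized_calibrated_pass_at_k.json"

# List the top-level keys to see which pass@k metrics were computed.
jq 'keys' "$RESULTS"

# Read a single metric; "pass@1" is an assumed key name.
jq '."pass@1"' "$RESULTS"
```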

> [!Note]
>
> BigCodeBench uses different prompts for base and chat models.
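
In practice this note ties to the split you pick: `complete` uses code-completion prompts, which base models can follow, while `instruct` uses natural-language instructions aimed at chat models. A minimal sketch of the two invocations (the base-model name is an illustrative assumption, and the remaining flags are elided as above):

```bash
# Code-completion prompts (`complete` split), e.g. for a base model.
bigcodebench.evaluate \
  --model meta-llama/Meta-Llama-3.1-8B \
  --split complete \
  ...

# Natural-language instruction prompts (`instruct` split), e.g. for a chat model.
bigcodebench.evaluate \
  --model meta-llama/Meta-Llama-3.1-8B-Instruct \
  --split instruct \
  ...
```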