@@ -104,18 +104,21 @@ pip install -U flash-attn
 To generate code samples from a model, you can use the following command:
 >
 ```bash
+# when greedy, there is no need for temperature and n_samples
 bigcodebench.generate \
     --model [model_name] \
-    --subset [complete|instruct] \
-    --greedy \
+    --split [complete|instruct] \
+    --subset [full|hard] \
+    [--greedy] \
     --bs [bs] \
     --temperature [temp] \
     --n_samples [n_samples] \
     --resume \
     --backend [vllm|hf|openai|mistral|anthropic|google] \
     --tp [gpu_number] \
     [--trust_remote_code] \
-    [--base_url [base_url]]
+    [--base_url [base_url]] \
+    [--tokenizer_name [tokenizer_name]]
 ```
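For reference, a concrete invocation of the template above could look like the sketch below. The model name and `--tp` value are illustrative placeholders, not recommendations, and with `--greedy` the temperature and sample-count flags are simply left at their defaults, as the comment above notes.

```bash
# Minimal sketch: greedy generation on the hard subset of the instruct split.
# The model name and --tp value are example choices only.
bigcodebench.generate \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --split instruct \
    --subset hard \
    --greedy \
    --resume \
    --backend vllm \
    --tp 1
```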
 >
 The generated code samples will be stored in a file named `[model_name]--bigcodebench-[instruct|complete]--[backend]-[temp]-[n_samples].jsonl`. Alternatively, you can use the following command to utilize our pre-built docker images for generating code samples:
@@ -124,7 +127,8 @@ The generated code samples will be stored in a file named `[model_name]--bigcode
 # If you are using GPUs
 docker run --gpus '"device=$CUDA_VISIBLE_DEVICES"' -v $(pwd):/app -t bigcodebench/bigcodebench-generate:latest \
     --model [model_name] \
-    --subset [complete|instruct] \
+    --split [complete|instruct] \
+    --subset [full|hard] \
     [--greedy] \
     --bs [bs] \
     --temperature [temp] \
@@ -136,7 +140,8 @@ docker run --gpus '"device=$CUDA_VISIBLE_DEVICES"' -v $(pwd):/app -t bigcodebenc
 # ...Or if you are using CPUs
 docker run -v $(pwd):/app -t bigcodebench/bigcodebench-generate:latest \
     --model [model_name] \
-    --subset [complete|instruct] \
+    --split [complete|instruct] \
+    --subset [full|hard] \
     [--greedy] \
     --bs [bs] \
     --temperature [temp] \
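To make the container path concrete, a CPU-only greedy run against the pre-built image could look like the sketch below; the model name is again an example, and the GPU variant differs only by the `--gpus` option shown above.

```bash
# Minimal sketch: greedy generation via the pre-built image (CPU-only).
# The model name is an example value.
docker run -v $(pwd):/app -t bigcodebench/bigcodebench-generate:latest \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --split complete \
    --subset hard \
    --greedy
```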
@@ -233,10 +238,10 @@ You are strongly recommended to use a sandbox such as [docker](https://docs.dock
 # If you want to change the RAM address space limit (in MB, 128 GB by default): `--max-as-limit XXX`
 # If you want to change the RAM data segment limit (in MB, 4 GB by default): `--max-data-limit`
 # If you want to change the RAM stack limit (in MB, 4 MB by default): `--max-stack-limit`
-docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --subset [complete|instruct] --samples samples-sanitized-calibrated.jsonl
+docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl
 
 # If you only want to check the ground truths
-docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --subset [complete|instruct] --samples samples-sanitized-calibrated.jsonl --check-gt-only
+docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl --check-gt-only
 ```
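As a concrete instance of the sandboxed evaluation, the sketch below scores calibrated instruct-split samples on the hard subset and tightens the address-space limit; the `65536` MB value and the samples file name are example choices rather than defaults.

```bash
# Minimal sketch: sandboxed evaluation on the hard subset of the instruct split.
# --max-as-limit 65536 (64 GB) is an example override of the 128 GB default.
docker run -v $(pwd):/app bigcodebench/bigcodebench-evaluate:latest \
    --split instruct \
    --subset hard \
    --max-as-limit 65536 \
    --samples samples-sanitized-calibrated.jsonl
```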
 
 ...Or if you want to try it locally regardless of the risks ⚠️:
@@ -251,12 +256,12 @@ Then, run the evaluation:
 
 ```bash
 # ...Or locally ⚠️
-bigcodebench.evaluate --subset [complete|instruct] --samples samples-sanitized-calibrated.jsonl
+bigcodebench.evaluate --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl
 # ...If you really don't want to check the ground truths
-bigcodebench.evaluate --subset [complete|instruct] --samples samples-sanitized-calibrated.jsonl --no-gt
+bigcodebench.evaluate --split [complete|instruct] --subset [full|hard] --samples samples-sanitized-calibrated.jsonl --no-gt
 
 # You are strongly recommended to use the following command to clean up the environment after evaluation:
-pids=$(ps -u $(id -u) -o pid,comm | grep '^ *[0-9]\+ bigcodebench' | awk '{print $1}'); if [ -n "$pids" ]; then echo $pids | xargs -r kill; fi;
+pids=$(ps -u $(id -u) -o pid,comm | grep 'bigcodebench' | awk '{print $1}'); if [ -n "$pids" ]; then echo $pids | xargs -r kill; fi;
 rm -rf /tmp/*
 ```
 
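Putting the local path together, a run on the hard subset of the complete split followed by the recommended cleanup could look like this sketch; the samples file name follows the sanitized-and-calibrated naming convention used above.

```bash
# Minimal sketch: local evaluation on the hard subset, then cleanup.
bigcodebench.evaluate --split complete --subset hard --samples samples-sanitized-calibrated.jsonl
# Kill any lingering bigcodebench processes and clear temporary files.
pids=$(ps -u $(id -u) -o pid,comm | grep 'bigcodebench' | awk '{print $1}'); if [ -n "$pids" ]; then echo $pids | xargs -r kill; fi;
rm -rf /tmp/*
```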