
Conversation

@akshathmangudi
Contributor

Overview

Resolves #740.

Consists of a single-file implementation, scicode.py, which follows SciCode's prompt structure from background_comment_template.txt.

STATUS: ready for review.

HF space link: https://huggingface.co/spaces/akshathmangudi/gpt-4o-scicode
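For reviewers, the prompt construction looks roughly like this (a simplified sketch, not the exact code; the template field names here are placeholders):

from lighteval.tasks.requests import Doc

# Placeholder field names; see background_comment_template.txt for the real template.
BACKGROUND_COMMENT_TEMPLATE = open("background_comment_template.txt").read()

def scicode_prompt(line: dict, task_name: str = "") -> Doc:
    """Format one SciCode sub-step into a lighteval Doc using the template."""
    query = BACKGROUND_COMMENT_TEMPLATE.format(
        problem_description=line["problem_description_main"],
        step_description=line["step_description_prompt"],
        function_header=line["function_header"],
    )
    return Doc(
        task_name=task_name,
        query=query,
        choices=[""],  # generative task: no fixed answer choices
        gold_index=0,
        specific={"test_cases": line.get("test_cases", [])},
    )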

@akshathmangudi
Contributor Author

cc @NathanHB, would love your feedback on this.

Member

@NathanHB left a comment


Hey! Looking nice :)
The only thing is that it only checks whether the code has been generated?

I think you can also use what is already there: https://github.com/scicode-bench/SciCode/blob/main/eval/inspect_ai/scicode.py

And give credit / ask them of course :)
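Something in that direction could be as simple as executing each generated snippet against the task's test cases, e.g. (very rough sketch; the harness and field names are just placeholders, not the linked implementation):

import subprocess
import sys
import tempfile

def run_tests(generated_code: str, test_cases: list[str], timeout: int = 30) -> float:
    """Return the fraction of test snippets that run cleanly on the generated code."""
    if not test_cases:
        return 0.0
    passed = 0
    for test in test_cases:
        # Append each test snippet to the candidate code and execute it in isolation.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(generated_code + "\n\n" + test)
            path = f.name
        try:
            result = subprocess.run(
                [sys.executable, path], capture_output=True, timeout=timeout
            )
            passed += int(result.returncode == 0)
        except subprocess.TimeoutExpired:
            pass  # count timeouts as failures
    return passed / len(test_cases)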

Comment on lines +171 to +186
class SciCodeMetric(SampleLevelComputation):
"""Metric for SciCode code generation evaluation."""

def compute(self, model_response: ModelResponse, doc: Doc, **kwargs) -> dict:
"""Check if code was generated."""
assert doc.specific is not None, "Doc specific field is required for scicode metric"

predictions = model_response.final_text

if not predictions:
return {"code_extracted": 0.0}

generated_code = extract_code(predictions[0])
code_extracted = 1.0 if generated_code and len(generated_code.strip()) > 0 else 0.0

return {"code_extracted": code_extracted}
Member

You can remove the functions that are meant to be used with backends other than inspect-ai.



Development

Successfully merging this pull request may close these issues.

[EVAL] SciCode: research coding benchmark
