
Conversation

@akshathmangudi
Contributor

Overview

Resolves #740.

Consists of a single-file implementation, scicode.py, which follows SciCode's prompt structure from background_comment_template.txt.

STATUS: ready for review.

HF space link: https://huggingface.co/spaces/akshathmangudi/gpt-4o-scicode
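For reviewers, the prompt construction looks roughly like this (a simplified sketch, not the exact code; the template field names here are placeholders):

from lighteval.tasks.requests import Doc

# Placeholder field names; see background_comment_template.txt for the real template.
BACKGROUND_COMMENT_TEMPLATE = open("background_comment_template.txt").read()

def scicode_prompt(line: dict, task_name: str = "") -> Doc:
    """Format one SciCode sub-step into a lighteval Doc using the template."""
    query = BACKGROUND_COMMENT_TEMPLATE.format(
        problem_description=line["problem_description_main"],
        step_description=line["step_description_prompt"],
        function_header=line["function_header"],
    )
    return Doc(
        task_name=task_name,
        query=query,
        choices=[""],  # generative task: no fixed answer choices
        gold_index=0,
        specific={"test_cases": line.get("test_cases", [])},
    )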

@akshathmangudi
Contributor Author

cc @NathanHB, would love your feedback on this.

Member

@NathanHB left a comment


Hey! Looking nice :)
The only thing is that it only checks whether the code has been generated?

I think you can also use what is already there: https://github.com/scicode-bench/SciCode/blob/main/eval/inspect_ai/scicode.py

And give credit / ask them of course :)
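Something in that direction could be as simple as executing each generated snippet against the task's test cases, e.g. (very rough sketch; the harness and field names are just placeholders, not the linked implementation):

import subprocess
import sys
import tempfile

def run_tests(generated_code: str, test_cases: list[str], timeout: int = 30) -> float:
    """Return the fraction of test snippets that run cleanly on the generated code."""
    if not test_cases:
        return 0.0
    passed = 0
    for test in test_cases:
        # Append each test snippet to the candidate code and execute it in isolation.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(generated_code + "\n\n" + test)
            path = f.name
        try:
            result = subprocess.run(
                [sys.executable, path], capture_output=True, timeout=timeout
            )
            passed += int(result.returncode == 0)
        except subprocess.TimeoutExpired:
            pass  # count timeouts as failures
    return passed / len(test_cases)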

Comment on lines +171 to +186
class SciCodeMetric(SampleLevelComputation):
"""Metric for SciCode code generation evaluation."""

def compute(self, model_response: ModelResponse, doc: Doc, **kwargs) -> dict:
"""Check if code was generated."""
assert doc.specific is not None, "Doc specific field is required for scicode metric"

predictions = model_response.final_text

if not predictions:
return {"code_extracted": 0.0}

generated_code = extract_code(predictions[0])
code_extracted = 1.0 if generated_code and len(generated_code.strip()) > 0 else 0.0

return {"code_extracted": code_extracted}
Member

You can remove the functions that are meant to be used with backends other than inspect-ai.



Development

Successfully merging this pull request may close these issues.

[EVAL] SciCode: research coding benchmark
