Add math_formal_lean resource server for Lean4 proof verification #563

stephencge · 2026-01-08T03:21:15Z

Summary

Adds new math_formal_lean resource server for Lean4 formal theorem proving
Implements /verify endpoint that compiles proofs via sandbox container and returns reward 1.0/0.0
Includes MiniF2F dataset (244 test problems) with NeMo-Skills aligned prompt format
Comprehensive test suite (31 tests)

Components

| File | Description |
| --- | --- |
| `app.py` | Resource server with verify endpoint |
| `sandbox_client.py` | HTTP client for Lean4 sandbox |
| `proof_utils.py` | Proof extraction/building utilities |
| `prepare_minif2f.py` | Dataset preparation script |
| `README.md` | Documentation with licensing info |

Test plan

Unit tests pass (31/31)
End-to-end test with ng_collect_rollouts (0.2 reward on 5 samples)
Tested with gpt-5.1-codex-max model
Pre-commit lint checks pass

🤖 Generated with Claude Code

copy-pr-bot · 2026-01-08T03:21:19Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Implements a NeMo Gym environment for formal theorem proving:

- Resource server with /verify endpoint for proof compilation
- Sandbox client for Lean4 compilation via HTTP
- Proof utilities ported from NeMo-Skills (self-contained)
- MiniF2F dataset preparation script (244 test problems)
- Comprehensive test suite (31 tests)
- Prompt aligned with NeMo-Skills deepseek-prover-v2 format

The environment returns reward 1.0 for successful proof compilation and 0.0 for failures (syntax errors, timeouts, sorry usage).

Signed-off-by: Stephen Ge <stepheng@nvidia.com>
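The reward rule in this commit message can be sketched as below. This is a minimal illustration rather than the PR's code: `CompileResult`, its fields, and `compute_reward` are hypothetical names, and the plain substring check for `sorry` is an assumption about how "sorry usage" is detected.

```python
from dataclasses import dataclass


@dataclass
class CompileResult:
    """Hypothetical summary of one sandbox compilation (not the PR's type)."""

    succeeded: bool  # the Lean4 compiler reported no errors
    timed_out: bool  # compilation exceeded the configured timeout


def compute_reward(proof: str, result: CompileResult) -> float:
    """Return 1.0 only for a proof that compiled in time and avoids `sorry`."""
    if result.timed_out or not result.succeeded:
        return 0.0
    # Assumption: a substring check stands in for real `sorry` detection.
    if "sorry" in proof:
        return 0.0
    return 1.0
```

Keeping the reward binary means syntax errors, timeouts, and `sorry` placeholders are indistinguishable to the trainer, which matches the 1.0/0.0 description above.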

Kipok · 2026-01-08T17:15:08Z

resources_servers/math_formal_lean/sandbox_client.py

+        """Get or create the async HTTP client."""
+        if self._client is None:
+            self._client = httpx.AsyncClient(
+                limits=httpx.Limits(max_keepalive_connections=100, max_connections=100),


should this be set higher?

Kipok · 2026-01-08T17:15:44Z

resources_servers/math_formal_lean/sandbox_client.py

+# limitations under the License.
+
+"""HTTP client for communicating with Lean4 sandbox container."""
+


Could you add a link to the reference sandbox implementation from skills, so that people can see what this is supposed to talk to?

resources_servers/math_formal_lean/README.md

Kipok · 2026-01-08T17:20:29Z

resources_servers/math_formal_lean/proof_utils.py

+
+        predicted_proof = header + formal_statement + proof_part
+
+    elif answer_format == "lean4-statement":


do we need this code path?

Kipok · 2026-01-08T17:20:38Z

resources_servers/math_formal_lean/proof_utils.py

+
+
+# Standard Lean4 header with common imports
+LEAN4_HEADER = (


do we need to allow customization here? For real datasets (not minif2f)?

Kipok · 2026-01-08T17:21:14Z

resources_servers/math_formal_lean/proof_utils.py

+    if final_answer_key and final_answer_key in generation:
+        generation = generation.split(final_answer_key, 1)[1].strip()
+
+    languages = ["lean4", "lean3", "lean", ""]


should we remove lean3?
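For context, a language list like the one in this diff is typically used to try fence tags in priority order when pulling a proof out of model output. The sketch below assumes that behavior and assumes `lean3` is dropped as suggested; `extract_code_block` and `LANGUAGES` are hypothetical names, not the identifiers in `proof_utils.py`.

```python
import re
from typing import Optional

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown
LANGUAGES = ["lean4", "lean", ""]  # assumption: `lean3` removed, Lean4 only


def extract_code_block(generation: str, final_answer_key: Optional[str] = None) -> Optional[str]:
    """Return the body of the last fenced block whose tag is in LANGUAGES."""
    # Mirror the diff: keep only the text after the final-answer marker, if any.
    if final_answer_key and final_answer_key in generation:
        generation = generation.split(final_answer_key, 1)[1].strip()
    for lang in LANGUAGES:
        # The tag must be followed directly by a newline, so "lean" does not match "lean4".
        pattern = re.escape(FENCE + lang) + r"[ \t]*\n(.*?)" + re.escape(FENCE)
        matches = re.findall(pattern, generation, flags=re.DOTALL)
        if matches:
            return matches[-1].strip()
    return None
```

Under this sketch, a block tagged `lean3` is simply not extracted, so the proof would be scored as a failure rather than compiled with the wrong toolchain.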

Kipok · 2026-01-08T17:21:34Z

resources_servers/math_formal_lean/proof_utils.py

+
+"""Utilities for Lean4 proof processing and evaluation.
+
+Ported from NeMo-Skills nemo_skills/code_execution/proof_utils.py and utils.py


best to add a link

Kipok · 2026-01-08T17:22:51Z

resources_servers/math_formal_lean/app.py

+class MathFormalLeanResourcesServerConfig(BaseResourcesServerConfig):
+    sandbox_host: str = "127.0.0.1"
+    sandbox_port: int = 6000
+    compilation_timeout: float = 30.0


is this too low? I guess we don't want to go aggressive with timeouts for training, but curious if you have an estimation of how much time we'd be hitting timeouts on valid proofs with 30 seconds
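To make the trade-off in this question concrete, here is a minimal sketch of how a compilation timeout turns into a zero reward. It assumes the timeout is enforced on the client side with `asyncio.wait_for`; the PR may enforce it in the sandbox instead. `verify_with_timeout` and `fake_compile` are hypothetical names, and `fake_compile` only sleeps rather than calling Lean4.

```python
import asyncio


async def verify_with_timeout(compile_coro, timeout_s: float = 30.0) -> float:
    """Await a compilation coroutine and map a timeout to a reward of 0.0."""
    try:
        compiled_ok = await asyncio.wait_for(compile_coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # A valid but slow proof lands here and is scored like a wrong proof.
        return 0.0
    return 1.0 if compiled_ok else 0.0


async def fake_compile(delay_s: float, ok: bool = True) -> bool:
    """Hypothetical stand-in for the sandbox call; sleeps instead of compiling."""
    await asyncio.sleep(delay_s)
    return ok
```

Because a timed-out valid proof receives the same reward as an invalid one, logging timeouts separately from compile errors would be one way to estimate how often the 30-second limit rejects correct proofs.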

cmunley1 · 2026-01-08T21:59:14Z

https://docs.nvidia.com/nemo/gym/latest/contribute/environments/new-environment.html#contribution-workflow

Can you share example training run and reward profiling if you have these?

Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>

- Add links to NeMo-Skills sandbox implementation in sandbox_client.py
- Add link to NeMo-Skills source in proof_utils.py docstring
- Remove unused LEAN4_HEADER constant and get_lean4_header function
- Remove lean3 from language detection (Lean4 only)
- Remove unused lean4-statement code path from build_lean4_proof
- Update tests to match simplified API

Signed-off-by: Stephen Ge <stephenge@nvidia.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com>

Adds prepare_numina.py to download and filter NuminaMath-LEAN dataset from HuggingFace. Filters to ~4394 problems with:

- author == 'human'
- ground_truth_type in ['complete', 'statement']
- win_rate between 0.01 and 0.95

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
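The three filters listed in this commit message can be expressed as a single predicate. The sketch below is illustrative only: `keep_problem` is a hypothetical name, the records are plain dicts rather than a HuggingFace dataset, and treating both `win_rate` bounds as inclusive is an assumption the commit message does not settle.

```python
def keep_problem(row: dict) -> bool:
    """Apply the author, ground_truth_type, and win_rate filters to one record."""
    win_rate = row.get("win_rate")
    if not isinstance(win_rate, (int, float)):
        return False  # a missing or non-numeric win_rate cannot satisfy the range
    return (
        row.get("author") == "human"
        and row.get("ground_truth_type") in ("complete", "statement")
        # Assumption: both bounds are inclusive.
        and 0.01 <= win_rate <= 0.95
    )


# Made-up example records, for illustration only.
rows = [
    {"author": "human", "ground_truth_type": "complete", "win_rate": 0.4},
    {"author": "autoformalizer", "ground_truth_type": "complete", "win_rate": 0.4},
    {"author": "human", "ground_truth_type": "statement", "win_rate": 0.99},
]
kept = [row for row in rows if keep_problem(row)]
```

The `win_rate` window drops problems that are almost never or almost always solved, which presumably keeps the reward signal informative during training.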

Remove unused docstring_started variable flagged by ruff. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Stephen Ge <stepheng@nvidia.com>

gwarmstrong · 2026-01-09T05:39:29Z

resources_servers/math_formal_lean/configs/math_formal_lean.yaml

+  resources_servers:
+    math_formal_lean:
+      entrypoint: app.py
+      sandbox_host: ${oc.env:LEAN_SANDBOX_HOST,127.0.0.1}


Can you use NEMO_SKILLS_SANDBOX_HOST for consistency with other uses of the sandbox? That way we only need to set it once.
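The `${oc.env:VAR,default}` interpolation in the YAML above reads an environment variable and falls back to a default when it is unset. A plain-Python equivalent, shown only to illustrate the lookup, might look like this. The variable names follow the rename in the follow-up commit and the defaults come from the config class in `app.py`; `sandbox_address` itself is a hypothetical helper.

```python
import os
from typing import Mapping, Tuple


def sandbox_address(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Resolve the sandbox host and port from the environment, with defaults."""
    host = env.get("NEMO_SKILLS_SANDBOX_HOST", "127.0.0.1")
    # Environment values are strings, so the port is converted explicitly.
    port = int(env.get("NEMO_SKILLS_SANDBOX_PORT", "6000"))
    return host, port
```

Sharing one pair of variable names across NeMo-Skills and this server means a single export configures every component that talks to the sandbox.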

Rename sandbox environment variables from LEAN_SANDBOX_HOST/PORT to NEMO_SKILLS_SANDBOX_HOST/PORT for consistency with NeMo-Skills. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Stephen Ge <stepheng@nvidia.com>

Removing prepare_numina.py pending dataset usage approval. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Stephen Ge <stepheng@nvidia.com>

stephencge force-pushed the stepheng/lean-environment branch from 08715f9 to 5980d17 Compare January 8, 2026 03:32

Kipok reviewed Jan 8, 2026

stephencge and others added 2 commits January 8, 2026 22:31

Update resources_servers/math_formal_lean/README.md

b8289e9

Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>

stephencge force-pushed the stepheng/lean-environment branch from 362079b to 8f29154 Compare January 9, 2026 04:10

stephencge and others added 2 commits January 8, 2026 21:26

Fix unused variable in prepare_numina.py

2d2aa50

gwarmstrong reviewed Jan 9, 2026

stephencge and others added 2 commits January 8, 2026 21:47

Use NEMO_SKILLS_SANDBOX_* env vars for consistency

21c90e7

Remove NuminaMath-LEAN dataset preparation script

232868d
