Skip to content

Conversation

@stephencge
Copy link

Summary

  • Adds new math_formal_lean resource server for Lean4 formal theorem proving
  • Implements /verify endpoint that compiles proofs via sandbox container and returns reward 1.0/0.0
  • Includes MiniF2F dataset (244 test problems) with NeMo-Skills aligned prompt format
  • Comprehensive test suite (31 tests)

Components

File Description
app.py Resource server with verify endpoint
sandbox_client.py HTTP client for Lean4 sandbox
proof_utils.py Proof extraction/building utilities
prepare_minif2f.py Dataset preparation script
README.md Documentation with licensing info

Test plan

  • Unit tests pass (31/31)
  • End-to-end test with ng_collect_rollouts (0.2 reward on 5 samples)
  • Tested with gpt-5.1-codex-max model
  • Pre-commit lint checks pass

🤖 Generated with Claude Code

@copy-pr-bot
Copy link

copy-pr-bot bot commented Jan 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Implements a NeMo Gym environment for formal theorem proving:
- Resource server with /verify endpoint for proof compilation
- Sandbox client for Lean4 compilation via HTTP
- Proof utilities ported from NeMo-Skills (self-contained)
- MiniF2F dataset preparation script (244 test problems)
- Comprehensive test suite (31 tests)
- Prompt aligned with NeMo-Skills deepseek-prover-v2 format

The environment returns reward 1.0 for successful proof compilation
and 0.0 for failures (syntax errors, timeouts, sorry usage).

Signed-off-by: Stephen Ge <stepheng@nvidia.com>
@stephencge stephencge force-pushed the stepheng/lean-environment branch from 08715f9 to 5980d17 Compare January 8, 2026 03:32
"""Get or create the async HTTP client."""
if self._client is None:
self._client = httpx.AsyncClient(
limits=httpx.Limits(max_keepalive_connections=100, max_connections=100),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be set higher?

# limitations under the License.

"""HTTP client for communicating with Lean4 sandbox container."""

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a link to the reference sandbox implementation from skills, so that people can see what this is supposed to talk to?


predicted_proof = header + formal_statement + proof_part

elif answer_format == "lean4-statement":
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this code path?



# Standard Lean4 header with common imports
LEAN4_HEADER = (
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to allow customization here? For real datasets (not minif2f)?

if final_answer_key and final_answer_key in generation:
generation = generation.split(final_answer_key, 1)[1].strip()

languages = ["lean4", "lean3", "lean", ""]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we remove lean3?


"""Utilities for Lean4 proof processing and evaluation.

Ported from NeMo-Skills nemo_skills/code_execution/proof_utils.py and utils.py
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

best to add a link

class MathFormalLeanResourcesServerConfig(BaseResourcesServerConfig):
sandbox_host: str = "127.0.0.1"
sandbox_port: int = 6000
compilation_timeout: float = 30.0
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this too low? I guess we don't want to go aggressive with timeouts for training, but curious if you have an estimation of how much time we'd be hitting timeouts on valid proofs with 30 seconds

@cmunley1
Copy link
Contributor

cmunley1 commented Jan 8, 2026

https://docs.nvidia.com/nemo/gym/latest/contribute/environments/new-environment.html#contribution-workflow

Can you share example training run and reward profiling if you have these?

stephencge and others added 2 commits January 8, 2026 22:31
Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>
- Add links to NeMo-Skills sandbox implementation in sandbox_client.py
- Add link to NeMo-Skills source in proof_utils.py docstring
- Remove unused LEAN4_HEADER constant and get_lean4_header function
- Remove lean3 from language detection (Lean4 only)
- Remove unused lean4-statement code path from build_lean4_proof
- Update tests to match simplified API

Signed-off-by: Stephen Ge <stephenge@nvidia.com>

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
@stephencge stephencge force-pushed the stepheng/lean-environment branch from 362079b to 8f29154 Compare January 9, 2026 04:10
stephencge and others added 2 commits January 8, 2026 21:26
Adds prepare_numina.py to download and filter NuminaMath-LEAN dataset
from HuggingFace. Filters to ~4394 problems with:
- author == 'human'
- ground_truth_type in ['complete', 'statement']
- win_rate between 0.01 and 0.95

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
Remove unused docstring_started variable flagged by ruff.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
resources_servers:
math_formal_lean:
entrypoint: app.py
sandbox_host: ${oc.env:LEAN_SANDBOX_HOST,127.0.0.1}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use NEMO_SKILS_SANDBOX_HOST for consistency with other uses of the sandbox? Allows us to only set it once

stephencge and others added 2 commits January 8, 2026 21:47
Rename sandbox environment variables from LEAN_SANDBOX_HOST/PORT to
NEMO_SKILLS_SANDBOX_HOST/PORT for consistency with NeMo-Skills.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
Removing prepare_numina.py pending dataset usage approval.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants