-
Notifications
You must be signed in to change notification settings - Fork 40
Add math_formal_lean resource server for Lean4 proof verification #563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Implements a NeMo Gym environment for formal theorem proving: - Resource server with /verify endpoint for proof compilation - Sandbox client for Lean4 compilation via HTTP - Proof utilities ported from NeMo-Skills (self-contained) - MiniF2F dataset preparation script (244 test problems) - Comprehensive test suite (31 tests) - Prompt aligned with NeMo-Skills deepseek-prover-v2 format The environment returns reward 1.0 for successful proof compilation and 0.0 for failures (syntax errors, timeouts, sorry usage). Signed-off-by: Stephen Ge <stepheng@nvidia.com>
08715f9 to
5980d17
Compare
| """Get or create the async HTTP client.""" | ||
| if self._client is None: | ||
| self._client = httpx.AsyncClient( | ||
| limits=httpx.Limits(max_keepalive_connections=100, max_connections=100), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be set higher?
| # limitations under the License. | ||
|
|
||
| """HTTP client for communicating with Lean4 sandbox container.""" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a link to the reference sandbox implementation from skills, so that people can see what this is supposed to talk to?
|
|
||
| predicted_proof = header + formal_statement + proof_part | ||
|
|
||
| elif answer_format == "lean4-statement": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this code path?
|
|
||
|
|
||
| # Standard Lean4 header with common imports | ||
| LEAN4_HEADER = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to allow customization here? For real datasets (not minif2f)?
| if final_answer_key and final_answer_key in generation: | ||
| generation = generation.split(final_answer_key, 1)[1].strip() | ||
|
|
||
| languages = ["lean4", "lean3", "lean", ""] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we remove lean3?
|
|
||
| """Utilities for Lean4 proof processing and evaluation. | ||
|
|
||
| Ported from NeMo-Skills nemo_skills/code_execution/proof_utils.py and utils.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
best to add a link
| class MathFormalLeanResourcesServerConfig(BaseResourcesServerConfig): | ||
| sandbox_host: str = "127.0.0.1" | ||
| sandbox_port: int = 6000 | ||
| compilation_timeout: float = 30.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this too low? I guess we don't want to go aggressive with timeouts for training, but curious if you have an estimation of how much time we'd be hitting timeouts on valid proofs with 30 seconds
|
Can you share example training run and reward profiling if you have these? |
Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>
- Add links to NeMo-Skills sandbox implementation in sandbox_client.py - Add link to NeMo-Skills source in proof_utils.py docstring - Remove unused LEAN4_HEADER constant and get_lean4_header function - Remove lean3 from language detection (Lean4 only) - Remove unused lean4-statement code path from build_lean4_proof - Update tests to match simplified API Signed-off-by: Stephen Ge <stephenge@nvidia.com> Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Stephen Ge <stepheng@nvidia.com>
362079b to
8f29154
Compare
Adds prepare_numina.py to download and filter NuminaMath-LEAN dataset from HuggingFace. Filters to ~4394 problems with: - author == 'human' - ground_truth_type in ['complete', 'statement'] - win_rate between 0.01 and 0.95 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Stephen Ge <stepheng@nvidia.com>
Remove unused docstring_started variable flagged by ruff. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Stephen Ge <stepheng@nvidia.com>
| resources_servers: | ||
| math_formal_lean: | ||
| entrypoint: app.py | ||
| sandbox_host: ${oc.env:LEAN_SANDBOX_HOST,127.0.0.1} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use NEMO_SKILS_SANDBOX_HOST for consistency with other uses of the sandbox? Allows us to only set it once
Rename sandbox environment variables from LEAN_SANDBOX_HOST/PORT to NEMO_SKILLS_SANDBOX_HOST/PORT for consistency with NeMo-Skills. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Stephen Ge <stepheng@nvidia.com>
Removing prepare_numina.py pending dataset usage approval. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Stephen Ge <stepheng@nvidia.com>
Summary
math_formal_leanresource server for Lean4 formal theorem proving/verifyendpoint that compiles proofs via sandbox container and returns reward 1.0/0.0Components
app.pysandbox_client.pyproof_utils.pyprepare_minif2f.pyREADME.mdTest plan
ng_collect_rollouts(0.2 reward on 5 samples)🤖 Generated with Claude Code