-
Notifications
You must be signed in to change notification settings - Fork 218
ci(alloy): add model checks and caching (stacked) #852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
lmeyerov
wants to merge
51
commits into
feat/issue-837-cudf-hop-executor
Choose a base branch
from
feat/issue-838-alloy-fbf-where
base: feat/issue-837-cudf-hop-executor
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
ci(alloy): add model checks and caching (stacked) #852
lmeyerov
wants to merge
51
commits into
feat/issue-837-cudf-hop-executor
from
feat/issue-838-alloy-fbf-where
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
4cf85dd to
4174258
Compare
8cc2f4c to
4548ef4
Compare
4548ef4 to
3b9746c
Compare
3b9746c to
5e11f5b
Compare
5e11f5b to
6de43c3
Compare
This was referenced Dec 26, 2025
P0/P1 tests for cuDF same-path executor with hop range features. Tests added: - WHERE respected after min_hops backtracking (xfail #872) - Reverse direction + hop range + WHERE (xfail #872) - Non-adjacent alias WHERE (xfail #872) - Oracle vs cuDF parity comprehensive (xfail #872) - Multi-hop edge WHERE filtering (xfail #872) - Output slicing + WHERE (PASS) - label_seeds + output_min_hops (PASS) - Multiple WHERE + mixed hop ranges (xfail #872) 6 tests marked xfail documenting multi-hop backward prune bugs. 2 tests pass verifying output slicing and label_seeds work correctly. See issue #872 for bug details. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…cutor The same-path executor (used by both pandas and cuDF backends) had a correctness bug where WHERE clauses were silently skipped for multi-hop edges (min_hops/max_hops > 1). This could return incorrect query results regardless of whether using pandas or cuDF. Changes: - Add `_filter_multihop_by_where()` to handle WHERE for multi-hop edges - Identify first/last hop edges using hop labels - Cross-join start/end pairs and apply WHERE to filter valid paths - Include intermediate nodes in `_materialize_filtered()` for multi-hop Tests updated: - Remove xfail from 3 tests that now pass: - test_reverse_direction_where_semantics - test_oracle_cudf_parity_comprehensive - test_multi_hop_edge_where_filtering - 3 tests remain xfail for known oracle parity bugs (see #872) Fixes part of #872. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The executor works with both pandas and cuDF DataFrames, so the name was misleading. Renamed for clarity: - cudf_executor.py → df_executor.py - CuDFSamePathExecutor → DFSamePathExecutor - test_cudf_executor_inputs.py → test_df_executor_inputs.py 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…cation Session 3-5 bug fixes: - Fix multi-hop path tracing in _apply_non_adjacent_where_post_prune - Fix _filter_multihop_by_where hop column handling - Fix reverse edge handling in _filter_edges_by_clauses - Fix single-hop edge persistence after WHERE filtering - Fix equality filtering when left_col == right_col (merge suffix) - Fix edge filtering in _re_propagate_backward for multi-hop edges - Add _filter_multihop_edges_by_endpoints helper for proper path tracing - Add _find_multihop_start_nodes helper for backward propagation - Add comprehensive undirected edge support throughout executor Test amplification (37 new tests): - 8 single-hop topology + cycle tests - 3 unfiltered start tests (converted from xfail) - 4 P0 reverse + multi-hop tests - 3 P0 multiple starts tests - 6 P1 operators × single-hop tests - 6 P1 operators × multi-hop tests - 2 P1 undirected + multi-hop tests - 3 P1 mixed direction chain tests - 4 P2 longer path tests - 6 P2 edge case tests All 78 tests pass, 2 skipped, 1 xfail (oracle limitation). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace Python set/dict intermediates with DataFrame operations throughout the same-path executor to enable GPU (cuDF) compatibility: Round 1 (prior commit): - Remove BFS/DFS while loops - Replace for...zip() iterations with merge operations - Remove adjacency dict lookups Round 2 (this commit): - Replace dict(zip()) with DataFrame slice+rename - Replace set(tolist()) tracking with DataFrame anti-joins - Use pd.concat() + drop_duplicates() instead of set unions - Use merge(..., indicator=True) for "not seen" logic Key changes: - _apply_non_adjacent_where_post_prune: vectorized value lookups - _filter_multihop_edges_by_endpoints: DataFrame-based reachability - _find_multihop_start_nodes: DataFrame anti-join for visited tracking - _filter_multihop_by_where: DataFrame extraction for start/end nodes - _materialize_filtered: DataFrame concat for allowed node collection Remaining boundary issues documented in plan.md for future Round 3 (PathState refactor) and Round 4 (pay-as-you-go complexity). All 91 tests pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add profiling infrastructure to measure executor performance across different scenarios: - Varying graph sizes (100 to 10K nodes) - Simple vs multi-hop chains - With and without WHERE clauses - Sparse vs dense graphs Initial findings: - Multi-hop ~2x slower than simple queries (95-110ms vs 40-50ms) - Graph size 100→10K only adds ~10ms (fixed costs dominate) - WHERE clauses add minimal overhead - Bottleneck is likely executor setup, not data processing This helps inform optimization priorities for Round 3 (_PathState refactor) and Round 4 (pay-as-you-go complexity). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add cProfile-based analysis to identify actual hotspots in query execution. Key findings: - Round 3 (_PathState refactor) is LOW PRIORITY - set operations are not the bottleneck, df_executor functions don't appear in top hotspots - Oracle enumeration takes 38% of same-path executor time - Legacy hop.py takes 75% of time in simple queries - Multihop is FASTER than simple for large graphs (less data returned) - Materialization (large result sets) dominates, not filtering Updated profiling scenarios to include 100K+ nodes. Extended plan.md with detailed cProfile results and insights. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add early return for empty allowed_nodes sets in _materialize_filtered - Add early return for empty edges in _filter_edges_by_clauses - Use original graph nodes in _apply_non_adjacent_where_post_prune - Fix _find_multihop_start_nodes backward traversal join logic - Add _run_native method as alias for _run_gpu - Add 10 new tests for impossible/contradictory constraints 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove unused variable assignments (relevant_node_indices, edge_id_col, max_hop) - Move binary operators to start of line (W504 compliance) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add type annotations for stack variable in enumerator.py - Add type: ignore comments for iterrows() which returns ambiguous types - Add isinstance(edge_op, ASTEdge) checks to narrow types before accessing ASTEdge attributes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Clarifies what IS and is NOT formally verified in Alloy model - Hop ranges approximated by unrolling (not fully verified) - Output slicing treated as post-filter - References Python parity tests for unverified features - Adds PLAN-846-852-feature-composition.md tracking document See issues #871 (roadmap) and #872 (multi-hop bugs) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
5429779 to
e1a7cec
Compare
- Add ContradictoryWhere predicate to detect impossible constraint pairs (e.g., a.v < c.v AND a.v > c.v) - Add ContradictoryWhereEmpty assertion verifying empty output for contradictions - Update README with bugs found in PR #846 that inform future verification - Reference issue #871 for P1/P2 verification roadmap items 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Attempted to add ContradictoryWhere assertion but found model's value semantics don't cleanly support it (Eq checks intersection while Neq checks empty intersection - these don't produce expected contradictions). - Replace failed assertion with explanatory comment - Update README to note limitation with pointer to Python test coverage - TestImpossibleConstraints covers 10 contradictory constraint scenarios 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
b797e4e to
5f0f370
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Stacked on feat/issue-837-cudf-hop-executor. Adds Alloy model coverage for GFQL F/B/F with WHERE lowerings (scenarios + optional multi-chain full-scope) plus ghcr caching for the Alloy image.
Status: Experimental
This Alloy model is experimental infrastructure for formal verification of GFQL internals. It currently verifies WHERE clause lowering to per-alias value summaries (Eq/Neq/Lt/Gt via min/max).
What it verifies now
What it does NOT verify (and where bugs actually occurred)
The 5 bugs found during executor development (PR #846) were all in traversal mechanics, not WHERE filtering:
These are caught by Python parity tests (reference enumerator vs native executor), not the Alloy model.
Next steps for bang-for-buck
Rather than extending Alloy to model DataFrame join mechanics, higher value options:
See issue #871 for the full roadmap discussion.