ci(alloy): add model checks and caching (stacked) #852

lmeyerov · 2025-11-26T01:11:59Z

Stacked on feat/issue-837-cudf-hop-executor. Adds Alloy model coverage for GFQL F/B/F with WHERE lowerings (scenarios + optional multi-chain full-scope) plus ghcr caching for the Alloy image.

Status: Experimental

This Alloy model is experimental infrastructure for formal verification of GFQL internals. It currently verifies WHERE clause lowering to per-alias value summaries (Eq/Neq/Lt/Gt via min/max).

What it verifies now

WHERE clause lowering logic (equality via bitsets, inequality via min/max)
Path semantics ↔ Set semantics equivalence for fixed-length chains
Various graph topologies: fan-out, fan-in, cycles, parallel edges, disconnected

What it does NOT verify (and where bugs actually occurred)

The 5 bugs found during executor development (PR #846) were all in traversal mechanics, not WHERE filtering:

Backward traversal join direction
Empty set short-circuit
Node source for non-adjacent aliases
Multi-hop path tracing
Edge direction handling

These are caught by Python parity tests (reference enumerator vs native executor), not the Alloy model.

Next steps for bang-for-buck

Rather than extending Alloy to model DataFrame join mechanics, higher value options:

Cypher TCK conformance tests - test user-facing behavior with openCypher existing test suite
Property-based testing - Hypothesis-style fuzzing of graph patterns
Keep Alloy as design documentation, not primary verification

See issue #871 for the full roadmap discussion.

P0/P1 tests for cuDF same-path executor with hop range features. Tests added: - WHERE respected after min_hops backtracking (xfail #872) - Reverse direction + hop range + WHERE (xfail #872) - Non-adjacent alias WHERE (xfail #872) - Oracle vs cuDF parity comprehensive (xfail #872) - Multi-hop edge WHERE filtering (xfail #872) - Output slicing + WHERE (PASS) - label_seeds + output_min_hops (PASS) - Multiple WHERE + mixed hop ranges (xfail #872) 6 tests marked xfail documenting multi-hop backward prune bugs. 2 tests pass verifying output slicing and label_seeds work correctly. See issue #872 for bug details. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…cutor The same-path executor (used by both pandas and cuDF backends) had a correctness bug where WHERE clauses were silently skipped for multi-hop edges (min_hops/max_hops > 1). This could return incorrect query results regardless of whether using pandas or cuDF. Changes: - Add `_filter_multihop_by_where()` to handle WHERE for multi-hop edges - Identify first/last hop edges using hop labels - Cross-join start/end pairs and apply WHERE to filter valid paths - Include intermediate nodes in `_materialize_filtered()` for multi-hop Tests updated: - Remove xfail from 3 tests that now pass: - test_reverse_direction_where_semantics - test_oracle_cudf_parity_comprehensive - test_multi_hop_edge_where_filtering - 3 tests remain xfail for known oracle parity bugs (see #872) Fixes part of #872. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The executor works with both pandas and cuDF DataFrames, so the name was misleading. Renamed for clarity: - cudf_executor.py → df_executor.py - CuDFSamePathExecutor → DFSamePathExecutor - test_cudf_executor_inputs.py → test_df_executor_inputs.py 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

…cation Session 3-5 bug fixes: - Fix multi-hop path tracing in _apply_non_adjacent_where_post_prune - Fix _filter_multihop_by_where hop column handling - Fix reverse edge handling in _filter_edges_by_clauses - Fix single-hop edge persistence after WHERE filtering - Fix equality filtering when left_col == right_col (merge suffix) - Fix edge filtering in _re_propagate_backward for multi-hop edges - Add _filter_multihop_edges_by_endpoints helper for proper path tracing - Add _find_multihop_start_nodes helper for backward propagation - Add comprehensive undirected edge support throughout executor Test amplification (37 new tests): - 8 single-hop topology + cycle tests - 3 unfiltered start tests (converted from xfail) - 4 P0 reverse + multi-hop tests - 3 P0 multiple starts tests - 6 P1 operators × single-hop tests - 6 P1 operators × multi-hop tests - 2 P1 undirected + multi-hop tests - 3 P1 mixed direction chain tests - 4 P2 longer path tests - 6 P2 edge case tests All 78 tests pass, 2 skipped, 1 xfail (oracle limitation). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Replace Python set/dict intermediates with DataFrame operations throughout the same-path executor to enable GPU (cuDF) compatibility: Round 1 (prior commit): - Remove BFS/DFS while loops - Replace for...zip() iterations with merge operations - Remove adjacency dict lookups Round 2 (this commit): - Replace dict(zip()) with DataFrame slice+rename - Replace set(tolist()) tracking with DataFrame anti-joins - Use pd.concat() + drop_duplicates() instead of set unions - Use merge(..., indicator=True) for "not seen" logic Key changes: - _apply_non_adjacent_where_post_prune: vectorized value lookups - _filter_multihop_edges_by_endpoints: DataFrame-based reachability - _find_multihop_start_nodes: DataFrame anti-join for visited tracking - _filter_multihop_by_where: DataFrame extraction for start/end nodes - _materialize_filtered: DataFrame concat for allowed node collection Remaining boundary issues documented in plan.md for future Round 3 (PathState refactor) and Round 4 (pay-as-you-go complexity). All 91 tests pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add profiling infrastructure to measure executor performance across different scenarios: - Varying graph sizes (100 to 10K nodes) - Simple vs multi-hop chains - With and without WHERE clauses - Sparse vs dense graphs Initial findings: - Multi-hop ~2x slower than simple queries (95-110ms vs 40-50ms) - Graph size 100→10K only adds ~10ms (fixed costs dominate) - WHERE clauses add minimal overhead - Bottleneck is likely executor setup, not data processing This helps inform optimization priorities for Round 3 (_PathState refactor) and Round 4 (pay-as-you-go complexity). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add cProfile-based analysis to identify actual hotspots in query execution. Key findings: - Round 3 (_PathState refactor) is LOW PRIORITY - set operations are not the bottleneck, df_executor functions don't appear in top hotspots - Oracle enumeration takes 38% of same-path executor time - Legacy hop.py takes 75% of time in simple queries - Multihop is FASTER than simple for large graphs (less data returned) - Materialization (large result sets) dominates, not filtering Updated profiling scenarios to include 100K+ nodes. Extended plan.md with detailed cProfile results and insights. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add early return for empty allowed_nodes sets in _materialize_filtered - Add early return for empty edges in _filter_edges_by_clauses - Use original graph nodes in _apply_non_adjacent_where_post_prune - Fix _find_multihop_start_nodes backward traversal join logic - Add _run_native method as alias for _run_gpu - Add 10 new tests for impossible/contradictory constraints 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove unused variable assignments (relevant_node_indices, edge_id_col, max_hop) - Move binary operators to start of line (W504 compliance) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add type annotations for stack variable in enumerator.py - Add type: ignore comments for iterrows() which returns ambiguous types - Add isinstance(edge_op, ASTEdge) checks to narrow types before accessing ASTEdge attributes 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Clarifies what IS and is NOT formally verified in Alloy model - Hop ranges approximated by unrolling (not fully verified) - Output slicing treated as post-filter - References Python parity tests for unverified features - Adds PLAN-846-852-feature-composition.md tracking document See issues #871 (roadmap) and #872 (multi-hop bugs) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add ContradictoryWhere predicate to detect impossible constraint pairs (e.g., a.v < c.v AND a.v > c.v) - Add ContradictoryWhereEmpty assertion verifying empty output for contradictions - Update README with bugs found in PR #846 that inform future verification - Reference issue #871 for P1/P2 verification roadmap items 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Attempted to add ContradictoryWhere assertion but found model's value semantics don't cleanly support it (Eq checks intersection while Neq checks empty intersection - these don't produce expected contradictions). - Replace failed assertion with explanatory comment - Update README to note limitation with pointer to Python test coverage - TestImpossibleConstraints covers 10 contradictory constraint scenarios 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

lmeyerov added 22 commits December 24, 2025 01:47

feat: scaffold cudf executor skeleton

d37ca93

feat: wire same-path plan into cudf executor

051a936

feat: add gfql where metadata and planner

7ade745

feat: implement cudf executor forward pass

ce75193

test: add cudf forward parity cases

896c08a

docs: copy issue 837 plan into impl folder

770103d

chore: remove tracked cudf executor plan

35224b2

feat: add oracle fallback for cudf same-path executor

1f4d18c

chore: gate cudf same-path executor and add strict-mode test

fa49df7

chore: document cuDF same-path fallback gating

da10935

feat: add same-path pruning for cudf executor

31a67d5

feat: route cudf chains with WHERE to same-path executor

ff701e8

feat: enforce same-path summaries in cudf executor

e787bde

fix(gfql): preserve edge filters in cudf same-path

94a63ba

chore(gfql): fix same-path typing and mypy config

352a418

chore(gfql): clean chain typing imports

297d4ff

chore(gfql): silence dtype comparisons for mypy 3.8

22c3276

test(gfql): cover same-path cycles, branches, edge filters, cudf

59f2909

test(gfql): compress same-path topology coverage

0ae2c1d

chore(gfql): tighten inequality mask

d2cdbfb

test(gfql): add dispatch same-path dict case

27d48fe

test(gfql): add chain/list dispatch same-path parity

4174258

lmeyerov force-pushed the feat/issue-837-cudf-hop-executor branch from 4cf85dd to 4174258 Compare December 24, 2025 09:52

lmeyerov force-pushed the feat/issue-838-alloy-fbf-where branch from 8cc2f4c to 4548ef4 Compare December 24, 2025 09:54

fix(gfql): import same_path_types from gfql

2976cff

lmeyerov force-pushed the feat/issue-838-alloy-fbf-where branch from 4548ef4 to 3b9746c Compare December 24, 2025 09:57

fix(gfql): add package init and clean mypy config

7d40694

lmeyerov force-pushed the feat/issue-838-alloy-fbf-where branch from 3b9746c to 5e11f5b Compare December 24, 2025 16:25

fix(gfql): add ref package init

564edf1

lmeyerov force-pushed the feat/issue-838-alloy-fbf-where branch from 5e11f5b to 6de43c3 Compare December 24, 2025 16:29

This was referenced Dec 26, 2025

Meta: GFQL Testing & Verification Roadmap #871

Open

Fix multi-hop + WHERE backward prune bugs in cuDF executor #872

Open

lmeyerov and others added 23 commits December 26, 2025 06:42

docs(plan): add Session 9 summary for CI fixes and verification update

520eaa2

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

chore: remove plan.md from repo

df156d4

fix(gfql): correct mypy ignore codes for iterrows

baefa76

fix(gfql): use pd.Index for column assignment to satisfy py38 mypy

3d0fe0c

chore(gfql): add initial alloy f/b/f where model

9d48714

chore(gfql): refine alloy model where lowering

ec2d516

ci(alloy): add scenario checks and coverage

00f082b

ci(alloy): add optional multi-chain full-scope

064df32

ci(alloy): pull/push ghcr cache for checks

4ab7310

ci(alloy): gate full scopes and document mapping

7a291f1

docs(alloy): add README and mapping notes

baf9d76

docs(alloy): note hop range modeling limits

8332b30

lmeyerov force-pushed the feat/issue-838-alloy-fbf-where branch from 5429779 to e1a7cec Compare December 29, 2025 01:20

lmeyerov and others added 2 commits December 28, 2025 17:22

lmeyerov force-pushed the feat/issue-837-cudf-hop-executor branch 2 times, most recently from b797e4e to 5f0f370 Compare January 2, 2026 08:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ci(alloy): add model checks and caching (stacked) #852

ci(alloy): add model checks and caching (stacked) #852

Uh oh!

lmeyerov commented Nov 26, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ci(alloy): add model checks and caching (stacked) #852

Are you sure you want to change the base?

ci(alloy): add model checks and caching (stacked) #852

Uh oh!

Conversation

lmeyerov commented Nov 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Status: Experimental

What it verifies now

What it does NOT verify (and where bugs actually occurred)

Next steps for bang-for-buck

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lmeyerov commented Nov 26, 2025 •

edited

Loading