Commit 093f418

feat: enforce file paths in agentic outputs with JSON schema validation

Agentic tools were returning abstract component references without file locations, making it impossible for downstream agents to drill into code. This change enforces structured outputs with required file_path fields using JSON Schema validation at the LLM provider level.

- Add comprehensive schemas for all 7 agentic tools in agentic_schemas.rs
- Each schema combines freeform analysis field with structured arrays
- FileLocation type enforces name, file_path, and optional line_number
- Schemas passed to LLM via response_format in GenerationConfig
- AutoAgents adapter converts StructuredOutputFormat to ResponseFormat
- MCP handler parses and surfaces structured_output in responses
- Test script updated to extract and display file locations

Benefits: File paths are mandatory, agents can navigate to exact code, consistent data structure enables better agent-to-agent collaboration.

1 parent 572daa6 commit 093f418
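
The hybrid shape described in the commit message (a freeform `analysis` field next to structured arrays whose entries carry a required `file_path`) can be sketched with plain Rust types. This is a minimal, dependency-free illustration and not the contents of `agentic_schemas.rs`: the field types, the `patterns` element type, and the helper `components_missing_path` are assumptions made for the example.

```rust
/// A named code element pinned to a location on disk.
/// Field names follow the commit message; the concrete types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct FileLocation {
    pub name: String,
    pub file_path: String,
    /// Optional, as stated in the commit message.
    pub line_number: Option<u32>,
}

/// Hybrid output: freeform prose plus structured arrays.
/// `patterns` is modelled as plain strings here, which is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct CodeSearchOutput {
    pub analysis: String,
    pub components: Vec<FileLocation>,
    pub patterns: Vec<String>,
}

/// Hypothetical consumer-side check: returns the names of components
/// whose `file_path` is empty or whitespace only.
pub fn components_missing_path(output: &CodeSearchOutput) -> Vec<&str> {
    output
        .components
        .iter()
        .filter(|c| c.file_path.trim().is_empty())
        .map(|c| c.name.as_str())
        .collect()
}
```

A JSON Schema `required` entry only guarantees that the key is present, so a check along these lines may still be useful downstream to catch blank paths.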

12 files changed: +697 -22 lines changed


CHANGELOG.md

Lines changed: 36 additions & 3 deletions
@@ -36,12 +36,37 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **Schema**: HNSW index for `embedding_768` column with EFC 200, M 16
 - **Auto-detection**: Automatic column selection based on embedding dimension
 
-#### **File Location Requirements in Agent Outputs**
-- **All EXPLORATORY prompts** now require file locations in responses
+#### **Structured Output Enforcement with JSON Schemas**
+- **JSON schema enforcement** for all 7 agentic tools with **required file paths**
+- **Schema-driven outputs**: LLM cannot return response without file locations
+- **New module**: `codegraph-ai/src/agentic_schemas.rs` with comprehensive schemas:
+- `CodeSearchOutput`: analysis + components[] + patterns[]
+- `DependencyAnalysisOutput`: analysis + components[] + dependencies[] + circular_dependencies[]
+- `CallChainOutput`: analysis + entry_point + call_chain[] + decision_points[]
+- `ArchitectureAnalysisOutput`: analysis + layers[] + hub_nodes[] + coupling_metrics[]
+- `APISurfaceOutput`: analysis + endpoints[] + usage_patterns[]
+- `ContextBuilderOutput`: comprehensive context with all analysis dimensions
+- `SemanticQuestionOutput`: answer + evidence[] + related_components[]
+- **Required fields**: Every component must include `name`, `file_path`, and optional `line_number`
+- **Provider integration**:
+- Added `response_format` field to `GenerationConfig`
+- OpenAI compatible providers send JSON schema to LLM API
+- AutoAgents adapter converts `StructuredOutputFormat` to CodeGraph `ResponseFormat`
+- **Hybrid output**: Combines freeform `analysis` field with structured arrays
+- **MCP handler**: Parses structured JSON and surfaces in `structured_output` field
+- **Benefits**:
+- File paths are **mandatory** - no more abstract references
+- Downstream tools can navigate directly to relevant code
+- Consistent data structure for programmatic consumption
+- Better agent-to-agent collaboration with actionable locations
+
+#### **File Location Requirements in Agent Outputs (Deprecated)**
+- **Superseded by**: Structured output enforcement with JSON schemas (above)
+- **Legacy prompt updates**: All EXPLORATORY prompts requested file locations (now enforced)
 - **Format**: `ComponentName in path/to/file.rs:line_number`
 - **Example**: "ConfigLoader in src/config/loader.rs:42" instead of just "ConfigLoader"
 - **6 prompts updated**: code_search, dependency_analysis, call_chain, architecture, context_builder, semantic_question, api_surface
-- **Enables**: Downstream agents can drill into specific files for detailed analysis
+- **Migration**: Prompts now work in conjunction with schema enforcement
 
 ### 🐛 **Fixed - Critical Database Persistence Bugs**
 
@@ -141,6 +166,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **Ollama support**: Ollama provider now respects chunking configuration
 - **Jina unchanged**: Still uses `JINA_MAX_TOKENS` (provider-specific)
 
+### 📦 **Dependencies**
+
+#### **Added**
+- **schemars** (workspace): JSON schema generation for structured LLM outputs
+- Used in `codegraph-ai` for agentic schema definitions
+- Enables compile-time schema validation
+- Auto-generates JSON Schema from Rust types
+
 ### 📚 **Documentation**
 - **GraphFunctions enrichment plan**: Comprehensive plan saved to `.ouroboros/plans/graphfunctions-enrichment-20251118.md`
 - Schema alignment recommendations
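
The deprecated changelog entry documents a legacy text format, `ComponentName in path/to/file.rs:line_number`. A consumer that still receives such strings could map them onto the structured shape roughly as follows. This is a sketch under stated assumptions: the function `parse_legacy_reference` and the local `FileLocation` type are hypothetical and do not come from the commit.

```rust
/// Local stand-in for the structured location; field types are assumed.
#[derive(Debug, Clone, PartialEq)]
pub struct FileLocation {
    pub name: String,
    pub file_path: String,
    pub line_number: Option<u32>,
}

/// Hypothetical helper: parses "Name in path/to/file.rs:42" (line optional).
/// Returns `None` when the " in " separator or either side of it is missing.
pub fn parse_legacy_reference(text: &str) -> Option<FileLocation> {
    // Split on the first " in " separator.
    let (name, location) = text.trim().split_once(" in ")?;
    let (name, location) = (name.trim(), location.trim());
    if name.is_empty() || location.is_empty() {
        return None;
    }
    // A trailing ":<digits>" is treated as a line number; anything else
    // after the last ':' is kept as part of the path.
    let (file_path, line_number) = match location.rsplit_once(':') {
        Some((path, line)) if !path.is_empty() => match line.parse::<u32>() {
            Ok(n) => (path, Some(n)),
            Err(_) => (location, None),
        },
        _ => (location, None),
    };
    Some(FileLocation {
        name: name.to_string(),
        file_path: file_path.to_string(),
        line_number,
    })
}
```

A bare name such as "ConfigLoader" yields `None`, which is the case the schema enforcement is meant to rule out.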

Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

crates/codegraph-ai/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ thiserror = { workspace = true }
 anyhow = { workspace = true }
 serde = { workspace = true, features = ["derive"] }
 serde_json = { workspace = true }
+schemars = { workspace = true }
 tracing = { workspace = true }
 parking_lot = { workspace = true }
 uuid = { workspace = true }
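
The commit adds `schemars` so that JSON Schema can be derived from Rust types and sent to the provider through `response_format`. To stay dependency-free, the sketch below hand-writes a simplified schema instead of deriving it, so it only approximates what a generated schema might contain. The function names are hypothetical, and the envelope follows the OpenAI-style `json_schema` response format; whether CodeGraph builds the request this way is an assumption.

```rust
/// Hand-written, simplified JSON Schema for a file location.
/// Illustrative only: not the schemars-generated output.
pub fn file_location_schema() -> &'static str {
    r#"{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "file_path": { "type": "string" },
    "line_number": { "type": ["integer", "null"], "minimum": 0 }
  },
  "required": ["name", "file_path"],
  "additionalProperties": false
}"#
}

/// Hypothetical helper: wraps a schema in an OpenAI-style `response_format`
/// envelope. `schema_name` is assumed to be a plain identifier, so no JSON
/// escaping is applied to it.
pub fn response_format_body(schema_name: &str, schema: &str) -> String {
    format!(
        r#"{{"type":"json_schema","json_schema":{{"name":"{}","schema":{}}}}}"#,
        schema_name, schema
    )
}
```

The `required` array is what makes `name` and `file_path` mandatory, while `line_number` stays optional by being left out of it.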
