feat(#6): add research supervisor and its reports documentation agent and context agent #7
Conversation
Yaay! Being in the new-world, I'm going to try a new-world approach! This should add learning to new Claude functionality, and speed up our understanding and implementation. Let's try a new PR review workflow, and we can review if it's helpful and speedy.
I've set up two Claude Skills. Here we gooo!
Hareet left a comment
We are moving along!! 🚂 🚂
Some minor feedback in the in-line comments, and some design questions here:
Design questions:
- How are we currently feeling about LangGraph? Is it overkill for 4-5 agents with 2-3 supervisors? Should we wrap Agent SDK calls directly and not worry about node state? Do you feel comfortable enough on this path?
agents/documentation-search-agent.ts
I'm imagining we'll need a few LLM calls here. Potential workflow: our code calls MCP and retrieves a list of sources or results from ask_question; our code then makes an LLM call along with agent-memory and tells it to condense the results down. A rough sketch follows below.
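Something like this minimal sketch, where `mockMcpSearch` and `loadAgentMemory` are placeholder names for the mocked MCP client and the agent-memory store (only `ChatAnthropic` and the condensing pass are real pieces of the stack):

```typescript
import { ChatAnthropic } from "@langchain/anthropic";

// Placeholder stub for the mocked MCP ask_question / source retrieval.
async function mockMcpSearch(question: string): Promise<string[]> {
  return [`[mock source for: ${question}]`];
}

// Placeholder stub for prior findings from agent-memory.
async function loadAgentMemory(agent: string): Promise<string> {
  return `(no prior findings for ${agent})`;
}

export async function searchDocs(question: string): Promise<string> {
  const sources = await mockMcpSearch(question);
  const memory = await loadAgentMemory("documentation-search");
  const llm = new ChatAnthropic({ model: "claude-3-5-sonnet-latest" });
  // Second pass: condense the retrieved sources into a short answer.
  const res = await llm.invoke(
    `Question: ${question}\n` +
    `Prior context:\n${memory}\n` +
    `Sources:\n${sources.join("\n")}\n` +
    `Condense these into a short, cited summary.`
  );
  return String(res.content);
}
```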
LLM models:
We want to be able to use anything. Is the plan to refactor later, or should we pull it out into a module now?
LLM API calls vs Agent SDKs:
API calls are a bit lower level and might be missing a lot of the system prompt that Claude Code injects to make it a viable coding assistant. How do we imagine handling that without getting locked into a vendor LLM, and without re-inventing a functional assistant, while ensuring ours doesn't perform worse than the provided agent?
TODO and Planning features in Claude Code:
Do we know if LangGraph handles this internally? Sharing todo, planning, and verification lists is one of the core features that makes Claude Code's multi-agent work better. Does that happen with raw API calls? I see it at the end of the Supervisor phase, but should it be added for some of the other API calls?
Domain Inference:
This seems highly important, and I'm worried about the difficulty of getting it right, or about this value guiding the agents down a wild path. What's your opinion on leaving it out of scope until we've manually mapped 50+ tickets, so we can then test how accurate our domain inference would be? Or do you see it shipping immediately at release?
Human-in-the-loop:
In research-supervisor.ts, we should present the plan, have a human read it over and modify the plan file, and then agree to continue.
```mermaid
flowchart TD
    subgraph Input["INPUT LAYER"]
        TICKET_FILE["Ticket.md<br/>─────────────<br/>---<br/>title: Feature X<br/>type: feature<br/>priority: high<br/>domain: contacts<br/>---<br/>## Description<br/>..."]
    end
    subgraph Parsing["PARSING LAYER"]
        TICKET_FILE --> PARSER["parseTicketFile()"]
        PARSER --> READ["fs.readFileSync(path)"]
        READ --> EXTRACT_FM["extractFrontmatter(content)"]
        EXTRACT_FM --> YAML_PARSE["parseSimpleYAML()"]
        YAML_PARSE --> META["metadata: {<br/> title, type,<br/> priority, domain<br/>}"]
        EXTRACT_FM --> MD_BODY["markdown body"]
        MD_BODY --> EXTRACT_DESC["extractSection('Description')"]
        MD_BODY --> EXTRACT_REQ["extractBulletList('Requirements')"]
        MD_BODY --> EXTRACT_AC["extractBulletList('Acceptance Criteria')"]
        MD_BODY --> EXTRACT_COMP["extractCodeItems('Technical Context')"]
        MD_BODY --> EXTRACT_REF["extractURLs('References')"]
    end
    subgraph IssueTemplate["IssueTemplate STRUCTURE"]
        META & EXTRACT_DESC & EXTRACT_REQ & EXTRACT_AC & EXTRACT_COMP & EXTRACT_REF --> ISSUE_OBJ["IssueTemplate {<br/> issue: {<br/> title: string<br/> type: 'feature' | 'bug'<br/> priority: 'high' | 'medium' | 'low'<br/> description: string<br/> requirements: string[]<br/> acceptance_criteria: string[]<br/> technical_context: {<br/> domain: CHTDomain<br/> components: string[]<br/> }<br/> }<br/>}"]
    end
    subgraph Enrichment["ENRICHMENT LAYER"]
        ISSUE_OBJ --> DOMAIN_CHECK{"domain<br/>specified?"}
        DOMAIN_CHECK -->|Yes| ENRICHED["Enriched IssueTemplate"]
        DOMAIN_CHECK -->|No| INFER["inferDomainAndComponents()"]
        INFER --> LLM_PROMPT["Build LLM Prompt<br/>with issue description"]
        LLM_PROMPT --> CLAUDE["Claude API<br/>temperature=0.2"]
        CLAUDE --> JSON_EXTRACT["Extract JSON from response"]
        JSON_EXTRACT --> MERGE["Merge inferred<br/>domain + components"]
        MERGE --> ENRICHED
    end
    subgraph StateInit["STATE INITIALIZATION"]
        ENRICHED --> INIT_STATE["ResearchState {<br/> messages: []<br/> issue: IssueTemplate<br/> researchFindings: null<br/> contextAnalysis: null<br/> orchestrationPlan: null<br/> currentPhase: 'init'<br/> errors: []<br/>}"]
    end
    style TICKET_FILE fill:#fff3e0
    style ISSUE_OBJ fill:#e3f2fd
    style ENRICHED fill:#e8f5e9
    style INIT_STATE fill:#f3e5f5
```
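For reference, the ResearchState in the state-initialization node might map onto LangGraph's Annotation API roughly like this; the field types, the `IssueTemplate` stand-in, and the reducers are assumptions, not the PR's actual definitions:

```typescript
import { Annotation } from "@langchain/langgraph";
import { BaseMessage } from "@langchain/core/messages";

// Stand-in for the real interface defined in the PR.
type IssueTemplate = Record<string, unknown>;

const ResearchState = Annotation.Root({
  messages: Annotation<BaseMessage[]>({
    reducer: (prev, next) => prev.concat(next),
    default: () => [],
  }),
  issue: Annotation<IssueTemplate>,
  researchFindings: Annotation<string | null>,
  contextAnalysis: Annotation<string | null>,
  orchestrationPlan: Annotation<string | null>,
  currentPhase: Annotation<string>,
  // Accumulating reducer: failed LLM calls append here instead of throwing,
  // so one bad call doesn't abort the whole graph run.
  errors: Annotation<string[]>({
    reducer: (prev, next) => prev.concat(next),
    default: () => [],
  }),
});
```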
Our parsing layer using custom parsers looks like it could get out of hand. Do you think we should go with js-yaml? See ticket-parser.ts and context-loader.ts.
Using comprehensive-review agents, I found this helpful to guide me. I wanted to place it here for others, and in case you see any obvious corrections.
LangGraph
LangGraph is justified because of the feedback loops. Without cycles I'd agree it's overkill, but human rejection and QA failures need to route back, and that's what graphs are for. I'm comfortable continuing.
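A minimal sketch of such a cycle with LangGraph's JS API; the node names and routing predicate are illustrative, not the actual supervisor graph:

```typescript
import { StateGraph, Annotation, START, END } from "@langchain/langgraph";

const State = Annotation.Root({
  plan: Annotation<string>,
  approved: Annotation<boolean>,
});

const app = new StateGraph(State)
  .addNode("plan", async () => ({ plan: "draft plan" }))
  .addNode("humanReview", async () => ({ approved: false /* stub */ }))
  .addNode("qa", async () => ({}))
  .addEdge(START, "plan")
  .addEdge("plan", "humanReview")
  // The feedback loop: rejection routes back to planning, approval moves on.
  .addConditionalEdges("humanReview", (s) => (s.approved ? "qa" : "plan"))
  .addEdge("qa", END)
  .compile();
```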
MCP / Kapa.ai Workflow

Use LLM Models
Agree we should do it now, not later. I've already built this as part of the dev supervisor work: an LLM abstraction with an interface + factory pattern. Claude is fully implemented; OpenAI/Gemini are stubbed for when we need them. It will be in the next PR for review.
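For reference, a guess at the shape of that abstraction; the names and model string are assumptions, and the actual version lands in the next PR:

```typescript
import { ChatAnthropic } from "@langchain/anthropic";

interface LLMClient {
  complete(prompt: string): Promise<string>;
}

class ClaudeClient implements LLMClient {
  private model = new ChatAnthropic({ model: "claude-3-5-sonnet-latest" });
  async complete(prompt: string): Promise<string> {
    const res = await this.model.invoke(prompt);
    return String(res.content);
  }
}

export function createLLMClient(provider: "claude" | "openai" | "gemini"): LLMClient {
  switch (provider) {
    case "claude":
      return new ClaudeClient();
    default:
      // OpenAI/Gemini stubbed until we need them.
      throw new Error(`${provider} not implemented yet`);
  }
}
```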
LLM API calls vs Agent SDKs

I think we just build it out piece by piece. The thing is, we're not trying to build a general-purpose coding assistant. Our agents do narrow, CHT-specific tasks, so we can write focused prompts that leverage context from agent-memory that generic assistants wouldn't have anyway. Plus we're mostly dealing with structured outputs, not open-ended "write me a feature" requests. And with human validation checkpoints, we can catch what doesn't work and iterate. It won't be perfect out of the gate, but we'll refine as we go.
TODO and Planning features

LangGraph handles state sharing between agents: when one finishes, the next sees its output. But it doesn't do the todo/planning work that Claude Code does internally. Right now our supervisors produce plans, but individual agents just execute. For simpler agents like doc search, that's fine. For code generation, we might benefit from having the agent explicitly plan before generating (see the sketch below); this could help with reasoning and make it easier to debug when things go wrong. Worth considering for the coding and test agents.
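As a sketch of what that could look like, reusing the hypothetical LLMClient interface from above; both prompts are made up for illustration:

```typescript
// Hypothetical LLMClient as sketched earlier.
interface LLMClient {
  complete(prompt: string): Promise<string>;
}

// Two-step call: ask for an explicit plan first, then feed the plan back in.
async function generateWithPlan(llm: LLMClient, task: string): Promise<string> {
  const plan = await llm.complete(
    `List the steps you would take to implement the following. Numbered list only.\n\n${task}`
  );
  // The plan can be logged for debugging before the generation call.
  return llm.complete(`Task:\n${task}\n\nFollow this plan step by step:\n${plan}`);
}
```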
Domain Inference

Yes, I think it would be a good idea to have the domain as a required field for a good few tickets until we've solved a few of them. Once we have context built up, we can make it optional again.
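For when we do turn inference back on, a rough sketch of the staged call; the prompt wording and the JSON-extraction regex are assumptions, not the code in this PR:

```typescript
import { ChatAnthropic } from "@langchain/anthropic";

async function inferDomain(description: string): Promise<{ domain: string; components: string[] }> {
  const llm = new ChatAnthropic({ model: "claude-3-5-sonnet-latest", temperature: 0.2 });
  const res = await llm.invoke(
    `Classify this CHT issue into one domain (authentication, contacts, ` +
    `forms-and-reports, tasks-and-targets, messaging, data-sync, configuration) ` +
    `and list affected components. Reply with JSON only: ` +
    `{"domain": "...", "components": ["..."]}\n\n${description}`
  );
  // LLMs often wrap JSON in prose; grab the first {...} block defensively.
  const match = String(res.content).match(/\{[\s\S]*\}/);
  if (!match) throw new Error("No JSON found in LLM response");
  return JSON.parse(match[0]);
}
```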
Human-in-the-loop

Yeah, that is a good idea. However, we should just ask the user to edit the file in their favorite text editor if they want to change it, rather than implement an editor in the agent interface.
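A minimal sketch of that checkpoint, assuming a plain readline prompt and that the plan lives in a file; the path handling is illustrative:

```typescript
import * as readline from "node:readline/promises";
import { readFile } from "node:fs/promises";

// Pause, let the user edit the plan file in their own editor, then re-read it.
async function confirmPlan(planPath: string): Promise<string> {
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  await rl.question(
    `Plan written to ${planPath}. Edit it in your editor if needed, then press Enter to continue...`
  );
  rl.close();
  return readFile(planPath, "utf8"); // pick up any manual edits
}
```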
Parsing Layer

Yup, we should use js-yaml as it is already tried and tested.
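For instance, frontmatter extraction with js-yaml might look like this; the regex and return shape are assumptions about ticket-parser.ts, not its actual code:

```typescript
import { load } from "js-yaml";

// Split YAML frontmatter from the Markdown body and parse it with js-yaml
// instead of a hand-rolled parseSimpleYAML().
function extractFrontmatter(content: string): { metadata: Record<string, unknown>; body: string } {
  const match = content.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return { metadata: {}, body: content };
  return {
    metadata: (load(match[1]) as Record<string, unknown>) ?? {},
    body: match[2],
  };
}
```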
…aml for frontmatter parsing and add more tests
Saweeeet! Thanks for addressing the fixes and detailing the decision points.
LGTM! The only thing flagged was inferDomainAndComponents() and enrichIssueTemplate() being dead code. A revised assessment, with the context of our discussion around domain inference, resulted in:
Revised Assessment
The code is not dead code in the problematic sense; it's staged functionality waiting for:
- 50+ manually-mapped tickets to accumulate
- Sufficient historical context to validate inference accuracy
- A future change to make domain optional again
The implementation correctly:
- Made domain mandatory now (per discussion)
- Kept the inference logic ready for when it's needed
- Followed the agreed phased approach
I generated this local CLAUDE.md and used Anthropic's official /code-review:code-review plugin.
### Project Overview
CHT-Agent is a hierarchical multi-agent system for CHT development workflows. This PR implements the **Research Supervisor POC** - an AI orchestrator that:
- Parses issue tickets (YAML frontmatter + Markdown)
- Searches documentation via mocked MCP integration
- Analyzes historical context for similar issues
- Generates implementation plans
### Technology Stack
- **TypeScript 5.3** with Node.js 20+
- **LangChain/LangGraph** (v0.3.0) for agent orchestration
- **Mocha/Chai/Sinon** for testing
### Key Architecture Components
| Component | Location | Purpose |
|-----------|----------|---------|
| Research Supervisor | `src/supervisors/research-supervisor.ts` | LangGraph state machine orchestrating agents |
| Documentation Agent | `src/agents/documentation-search-agent.ts` | Mocked MCP integration for doc search |
| Context Agent | `src/agents/context-analysis-agent.ts` | Historical context similarity scoring |
| Ticket Parser | `src/utils/ticket-parser.ts` | YAML frontmatter + Markdown parsing |
| Domain Inference | `src/utils/domain-inference.ts` | LLM-based domain classification |
### Review Focus Areas
1. **Type Safety**: `domain` is now mandatory in tickets - verify validation
2. **LLM Response Parsing**: Check regex extraction robustness in `domain-inference.ts`
3. **State Management**: LangGraph state reducers for concurrent updates
4. **Error Handling**: All LLM errors should accumulate in state, not throw
5. **Test Coverage**: ~34% - focus on untested LLM integration paths
### Known Testing Gaps
- `domain-inference.ts` - LLM mocking challenges (ESM-only `@langchain/anthropic`)
- LangGraph graph execution - needs integration tests
- CLI entry points - no coverage yet
### CHTDomains (7 functional areas)
`authentication`, `contacts`, `forms-and-reports`, `tasks-and-targets`, `messaging`, `data-sync`, `configuration`
### PR to review
- https://github.com/medic/cht-agent/pull/7
### Code Style and Guidelines
- Similar to CHT Style and Guidelines
- https://docs.communityhealthtoolkit.org/community/contributing/code/style-guide/
- https://docs.communityhealthtoolkit.org/community/contributing/code/quality-assistance/
Description
Creates the research supervisor and its reports: the documentation agent (output mocked for now) and the context agent. Tested this with a dummy ticket `tickets/simple-example.md` and a real cht-core ticket `tickets/10139.md`. The output of the research might seem a bit basic at the moment due to the lack of prior context and solved tickets. #6
Code review checklist
License
The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.