@sugat009
Member

@sugat009 sugat009 commented Nov 19, 2025

Description

Creates the research supervisor and its reports: the documentation agent (output mocked for now) and the context agent. Tested on a dummy ticket (tickets/simple-example.md) and on a real cht-core ticket (tickets/10139.md). The research output might seem a bit basic at the moment due to the lack of prior context and solved tickets.

#6

Code review checklist

  • Readable: Concise, well named, follows the style guide, documented if necessary.
  • Documented: Configuration and user documentation on cht-docs
  • Tested: Unit and/or e2e where appropriate
  • Backwards compatible: Works with existing data and configuration or includes a migration. Any breaking changes documented in the release notes.

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

@sugat009 sugat009 requested a review from Hareet November 19, 2025 15:09
@sugat009 sugat009 self-assigned this Nov 19, 2025
@sugat009 sugat009 linked an issue Nov 19, 2025 that may be closed by this pull request
@Hareet
Member

Hareet commented Nov 20, 2025

Yaay!

Being in the new world, I'm going to try a new-world approach! This should help us learn new Claude functionality and speed up our understanding and implementation.

Let's try a new PR review workflow, and we can review if it's helpful and speedy.

  • In regular PR conversation comments, I'll ask some design questions/suggestions.
  • In-line review-comments will be Claude-assisted and verified by me.

I've set up two Claude Skills:

I'm finding Code Review Excellence from https://github.com/wshobson/agents to be most helpful so far.

Here we gooo

Member

@Hareet Hareet left a comment

We are moving along!! 🚂 🚂

Some minor feedback with in-line comments, and design questions here:

Design questions:

  • How are we currently feeling about LangGraph? Is it overkill for 4-5 agents with 2-3 supervisors? Should we wrap Agent SDK calls directly and not worry about node state? Do you feel comfortable enough on this path?

agents/documentation-search-agent.ts

I'm imagining we'll need a few LLM calls here. Potential workflow: our code calls MCP and retrieves a list of sources or the results of ask_question. Our code then makes an LLM call along with agent-memory, and we tell it to condense the result.

llm models:

We want to be able to use anything. Is that a plan to refactor later, or should we pull it out into a module now?

LLM API calls vs Agent SDKs:

API calls are a bit lower level and might be missing a lot of the system prompts that Claude Code injects to make it a viable coding assistant. How do we imagine handling that without getting locked into one LLM vendor and without re-inventing a functional assistant, while ensuring our assistant doesn't perform worse than the provided agent?

TODO and Planning features in Claude Code:

Do we know if LangGraph handles this internally? Sharing todo, planning, and verification lists is one of the core features that makes Claude Code's multi-agent setup work better. Does that occur with raw API calls? I see it at the end of the Supervisor phase, but should it be added for some of the other API calls?

Domain Inference:

This seems highly important and I'm worried about the difficulty of getting this correct, or this value guiding the agents down a wild path. What's your opinion on leaving this out of scope for now until we manually have mapped 50+ tickets? And then we can test to see how accurate our domain inference would be, or do you see that happening alongside immediately at release?

Human-in-the-loop:

From the research-supervisor.ts, we should present the plan and have us read over it, modify the plan file and then agree to continue.

flowchart TD
    subgraph Input["INPUT LAYER"]
        TICKET_FILE["Ticket.md<br/>─────────────<br/>---<br/>title: Feature X<br/>type: feature<br/>priority: high<br/>domain: contacts<br/>---<br/>## Description<br/>..."]
    end

    subgraph Parsing["PARSING LAYER"]
        TICKET_FILE --> PARSER["parseTicketFile()"]

        PARSER --> READ["fs.readFileSync(path)"]
        READ --> EXTRACT_FM["extractFrontmatter(content)"]

        EXTRACT_FM --> YAML_PARSE["parseSimpleYAML()"]
        YAML_PARSE --> META["metadata: {<br/>  title, type,<br/>  priority, domain<br/>}"]

        EXTRACT_FM --> MD_BODY["markdown body"]

        MD_BODY --> EXTRACT_DESC["extractSection('Description')"]
        MD_BODY --> EXTRACT_REQ["extractBulletList('Requirements')"]
        MD_BODY --> EXTRACT_AC["extractBulletList('Acceptance Criteria')"]
        MD_BODY --> EXTRACT_COMP["extractCodeItems('Technical Context')"]
        MD_BODY --> EXTRACT_REF["extractURLs('References')"]
    end

    subgraph IssueTemplate["IssueTemplate STRUCTURE"]
        META & EXTRACT_DESC & EXTRACT_REQ & EXTRACT_AC & EXTRACT_COMP & EXTRACT_REF --> ISSUE_OBJ["IssueTemplate {<br/>  issue: {<br/>    title: string<br/>    type: 'feature' | 'bug'<br/>    priority: 'high' | 'medium' | 'low'<br/>    description: string<br/>    requirements: string[]<br/>    acceptance_criteria: string[]<br/>    technical_context: {<br/>      domain: CHTDomain<br/>      components: string[]<br/>    }<br/>  }<br/>}"]
    end

    subgraph Enrichment["ENRICHMENT LAYER"]
        ISSUE_OBJ --> DOMAIN_CHECK{"domain<br/>specified?"}
        DOMAIN_CHECK -->|Yes| ENRICHED["Enriched IssueTemplate"]
        DOMAIN_CHECK -->|No| INFER["inferDomainAndComponents()"]

        INFER --> LLM_PROMPT["Build LLM Prompt<br/>with issue description"]
        LLM_PROMPT --> CLAUDE["Claude API<br/>temperature=0.2"]
        CLAUDE --> JSON_EXTRACT["Extract JSON from response"]
        JSON_EXTRACT --> MERGE["Merge inferred<br/>domain + components"]
        MERGE --> ENRICHED
    end

    subgraph StateInit["STATE INITIALIZATION"]
        ENRICHED --> INIT_STATE["ResearchState {<br/>  messages: []<br/>  issue: IssueTemplate<br/>  researchFindings: null<br/>  contextAnalysis: null<br/>  orchestrationPlan: null<br/>  currentPhase: 'init'<br/>  errors: []<br/>}"]
    end

    style TICKET_FILE fill:#fff3e0
    style ISSUE_OBJ fill:#e3f2fd
    style ENRICHED fill:#e8f5e9
    style INIT_STATE fill:#f3e5f5

Our parsing layer, built on custom parsers, looks like it could get out of hand. Do you think we should go with js-yaml in ticket-parser.ts and context-loader.ts?

@Hareet
Member

Hareet commented Dec 4, 2025

Using comprehensive-review agents, I generated the diagram below and found it helpful as a guide. I'm placing it here for others; let me know if you see anything that obviously needs correcting.

┌──────────────────────────────────────────────────────────────────────────────┐
│                              INPUT                                            │
├──────────────────────────────────────────────────────────────────────────────┤
│  Ticket File (Markdown)                                                       │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  ---                                                                    │  │
│  │  title: Add offline support for contact sync                           │  │
│  │  type: feature                                                          │  │
│  │  priority: high                                                         │  │
│  │  domain: data-sync                                                      │  │
│  │  ---                                                                    │  │
│  │  ## Description                                                         │  │
│  │  Users need to sync contacts while offline...                          │  │
│  │  ## Requirements                                                        │  │
│  │  - Store contacts in IndexedDB                                         │  │
│  │  - Queue sync operations...                                            │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ parseTicketFile()
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                          IssueTemplate                                        │
├──────────────────────────────────────────────────────────────────────────────┤
│  {                                                                            │
│    issue: {                                                                   │
│      title: "Add offline support for contact sync",                          │
│      type: "feature",                                                         │
│      priority: "high",                                                        │
│      description: "Users need to sync contacts while offline...",            │
│      requirements: ["Store contacts in IndexedDB", "Queue sync..."],         │
│      acceptance_criteria: [...],                                              │
│      technical_context: {                                                     │
│        domain: "data-sync",                                                   │
│        components: [],                                                        │
│        existing_code_references: [],                                          │
│        api_endpoints: []                                                      │
│      },                                                                       │
│      references: [...]                                                        │
│    }                                                                          │
│  }                                                                            │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ enrichIssueTemplate() [if needed]
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                     Enriched IssueTemplate                                    │
├──────────────────────────────────────────────────────────────────────────────┤
│  {                                                                            │
│    issue: {                                                                   │
│      ...previous fields...,                                                   │
│      technical_context: {                                                     │
│        domain: "data-sync",           // ← Inferred by LLM if missing        │
│        components: ["PouchDB", "service-worker", "sync-service"],            │
│        ...                                                                    │
│      }                                                                        │
│    }                                                                          │
│  }                                                                            │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ Initialize ResearchState
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                         ResearchState (Initial)                               │
├──────────────────────────────────────────────────────────────────────────────┤
│  {                                                                            │
│    messages: [],                                                              │
│    issue: IssueTemplate,                                                      │
│    researchFindings: null,                                                    │
│    contextAnalysis: null,                                                     │
│    orchestrationPlan: null,                                                   │
│    currentPhase: "init",                                                      │
│    errors: []                                                                 │
│  }                                                                            │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┴───────────────┐
                    │                               │
                    ▼                               │
┌─────────────────────────────────────┐             │
│  Documentation Search Node          │             │
├─────────────────────────────────────┤             │
│                                     │             │
│  query = buildSearchQuery(issue)    │             │
│  ┌─────────────────────────────┐    │             │
│  │ "Add offline support for    │    │             │
│  │  contact sync data-sync"    │    │             │
│  └─────────────────────────────┘    │             │
│                │                    │             │
│                ▼                    │             │
│  Kapa.AI MCP (or Mock)              │             │
│                │                    │             │
│                ▼                    │             │
│  ┌─────────────────────────────┐    │             │
│  │ DocumentationReference[]    │    │             │
│  │ - title, url, relevance     │    │             │
│  │ - excerpt, source           │    │             │
│  └─────────────────────────────┘    │             │
│                │                    │             │
│                ▼                    │             │
│  ┌─────────────────────────────┐    │             │
│  │ ResearchFindings            │    │             │
│  │ - documentationReferences   │    │             │
│  │ - suggestedApproaches       │    │             │
│  │ - confidence: 0.85          │    │             │
│  │ - queryUsed                 │    │             │
│  └─────────────────────────────┘    │             │
│                                     │             │
└─────────────────────────────────────┘             │
                    │                               │
                    ▼                               │
┌──────────────────────────────────────────────────────────────────────────────┐
│                   ResearchState (After Doc Search)                            │
├──────────────────────────────────────────────────────────────────────────────┤
│  {                                                                            │
│    messages: [HumanMessage, AIMessage],                                       │
│    issue: IssueTemplate,                                                      │
│    researchFindings: ResearchFindings,        // ← UPDATED                   │
│    contextAnalysis: null,                                                     │
│    orchestrationPlan: null,                                                   │
│    currentPhase: "doc-search",                // ← UPDATED                   │
│    errors: []                                                                 │
│  }                                                                            │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────┐
│  Context Analysis Node              │
├─────────────────────────────────────┤
│                                     │
│  loadDomainOverview("data-sync")    │
│  ┌─────────────────────────────┐    │
│  │ DomainOverview              │    │
│  │ - description               │    │
│  │ - key_concepts              │    │
│  │ - related_domains           │    │
│  └─────────────────────────────┘    │
│                │                    │
│                ▼                    │
│  loadDomainComponents("data-sync")  │
│  ┌─────────────────────────────┐    │
│  │ DomainComponents            │    │
│  │ - api: controllers, svcs    │    │
│  │ - webapp: components        │    │
│  │ - shared_libs: utilities    │    │
│  └─────────────────────────────┘    │
│                │                    │
│                ▼                    │
│  findResolvedIssues("data-sync")    │
│  ┌─────────────────────────────┐    │
│  │ ResolvedIssueContext[]      │    │
│  │ - issue_id, title           │    │
│  │ - patterns_applied          │    │
│  │ - design_decisions          │    │
│  │ - lessons_learned           │    │
│  └─────────────────────────────┘    │
│                │                    │
│                ▼                    │
│  calculateSimilarityScore()         │
│  ┌─────────────────────────────┐    │
│  │ Weights:                    │    │
│  │ - Title match: 20%          │    │
│  │ - Type match: 15%           │    │
│  │ - Domain match: 25%         │    │
│  │ - Requirements: 25%         │    │
│  │ - Components: 15%           │    │
│  └─────────────────────────────┘    │
│                │                    │
│                ▼                    │
│  extractPatterns() + generateRecs() │
│  ┌─────────────────────────────┐    │
│  │ ContextAnalysisResult       │    │
│  │ - similarIssues[]           │    │
│  │ - codePatterns[]            │    │
│  │ - designDecisions[]         │    │
│  │ - recommendations[]         │    │
│  │ - successRate: 0.78         │    │
│  └─────────────────────────────┘    │
│                                     │
└─────────────────────────────────────┘
                    │
                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                  ResearchState (After Context Analysis)                       │
├──────────────────────────────────────────────────────────────────────────────┤
│  {                                                                            │
│    messages: [...],                                                           │
│    issue: IssueTemplate,                                                      │
│    researchFindings: ResearchFindings,                                        │
│    contextAnalysis: ContextAnalysisResult,    // ← UPDATED                   │
│    orchestrationPlan: null,                                                   │
│    currentPhase: "context-analysis",          // ← UPDATED                   │
│    errors: []                                                                 │
│  }                                                                            │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────┐
│  Generate Plan Node                 │
├─────────────────────────────────────┤
│                                     │
│  buildPlanPrompt()                  │
│  ┌─────────────────────────────┐    │
│  │ Prompt includes:            │    │
│  │ - Issue details             │    │
│  │ - Documentation findings    │    │
│  │ - Similar issues            │    │
│  │ - Code patterns             │    │
│  │ - Recommendations           │    │
│  │ - Request for JSON plan     │    │
│  └─────────────────────────────┘    │
│                │                    │
│                ▼                    │
│  Claude API (plannerModel)          │
│                │                    │
│                ▼                    │
│  parsePlanResponse()                │
│  + estimateComplexity()             │
│  + buildPhases()                    │
│  + identifyRiskFactors()            │
│  + estimateEffort()                 │
│  ┌─────────────────────────────┐    │
│  │ OrchestrationPlan           │    │
│  │ - summary                   │    │
│  │ - keyFindings[]             │    │
│  │ - proposedApproach          │    │
│  │ - estimatedComplexity       │    │
│  │ - phases[]                  │    │
│  │ - riskFactors[]             │    │
│  │ - estimatedEffort           │    │
│  └─────────────────────────────┘    │
│                                     │
└─────────────────────────────────────┘
                    │
                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                        ResearchState (Final)                                  │
├──────────────────────────────────────────────────────────────────────────────┤
│  {                                                                            │
│    messages: [HumanMessage, AIMessage, ...],                                  │
│    issue: IssueTemplate,                                                      │
│    researchFindings: ResearchFindings,                                        │
│    contextAnalysis: ContextAnalysisResult,                                    │
│    orchestrationPlan: OrchestrationPlan,      // ← FINAL OUTPUT              │
│    currentPhase: "complete",                  // ← WORKFLOW DONE             │
│    errors: []                                                                 │
│  }                                                                            │
└──────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    │ Return to CLI
                                    ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                              OUTPUT                                           │
├──────────────────────────────────────────────────────────────────────────────┤
│  Console formatted output:                                                    │
│  - Ticket Summary                                                             │
│  - Documentation Findings (references, approaches, confidence)                │
│  - Context Analysis (similar issues, patterns, recommendations)               │
│  - Orchestration Plan (summary, phases, risks, effort estimate)               │
└──────────────────────────────────────────────────────────────────────────────┘
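
As a reading aid for the Context Analysis node above, the listed weights amount to a weighted sum of per-field match scores. The sketch below is only an assumption about how such a score could be computed: the `IssueFeatures` shape, the token-overlap measure, and the name `sketchSimilarityScore` are hypothetical and not taken from context-analysis-agent.ts.

```typescript
// Weights as listed in the Context Analysis node of the diagram.
const WEIGHTS = { title: 0.2, type: 0.15, domain: 0.25, requirements: 0.25, components: 0.15 };

// Hypothetical shape holding only the fields the weights refer to.
interface IssueFeatures {
  title: string;
  type: string;
  domain: string;
  requirements: string[];
  components: string[];
}

// Lowercase and de-duplicate a list of strings.
function unique(items: string[]): string[] {
  const lowered = items.map((s) => s.toLowerCase());
  return lowered.filter((s, i) => lowered.indexOf(s) === i);
}

// Assumption: Jaccard overlap stands in for whatever matching the real code uses.
function overlap(a: string[], b: string[]): number {
  const ua = unique(a);
  const ub = unique(b);
  const shared = ua.filter((s) => ub.indexOf(s) !== -1).length;
  const union = ua.length + ub.length - shared;
  return union === 0 ? 0 : shared / union;
}

// Split a title into whitespace-separated tokens.
function tokens(text: string): string[] {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

// Weighted sum of per-field scores, each in the range 0..1.
function sketchSimilarityScore(a: IssueFeatures, b: IssueFeatures): number {
  return (
    WEIGHTS.title * overlap(tokens(a.title), tokens(b.title)) +
    WEIGHTS.type * (a.type === b.type ? 1 : 0) +
    WEIGHTS.domain * (a.domain === b.domain ? 1 : 0) +
    WEIGHTS.requirements * overlap(a.requirements, b.requirements) +
    WEIGHTS.components * overlap(a.components, b.components)
  );
}
```

Because the weights sum to 1, two tickets that match on every field score 1, and tickets that share only type and domain score about 0.4 under these assumptions.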

@sugat009
Member Author

sugat009 commented Dec 5, 2025

LangGraph

How are we currently feeling about LangGraph? Is it overkill for 4-5 agents with 2-3 supervisors? Should we wrap Agent SDK calls directly and not worry about node state? Do you feel comfortable enough on this path?

LangGraph is justified because of the feedback loops. Without cycles I'd agree it's overkill, but human rejection and QA failures need to route back. That's what graphs are for. I'm comfortable continuing.
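
To make the feedback-loop argument concrete, here is a minimal sketch of the routing decisions involved. The node names, the `WorkflowState` shape, and the `MAX_REVISIONS` cap are all assumptions for illustration, not code from this PR. In LangGraph, functions like these would typically be registered as conditional edges (for example via `addConditionalEdges`), which is what lets a rejection route back to an earlier node.

```typescript
// Hypothetical node names; the real graph in research-supervisor.ts may differ.
type NodeName = "generatePlan" | "humanReview" | "implement" | "qa" | "end";

interface WorkflowState {
  planApproved: boolean; // set by the human review step
  qaPassed: boolean; // set by the QA step
  revisions: number; // how many times work has been sent back
}

// Assumed cap so that a cycle cannot loop forever.
const MAX_REVISIONS = 3;

// After human review: continue on approval, otherwise loop back to planning.
function routeAfterHumanReview(state: WorkflowState): NodeName {
  if (state.planApproved) {
    return "implement";
  }
  return state.revisions < MAX_REVISIONS ? "generatePlan" : "end";
}

// After QA: finish on success, otherwise loop back to implementation.
function routeAfterQa(state: WorkflowState): NodeName {
  if (state.qaPassed) {
    return "end";
  }
  return state.revisions < MAX_REVISIONS ? "implement" : "end";
}
```

Without the two "loop back" branches this would be a straight pipeline, which is the case where a graph library would be overkill.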


MCP / Kapa.ai Workflow

I'm imagining we'll need a few LLM calls here. Potential workflow: our code calls MCP and retrieves a list of sources or the results of ask_question. Our code then makes an LLM call along with agent-memory, and we tell it to condense the result.

Use ask_question. Kapa already did the heavy lifting with RAG + user context. We just need one LLM call to contextualize their answer with our agent-memory and structure the output.
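
A rough sketch of that flow, assuming one MCP call followed by exactly one LLM call. The `DocSearchDeps` interface, its `askQuestion` and `complete` wrappers, and the prompt wording are hypothetical; the real MCP tool response shape is not shown in this thread, so it is treated here as plain text.

```typescript
// Hypothetical dependencies, injected so the sketch does not assume any SDK.
interface DocSearchDeps {
  // Wraps the MCP ask_question tool; assumed to return Kapa's answer as text.
  askQuestion(question: string): Promise<string>;
  // Wraps a single LLM completion call.
  complete(prompt: string): Promise<string>;
}

interface DocSearchResult {
  queryUsed: string;
  kapaAnswer: string;
  summary: string;
}

// Build the one prompt that combines Kapa's answer with agent-memory notes.
function buildContextPrompt(question: string, kapaAnswer: string, memoryNotes: string[]): string {
  const memory = memoryNotes.length > 0
    ? memoryNotes.map((note) => "- " + note).join("\n")
    : "(none)";
  return [
    "Question: " + question,
    "Documentation answer:",
    kapaAnswer,
    "Relevant agent-memory notes:",
    memory,
    "Summarize the answer concisely for this ticket, as structured findings.",
  ].join("\n");
}

// One MCP call followed by exactly one LLM call.
async function searchDocumentation(
  question: string,
  memoryNotes: string[],
  deps: DocSearchDeps
): Promise<DocSearchResult> {
  const kapaAnswer = await deps.askQuestion(question);
  const summary = await deps.complete(buildContextPrompt(question, kapaAnswer, memoryNotes));
  return { queryUsed: question, kapaAnswer, summary };
}
```

Injecting both calls also makes it straightforward to keep the mocked documentation output in tests while swapping in the real MCP client later.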


LLM Models

We want to be able to use anything. Is that a plan to refactor later, or should we pull it out into a module now?

Agree we should do it now, not later. I've already built this as part of the dev supervisor work, an LLM abstraction with interface + factory pattern. Claude is fully implemented, OpenAI/Gemini are stubbed for when we need them. Will be in the next PR for review.
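
For readers following along before that PR lands, an interface + factory abstraction could look roughly like the sketch below. This is not the implementation from the dev supervisor work; `LLMProvider`, `SendFn`, `StubProvider`, and `createProvider` are hypothetical names, and the transport function is injected so that no vendor SDK call is assumed.

```typescript
// Common interface every provider implements.
interface LLMProvider {
  readonly name: string;
  complete(prompt: string): Promise<string>;
}

type ProviderName = "claude" | "openai" | "gemini";

// Transport is injected so the sketch stays independent of any vendor SDK.
type SendFn = (prompt: string) => Promise<string>;

class ClaudeProvider implements LLMProvider {
  readonly name = "claude";
  private readonly send: SendFn;

  constructor(send: SendFn) {
    this.send = send;
  }

  complete(prompt: string): Promise<string> {
    return this.send(prompt);
  }
}

// Placeholder for providers that are not implemented yet.
class StubProvider implements LLMProvider {
  readonly name: string;

  constructor(name: string) {
    this.name = name;
  }

  complete(_prompt: string): Promise<string> {
    return Promise.reject(new Error(this.name + " provider is not implemented yet"));
  }
}

// Factory: callers ask for a provider by name and only see the interface.
function createProvider(name: ProviderName, send: SendFn): LLMProvider {
  switch (name) {
    case "claude":
      return new ClaudeProvider(send);
    case "openai":
    case "gemini":
      return new StubProvider(name);
  }
}
```

Agents would depend only on `LLMProvider`, so adding a real OpenAI or Gemini implementation later should not require touching agent code.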


LLM API calls vs Agent SDKs

API calls are a bit lower level and might be missing a lot of the system prompts that Claude Code injects to make it a viable coding assistant. How do we imagine handling that without getting locked into one LLM vendor and without re-inventing a functional assistant, while ensuring our assistant doesn't perform worse than the provided agent?

I think we just build it out piece by piece. The thing is, we're not trying to build a general-purpose coding assistant. Our agents do narrow, CHT-specific tasks. So we can write focused prompts that leverage context from agent-memory that generic assistants wouldn't have anyway. Plus we're mostly dealing with structured outputs, not open-ended "write me a feature" stuff. And with human validation checkpoints, we can catch what doesn't work and iterate. It won't be perfect out of the gate, but we'll refine as we go.


TODO and Planning features

Do we know if LangGraph handles this internally? Sharing todo, planning, and verification lists is one of the core features that makes Claude Code's multi-agent setup work better. Does that occur with raw API calls? I see it at the end of the Supervisor phase, but should it be added for some of the other API calls?

LangGraph handles state sharing between agents. When one finishes, the next sees its output. But it doesn't do the todo/planning stuff that Claude Code does internally. Right now our supervisors produce plans, but individual agents just execute. For simpler agents like doc search, that's fine. For code generation, we might benefit from having the agent explicitly plan before generating. This could help with reasoning and make it easier to debug when things go wrong. Worth considering for the coding and test agents.


Domain Inference

This seems highly important and I'm worried about the difficulty of getting this correct, or this value guiding the agents down a wild path. What's your opinion on leaving this out of scope for now until we manually have mapped 50+ tickets? And then we can test to see how accurate our domain inference would be, or do you see that happening alongside immediately at release?

Yes, I think it would be a good idea to have the domain as a required field for a good few tickets until we solve a few of them. Once we have context built up we can then have it as optional.
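
Making the field required could be enforced with a small validator along these lines. This is a sketch, not the code in this PR: `requireDomain` and `isCHTDomain` are hypothetical names, and the domain values are the seven functional areas listed for the project.

```typescript
// The seven functional areas listed for this project.
const CHT_DOMAINS = [
  "authentication",
  "contacts",
  "forms-and-reports",
  "tasks-and-targets",
  "messaging",
  "data-sync",
  "configuration",
] as const;

type CHTDomain = (typeof CHT_DOMAINS)[number];

// Type guard: true only for one of the known domain strings.
function isCHTDomain(value: unknown): value is CHTDomain {
  return typeof value === "string" && (CHT_DOMAINS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical validator: fail fast when the frontmatter has no usable domain.
function requireDomain(metadata: { domain?: unknown }): CHTDomain {
  const domain = metadata.domain;
  if (!isCHTDomain(domain)) {
    throw new Error(
      'Ticket frontmatter must set "domain" to one of: ' + CHT_DOMAINS.join(", ")
    );
  }
  return domain;
}
```

When the field becomes optional again, the thrown error could be replaced by a fallback to the inference path without changing callers that only need a `CHTDomain`.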


Human-in-the-loop

From the research-supervisor.ts, we should present the plan and have us read over it, modify the plan file and then agree to continue.

Yeah, that is a good idea. However, we should just ask the user to use their favorite text editor to edit the file in case they want to edit it, and not implement an editor in the agent interface.
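
A minimal sketch of that checkpoint, assuming the plan is written to a JSON file that the user edits externally before confirming. The `PlanFileIO` interface, the function names, and the JSON file format are assumptions for illustration; file access is injected rather than tied to a specific API, and collecting the user's answer is left to the CLI.

```typescript
// File access is injected so the sketch does not depend on a specific runtime API.
interface PlanFileIO {
  write(path: string, text: string): void;
  read(path: string): string;
}

interface ReviewOutcome {
  approved: boolean;
  plan: unknown; // the plan as it exists in the file after the user's edits
}

// Step 1: persist the plan and return a message telling the user where to edit it.
function writePlanForReview(io: PlanFileIO, planPath: string, plan: unknown): string {
  io.write(planPath, JSON.stringify(plan, null, 2));
  return "Plan written to " + planPath + ". Edit it in your editor, then answer y to continue.";
}

// Step 2: once the user has answered, re-read the file so manual edits are picked up.
function resumeAfterReview(io: PlanFileIO, planPath: string, answer: string): ReviewOutcome {
  const normalized = answer.trim().toLowerCase();
  const approved = normalized === "y" || normalized === "yes";
  const plan: unknown = JSON.parse(io.read(planPath));
  return { approved, plan };
}
```

The key design point is that the agent never trusts its in-memory copy after the pause: whatever is in the file at confirmation time is the plan that continues.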


Parsing Layer

Our parsing layer using custom parsers looks like it could get out of hand. Do you think we should go with js-yaml? ticket-parser.ts and context-loader.ts

Yup, we should use it as it is already tried and tested.
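
For illustration, delegating the frontmatter to a YAML library could look roughly like the sketch below. The function names and the `YamlLoader` type are hypothetical, not taken from ticket-parser.ts. The loader is injected to keep the sketch self-contained; in practice the `load` function from js-yaml could be passed in.

```typescript
// Any function that turns YAML text into a value, e.g. `load` from js-yaml.
type YamlLoader = (text: string) => unknown;

// Hypothetical metadata shape, mirroring the frontmatter fields in the diagrams.
interface TicketMetadata {
  title: string;
  type: string;
  priority: string;
  domain: string;
}

// Frontmatter is the block between a leading '---' line and the next '---' line.
function splitFrontmatter(content: string): { yamlText: string; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(content);
  if (!match) {
    throw new Error("Ticket is missing YAML frontmatter");
  }
  return { yamlText: match[1] ?? "", body: match[2] ?? "" };
}

// Parse the frontmatter with the injected loader and check the required fields.
function parseTicketMetadata(content: string, loadYaml: YamlLoader): TicketMetadata {
  const { yamlText } = splitFrontmatter(content);
  const raw = loadYaml(yamlText);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Frontmatter must be a YAML mapping");
  }
  const data = raw as Record<string, unknown>;
  for (const key of ["title", "type", "priority", "domain"]) {
    if (typeof data[key] !== "string" || data[key] === "") {
      throw new Error('Frontmatter field "' + key + '" is required');
    }
  }
  return data as unknown as TicketMetadata;
}
```

With this split, the only hand-written parsing left is locating the `---` delimiters; quoting, lists, and nested values are the library's responsibility.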

…aml for frontmatter parsing and add more tests
@sugat009 sugat009 requested a review from Hareet December 12, 2025 16:59
@andrablaj andrablaj changed the title feat(#6): add research supervisor and it's reports documentation agent and context agent feat(#6): add research supervisor and its reports documentation agent and context agent Dec 18, 2025
Member

@Hareet Hareet left a comment

Saweeeet! Thanks for addressing the fixes and detailing the decision points.

LGTM! The only thing flagged was that inferDomainAndComponents() and enrichIssueTemplate() are dead code and not used. Revisiting that with the context of our discussion around domain inference resulted in:

Revised Assessment

The code is not dead code in the problematic sense - it's staged functionality waiting for:

  • 50+ manually-mapped tickets to accumulate
  • Sufficient historical context to validate inference accuracy
  • A future change to make domain optional again

The implementation correctly:

  • Made domain mandatory now (per discussion)
  • Kept the inference logic ready for when it's needed
  • Followed the agreed phased approach

I generated this local CLAUDE.md and used Anthropic's official /code-review:code-review plugin.

### Project Overview

CHT-Agent is a hierarchical multi-agent system for CHT development workflows. This PR implements the **Research Supervisor POC** - an AI orchestrator that:
- Parses issue tickets (YAML frontmatter + Markdown)
- Searches documentation via mocked MCP integration
- Analyzes historical context for similar issues
- Generates implementation plans

### Technology Stack
- **TypeScript 5.3** with Node.js 20+
- **LangChain/LangGraph** (v0.3.0) for agent orchestration
- **Mocha/Chai/Sinon** for testing

### Key Architecture Components

| Component | Location | Purpose |
|-----------|----------|---------|
| Research Supervisor | `src/supervisors/research-supervisor.ts` | LangGraph state machine orchestrating agents |
| Documentation Agent | `src/agents/documentation-search-agent.ts` | Mocked MCP integration for doc search |
| Context Agent | `src/agents/context-analysis-agent.ts` | Historical context similarity scoring |
| Ticket Parser | `src/utils/ticket-parser.ts` | YAML frontmatter + Markdown parsing |
| Domain Inference | `src/utils/domain-inference.ts` | LLM-based domain classification |

### Review Focus Areas

1. **Type Safety**: `domain` is now mandatory in tickets - verify validation
2. **LLM Response Parsing**: Check regex extraction robustness in `domain-inference.ts`
3. **State Management**: LangGraph state reducers for concurrent updates
4. **Error Handling**: All LLM errors should accumulate in state, not throw
5. **Test Coverage**: ~34% - focus on untested LLM integration paths

### Known Testing Gaps
- `domain-inference.ts` - LLM mocking challenges (ESM-only `@langchain/anthropic`)
- LangGraph graph execution - needs integration tests
- CLI entry points - no coverage yet

### CHTDomains (7 functional areas)
`authentication`, `contacts`, `forms-and-reports`, `tasks-and-targets`, `messaging`, `data-sync`, `configuration` 

### PR to review
- https://github.com/medic/cht-agent/pull/7

### Code Style and Guidelines

- Similar to CHT Style and Guidelines
- https://docs.communityhealthtoolkit.org/community/contributing/code/style-guide/
- https://docs.communityhealthtoolkit.org/community/contributing/code/quality-assistance/

Development

Successfully merging this pull request may close these issues.

Create Research Supervisor

3 participants