@cluesurf/knit

Book Weaver/Maker/Researcher/Writer
(WIP)




Introduction

Imagine having a team of researchers digging through historical archives about the Silk Road while editors polish your introduction and fact-checkers verify every claim, all at the same time. Whether you're writing about medieval banking or quantum computing, this system acts as your full writing staff, transforming what would take a solo author months into a matter of days.

This is an AI-powered book writing system using multiple Claude agents working in parallel via the Anthropic API.

The entrypoint is code/orchestrate.ts, so you can start there to see how everything works.

The basic agent prompts are in work/agents/.

Code quickstart docs are toward the bottom of this readme.

This is still a very early-stage prototype and not yet robust.

Architecture

The orchestrator uses:

  • Parallel Processing: Multiple agents work simultaneously
  • Task Queue: Priority-based task distribution
  • Message Bus: Agent communication system
  • Rate Limiting: Automatic API throttling
  • Token Tracking: Monitors usage to stay within limits
  • Smart Routing: Tasks saved to appropriate folders by type
  • State Persistence: Full recovery from interruptions
  • Research Index: Lightweight tracking of completed research without keeping everything in memory
  • Dynamic Orchestration: Intelligent workflow that adapts based on current progress
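
As a sketch of how the pieces above might fit together, here is a minimal priority-based task queue in TypeScript. The type names and fields are illustrative assumptions, not the actual code in code/orchestrate.ts:

```typescript
// Illustrative sketch of a priority-based task queue; names and
// behavior are assumptions, not the real implementation.
type TaskStatus = "pending" | "active" | "completed" | "failed";

interface Task {
  id: string;
  agentType: string; // e.g. "researcher", "weaver"
  prompt: string;
  priority: number;  // higher runs first
  status: TaskStatus;
}

class TaskQueue {
  private tasks: Task[] = [];

  add(task: Task): void {
    this.tasks.push(task);
    // Keep the highest-priority task at the front.
    this.tasks.sort((a, b) => b.priority - a.priority);
  }

  // Hand the next pending task to an agent and mark it active.
  next(): Task | undefined {
    const task = this.tasks.find((t) => t.status === "pending");
    if (task) task.status = "active";
    return task;
  }
}
```

Each agent loop would call `next()`, do the work, then mark the task completed and persist the result to its output folder.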

The Orchestration Workflow

The system follows a sophisticated cyclic workflow that ensures comprehensive coverage while avoiding duplicate research:

Phase 1: Initial Outline

  • Creates a high-level outline based on the book concept
  • This serves as the foundation for targeted research

Phase 2: Outline-Driven Research

  • Parses the outline to identify topics needing research
  • Checks the research index to avoid duplicates (uses research/index.json)
  • Generates research tasks from outline sections
  • Each research topic is tracked with keywords for efficient duplicate detection

Phase 3: Research Curation

  • Every 30 research documents, the Curator agent organizes files
  • Groups research into thematic subfolders (1 level deep)
  • Deduplicates and consolidates related content
  • Creates clean folder structure like ancient-mesopotamia/, medieval-banking/
  • Maintains research index for navigation

Phase 4: Research Summarization

  • Every 10 research documents, the Summarizer creates batch summaries
  • Extracts key findings and identifies patterns
  • Tracks coverage and notes gaps for future research
  • Provides high-level overview for review

Phase 5: Narrative Weaving

  • After sufficient research (10+ topics), begins weaving narratives
  • Groups research by themes (Ancient Origins, Medieval Revolution, etc.)
  • Creates narrative threads connecting research pieces (max 5 per task)
  • Identifies gaps in research and notifies the system

Phase 6: Outline Refinement

  • Periodically refines the outline based on discoveries
  • Happens after significant research and weaving progress
  • Version-tracked outlines adapt to new insights
  • Triggers new research for newly identified topics

Phase 7: Content Writing

  • Begins when structure is solid (2+ outlines, 10+ weaves)
  • Writers use refined outlines and woven narratives
  • Creates actual book chapters and sections

Research Tracking System

To avoid keeping everything in memory, the system uses a file-based approach with lightweight indexing:

Research Index (research/index.json)

  • Maintains a lightweight index of all researched topics
  • Stores only: topic title, keywords, file path, timestamp
  • Enables fast duplicate checking without loading full content
  • Updated whenever research completes
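
A minimal sketch of reading and updating such an index, assuming the entry fields listed above (the actual JSON shape in research/index.json may differ):

```typescript
import { readFileSync, writeFileSync, existsSync } from "node:fs";

// Assumed shape of one research/index.json entry, per the fields above.
interface IndexEntry {
  title: string;
  keywords: string[];
  filePath: string;
  timestamp: string; // ISO 8601
}

function loadIndex(path: string): IndexEntry[] {
  return existsSync(path) ? JSON.parse(readFileSync(path, "utf8")) : [];
}

// Append an entry when a research task completes, then persist.
function recordResearch(path: string, entry: IndexEntry): void {
  const index = loadIndex(path);
  index.push(entry);
  writeFileSync(path, JSON.stringify(index, null, 2));
}
```

Because only this metadata is loaded, duplicate checks stay cheap even with thousands of research files on disk.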

Research Todos (research/todos.json)

  • Dynamic list of pending, in-progress, and completed research
  • Automatically updated as tasks progress
  • Includes gaps identified by the weaver
  • Prioritizes weaver-identified gaps as high priority

Research Curation

The Curator agent periodically (every 30 files) organizes research:

  • Subfolder Structure: Creates themed folders like ancient-mesopotamia/, silicon-valley-equity/
  • Deduplication: Merges duplicate research on same topics
  • Consolidation: Combines related short pieces into comprehensive documents
  • Clean Naming: Uses kebab-case, descriptive folder names (1-4 words)
  • Index Maintenance: Updates research/INDEX.md with folder structure

Duplicate Detection Strategy

  1. Keyword Matching: Extracts keywords from topics and checks overlap
  2. Fuzzy Matching: Checks if topics contain similar substrings
  3. Index-First: Always consults the lightweight index before loading full content into memory
  4. Progressive Loading: Only loads full content when deep analysis needed
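
The first two steps can be sketched roughly as follows. The stopword list, overlap threshold, and function names here are illustrative assumptions, not the project's actual heuristics:

```typescript
// Sketch of keyword-overlap plus fuzzy-substring duplicate detection;
// threshold and stopwords are guesses, not the real values.
function extractKeywords(topic: string): Set<string> {
  const stopwords = new Set(["the", "of", "in", "and", "a", "an", "on"]);
  return new Set(
    topic
      .toLowerCase()
      .split(/\W+/)
      .filter((w) => w.length > 2 && !stopwords.has(w))
  );
}

function isDuplicate(topic: string, indexed: string[], threshold = 0.6): boolean {
  const words = extractKeywords(topic);
  return indexed.some((existing) => {
    // 1. Keyword overlap against the lightweight index.
    const other = extractKeywords(existing);
    const overlap = [...words].filter((w) => other.has(w)).length;
    if (overlap / Math.min(words.size, other.size) >= threshold) return true;
    // 2. Fuzzy substring match as a fallback.
    const a = topic.toLowerCase();
    const b = existing.toLowerCase();
    return a.includes(b) || b.includes(a);
  });
}
```

Only when a topic survives both cheap checks would the system progress to loading full documents for deeper comparison.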

How the Cyclic Process Works

┌─────────────┐
│   OUTLINE   │──────┐
└─────────────┘      │
       ↑             ↓
       │      ┌──────────────┐
       │      │  RESEARCHER  │
       │      └──────────────┘
       │             │
       │             ↓
       │      ┌──────────────┐
       │      │   CURATOR    │ → (organizes files)
       │      └──────────────┘
       │             │
       │             ↓
       │      ┌──────────────┐     research/todos.json
       │      │    WEAVER    │ ←→  (gaps & todos)
       │      └──────────────┘
       │             │
       └─────────────┘
         (refinement)
  1. Outline drives research: The outliner creates structure, researcher fills gaps
  2. Research enables weaving: Completed research is woven into narratives
  3. Weaver identifies gaps: Missing information is added to research todos
  4. Outline evolves: Periodically refined based on discoveries
  5. Cycle continues: Each refinement may trigger new research needs

Memory Efficiency

The system is designed to handle large-scale book projects without memory issues:

  • Index-based tracking: Only topic metadata kept in memory
  • File-based storage: All content persisted to disk immediately
  • Lazy loading: Content loaded only when needed
  • Automatic cleanup: Old in-memory data cleared periodically
  • Scalable to thousands of topics: Can handle extensive research projects

Architecture Documentation

Detailed documentation of all features and architectural decisions is maintained in the work/features/ folder:

  • agent-system.md: Multi-agent architecture and communication
  • orchestration-workflow.md: Seven-phase cyclic workflow
  • research-tracking.md: Duplicate detection and indexing system
  • memory-efficiency.md: Memory management strategies
  • rate-limiting.md: API rate limiting and optimization
  • task-management.md: Task file cleanup and archival
  • content-organization.md: Folder structure and naming conventions

These documents capture all customizations and implementation details for the book orchestration system. Any new prompts or behavior modifications should be documented there for reference.

Quick Start

# Set your Anthropic API key
export ANTHROPIC_API_KEY=your-key-here

# Install dependencies
pnpm install

# Run with default settings (4 agents per type, 1 hour)
pnpm start

# Quick test (1 agent, 36 seconds)
pnpm run quick-test

# Test run (2 agents, 6 minutes)
pnpm run test-run

# Production run with 10 agents
pnpm run agents:10

Command Line Options

tsx orchestrate.ts [options]
  • --agents, -a - Number of agents per type (default: 4)
  • --duration, -d - Duration in hours (default: 1, max: 72)
  • --retries, -r - Number of retries on error (default: 3)
  • --api-key, -k - Anthropic API key (or set ANTHROPIC_API_KEY env var)
  • --max-requests - Max requests per minute (default: 40)
  • --max-input-tokens - Max input tokens per minute (default: 30000)
  • --max-output-tokens - Max output tokens per minute (default: 8000)
  • --model - Claude model to use (default: claude-3-haiku-20240307)
  • --help - Show help

Interactive Commands

While running, you can type:

  • task <prompt> - Add a new task to the queue
  • status - Show orchestrator status and rate limits
  • agents - Show agent status by type
  • tasks - Show current task queue
  • summary - Generate summary of completed work
  • pause - Pause task processing
  • resume - Resume task processing
  • help - Show available commands and folder structure
  • quit/exit - Save state and shutdown gracefully

Output Folder Structure

The orchestrator automatically organizes outputs into appropriate folders:

/work
├── tasks/          # All task records and tracking
├── searches/       # Research agent outputs
├── outlines/       # Outliner agent outputs
├── drafts/         # Writer agent outputs
├── weaves/         # Weaver agent outputs (narrative)
├── finals/         # Editor & Integrator final outputs
├── citations/      # Indexer agent citations and references
├── verifications/  # Verifier agent fact-checks
├── summaries/      # Generated summaries
└── .orchestrator-state.json  # Saved state for resuming

Agent Types and Distribution

With --agents N, the system creates:

  • Orchestrator (1) - Coordinates all other agents
  • Outliner (N) - Creates and maintains book structure
  • Researcher (2N) - Gathers facts and sources
  • Writer (N) - Creates initial drafts
  • Weaver (N) - Transforms facts into narrative
  • Editor (N/2) - Polishes and refines content
  • Verifier (N/2) - Fact-checks all claims
  • Indexer (1) - Manages citations and references
  • Integrator (1) - Merges all work together
  • Summarizer (N/2) - Creates high-level summaries of research batches
  • Curator (1) - Organizes research into subfolders and deduplicates
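
The distribution above can be expressed as a small function of N. The rounding of the N/2 counts (and the floor of 1) is an assumption here; the real code may round differently:

```typescript
// Compute per-type agent counts for --agents N, per the list above.
// N/2 rounding is an assumed floor with a minimum of 1.
function agentCounts(n: number): Record<string, number> {
  const half = Math.max(1, Math.floor(n / 2));
  return {
    orchestrator: 1,
    outliner: n,
    researcher: 2 * n,
    writer: n,
    weaver: n,
    editor: half,
    verifier: half,
    indexer: 1,
    integrator: 1,
    summarizer: half,
    curator: 1,
  };
}
```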

Features

Rate Limiting

  • Respects Anthropic API rate limits automatically
  • Default: 40 requests/min, 30k input tokens/min, 8k output tokens/min
  • All limits are configurable via command-line flags
  • Visual feedback when waiting for rate limits
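
One common way to implement this kind of throttling is a sliding 60-second window. The sketch below tracks request counts only (the real orchestrator also tracks input and output tokens) and is an illustration, not the actual implementation:

```typescript
// Minimal sliding-window rate limiter, e.g. 40 requests per minute.
class RateLimiter {
  private timestamps: number[] = [];

  constructor(private maxPerMinute: number) {}

  // Returns 0 if a request may go now, otherwise the ms to wait.
  waitTime(now: number = Date.now()): number {
    this.timestamps = this.timestamps.filter((t) => now - t < 60_000);
    if (this.timestamps.length < this.maxPerMinute) return 0;
    return 60_000 - (now - this.timestamps[0]);
  }

  record(now: number = Date.now()): void {
    this.timestamps.push(now);
  }
}
```

Before each API call, an agent would check `waitTime()`, sleep that long if needed (this is where the visual feedback would appear), then `record()` and fire the request.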

Smart File Organization

  • Tasks automatically saved to appropriate folders based on agent type
  • Descriptive filenames with dates and task summaries
  • Full task tracking in /tasks folder
  • Content-specific outputs in specialized folders

Agent Memory

  • Each agent maintains memory of recent tasks (last 20 items)
  • Context passed between related tasks
  • Memory persists across saves/loads
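
A bounded memory like this is simple to sketch; the class below is illustrative, keeping only the most recent items and evicting the oldest:

```typescript
// Illustrative bounded agent memory holding the last 20 items.
class AgentMemory {
  private items: string[] = [];

  constructor(private limit = 20) {}

  remember(item: string): void {
    this.items.push(item);
    // Evict the oldest entry once over the limit.
    if (this.items.length > this.limit) this.items.shift();
  }

  recent(): string[] {
    return [...this.items];
  }
}
```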

State Persistence & Resume

  • Automatic State Saving: Every 60 seconds, the orchestrator saves its complete state
  • Crash Recovery: If the process stops unexpectedly, it resumes exactly where it left off
  • Task Queue Preservation: All pending, active, and completed tasks are preserved
  • No Duplicate Tasks: On restart, checks existing task queue before adding new tasks
  • Active Task Reset: Tasks marked as "active" are reset to "pending" on restart (since they weren't completed)
  • Fresh Start Option: Use --reset flag to clear state and start fresh

How Resume Works

  1. Normal Shutdown: State is saved to .orchestrator-state.json
  2. On Restart:
    • Loads previous task queue and agent states
    • Resets any "active" tasks to "pending" (they weren't completed)
    • Continues processing from where it left off
    • Does NOT add duplicate initial tasks
  3. To Start Fresh: Run with --reset flag to clear all state
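
Step 2's reset of interrupted tasks can be sketched as a pure function over the saved queue; the task shape here is an assumption based on the behavior described above:

```typescript
// Reset tasks that were in flight at shutdown back to pending,
// since they were never completed.
interface SavedTask {
  id: string;
  status: "pending" | "active" | "completed" | "failed";
}

function resetActiveTasks(tasks: SavedTask[]): SavedTask[] {
  return tasks.map((t) =>
    t.status === "active" ? { ...t, status: "pending" } : t
  );
}
```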

Example:

# First run - creates initial tasks
pnpm start

# Ctrl+C to stop...

# Resume where you left off (continues with existing queue)
pnpm start

# Or start completely fresh (clears everything)
pnpm start --reset

Comprehensive Logging

  • Real-time task creation and assignment logging
  • Progress updates for each agent
  • File save locations displayed
  • Rate limit notifications
  • Error tracking with retry attempts
  • Agent type labels on all tasks

Models and Costs

Default model is claude-3-haiku-20240307 for cost efficiency:

  • ~$0.25 per million input tokens
  • ~$1.25 per million output tokens

For higher quality, use Opus:

tsx orchestrate.ts --model claude-3-opus-20240229
  • ~$15 per million input tokens
  • ~$75 per million output tokens
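
For a back-of-envelope cost check using the per-million-token prices above (prices change; verify against Anthropic's current pricing page):

```typescript
// Estimate USD cost from token counts and per-million-token prices.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  pricePerMInput: number,
  pricePerMOutput: number
): number {
  return (
    (inputTokens / 1_000_000) * pricePerMInput +
    (outputTokens / 1_000_000) * pricePerMOutput
  );
}
```

For example, a run consuming 1M input and 200k output tokens costs roughly $0.50 on Haiku but about $30 on Opus, which is why Haiku is the default for development.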

Tips

  • Start with fewer agents for testing (--agents 1)
  • Use --model claude-3-haiku-20240307 for cost-effective development
  • Monitor the status command to track rate limit usage
  • Check output folders regularly for generated content
  • Use pause if you need to review outputs before continuing
  • The system will wait automatically when rate limits are reached

Troubleshooting

No API Key

Set the ANTHROPIC_API_KEY environment variable or use the --api-key flag.

Rate Limit Errors

The system automatically handles rate limits, but you can:

  • Reduce number of agents
  • Increase retry delays in code
  • Use --max-requests to set lower limits

Memory Issues

  • Reduce agent count with --agents
  • Shorten duration with --duration
  • Clear old task results periodically

Task Failures

  • Check orchestrator.log for detailed errors
  • Failed tasks are saved with error details
  • Use --retries to increase retry attempts

License

MIT

ClueSurf

Made by ClueSurf, meditating on the universe ¤. Follow the work on YouTube, X, Instagram, Substack, Facebook, and LinkedIn, and browse more of our open-source work here on GitHub.
