@cluesurf/knit

Book Weaver/Maker/Researcher/Writer
(WIP)




Introduction

Imagine having a team of researchers digging through historical archives about the Silk Road while editors polish your introduction and fact-checkers verify every claim, all at the same time. Whether you're writing about medieval banking or quantum computing, this system acts as your full writing staff, transforming what would take a solo author months into a matter of days.

This is an AI-powered book writing system using multiple Claude agents working in parallel via the Anthropic API.

The entrypoint is code/orchestrate.ts, so you can start there to see how everything works.

The basic agent prompts are in work/agents/.

Code quickstart docs are toward the bottom of this readme.

This is still a very early-stage prototype and not yet robust.

Architecture

The orchestrator uses:

  • Parallel Processing: Multiple agents work simultaneously
  • Task Queue: Priority-based task distribution
  • Message Bus: Agent communication system
  • Rate Limiting: Automatic API throttling
  • Token Tracking: Monitors usage to stay within limits
  • Smart Routing: Tasks saved to appropriate folders by type
  • State Persistence: Full recovery from interruptions
  • Research Index: Lightweight tracking of completed research without keeping everything in memory
  • Dynamic Orchestration: Intelligent workflow that adapts based on current progress
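
As a sketch of how the pieces above might fit together, here is a minimal priority-based task queue in TypeScript. The type names and fields are illustrative assumptions, not the actual code in code/orchestrate.ts:

```typescript
// Illustrative sketch of a priority-based task queue; names and
// behavior are assumptions, not the real implementation.
type TaskStatus = "pending" | "active" | "completed" | "failed";

interface Task {
  id: string;
  agentType: string; // e.g. "researcher", "weaver"
  prompt: string;
  priority: number;  // higher runs first
  status: TaskStatus;
}

class TaskQueue {
  private tasks: Task[] = [];

  add(task: Task): void {
    this.tasks.push(task);
    // Keep the highest-priority task at the front.
    this.tasks.sort((a, b) => b.priority - a.priority);
  }

  // Hand the next pending task to an agent and mark it active.
  next(): Task | undefined {
    const task = this.tasks.find((t) => t.status === "pending");
    if (task) task.status = "active";
    return task;
  }
}
```

Each agent loop would call `next()`, do the work, then mark the task completed and persist the result to its output folder.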

The Orchestration Workflow

The system follows a sophisticated cyclic workflow that ensures comprehensive coverage while avoiding duplicate research:

Phase 1: Initial Outline

  • Creates a high-level outline based on the book concept
  • This serves as the foundation for targeted research

Phase 2: Outline-Driven Research

  • Parses the outline to identify topics needing research
  • Checks the research index to avoid duplicates (uses research/index.json)
  • Generates research tasks from outline sections
  • Each research topic is tracked with keywords for efficient duplicate detection

Phase 3: Research Curation

  • Every 30 research documents, the Curator agent organizes files
  • Groups research into thematic subfolders (1 level deep)
  • Deduplicates and consolidates related content
  • Creates clean folder structure like ancient-mesopotamia/, medieval-banking/
  • Maintains research index for navigation

Phase 4: Research Summarization

  • Every 10 research documents, the Summarizer creates batch summaries
  • Extracts key findings and identifies patterns
  • Tracks coverage and notes gaps for future research
  • Provides high-level overview for review

Phase 5: Narrative Weaving

  • After sufficient research (10+ topics), begins weaving narratives
  • Groups research by themes (Ancient Origins, Medieval Revolution, etc.)
  • Creates narrative threads connecting research pieces (max 5 per task)
  • Identifies gaps in research and notifies the system

Phase 6: Outline Refinement

  • Periodically refines the outline based on discoveries
  • Happens after significant research and weaving progress
  • Version-tracked outlines adapt to new insights
  • Triggers new research for newly identified topics

Phase 7: Content Writing

  • Begins when structure is solid (2+ outlines, 10+ weaves)
  • Writers use refined outlines and woven narratives
  • Creates actual book chapters and sections

Research Tracking System

To avoid keeping everything in memory, the system uses a file-based approach with lightweight indexing:

Research Index (research/index.json)

  • Maintains a lightweight index of all researched topics
  • Stores only: topic title, keywords, file path, timestamp
  • Enables fast duplicate checking without loading full content
  • Updated whenever research completes
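
A minimal sketch of reading and updating such an index, assuming the entry fields listed above (the actual JSON shape in research/index.json may differ):

```typescript
import { readFileSync, writeFileSync, existsSync } from "node:fs";

// Assumed shape of one research/index.json entry, per the fields above.
interface IndexEntry {
  title: string;
  keywords: string[];
  filePath: string;
  timestamp: string; // ISO 8601
}

function loadIndex(path: string): IndexEntry[] {
  return existsSync(path) ? JSON.parse(readFileSync(path, "utf8")) : [];
}

// Append an entry when a research task completes, then persist.
function recordResearch(path: string, entry: IndexEntry): void {
  const index = loadIndex(path);
  index.push(entry);
  writeFileSync(path, JSON.stringify(index, null, 2));
}
```

Because only this metadata is loaded, duplicate checks stay cheap even with thousands of research files on disk.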

Research Todos (research/todos.json)

  • Dynamic list of pending, in-progress, and completed research
  • Automatically updated as tasks progress
  • Includes gaps identified by the weaver
  • Prioritizes weaver-identified gaps as high priority

Research Curation

The Curator agent periodically (every 30 files) organizes research:

  • Subfolder Structure: Creates themed folders like ancient-mesopotamia/, silicon-valley-equity/
  • Deduplication: Merges duplicate research on same topics
  • Consolidation: Combines related short pieces into comprehensive documents
  • Clean Naming: Uses kebab-case, descriptive folder names (1-4 words)
  • Index Maintenance: Updates research/INDEX.md with folder structure

Duplicate Detection Strategy

  1. Keyword Matching: Extracts keywords from topics and checks overlap
  2. Fuzzy Matching: Checks if topics contain similar substrings
  3. Index-First: Always consults the lightweight index before loading full content into memory
  4. Progressive Loading: Only loads full content when deep analysis needed
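
The first two steps can be sketched roughly as follows. The stopword list, overlap threshold, and function names here are illustrative assumptions, not the project's actual heuristics:

```typescript
// Sketch of keyword-overlap plus fuzzy-substring duplicate detection;
// threshold and stopwords are guesses, not the real values.
function extractKeywords(topic: string): Set<string> {
  const stopwords = new Set(["the", "of", "in", "and", "a", "an", "on"]);
  return new Set(
    topic
      .toLowerCase()
      .split(/\W+/)
      .filter((w) => w.length > 2 && !stopwords.has(w))
  );
}

function isDuplicate(topic: string, indexed: string[], threshold = 0.6): boolean {
  const words = extractKeywords(topic);
  return indexed.some((existing) => {
    // 1. Keyword overlap against the lightweight index.
    const other = extractKeywords(existing);
    const overlap = [...words].filter((w) => other.has(w)).length;
    if (overlap / Math.min(words.size, other.size) >= threshold) return true;
    // 2. Fuzzy substring match as a fallback.
    const a = topic.toLowerCase();
    const b = existing.toLowerCase();
    return a.includes(b) || b.includes(a);
  });
}
```

Only when a topic survives both cheap checks would the system progress to loading full documents for deeper comparison.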

How the Cyclic Process Works

┌─────────────┐
│   OUTLINE   │──────┐
└─────────────┘      │
       ↑             ↓
       │      ┌──────────────┐
       │      │  RESEARCHER  │
       │      └──────────────┘
       │             │
       │             ↓
       │      ┌──────────────┐
       │      │   CURATOR    │ → (organizes files)
       │      └──────────────┘
       │             │
       │             ↓
       │      ┌──────────────┐     research/todos.json
       │      │    WEAVER    │ ←→  (gaps & todos)
       │      └──────────────┘
       │             │
       └─────────────┘
         (refinement)
  1. Outline drives research: The outliner creates structure, researcher fills gaps
  2. Research enables weaving: Completed research is woven into narratives
  3. Weaver identifies gaps: Missing information is added to research todos
  4. Outline evolves: Periodically refined based on discoveries
  5. Cycle continues: Each refinement may trigger new research needs

Memory Efficiency

The system is designed to handle large-scale book projects without memory issues:

  • Index-based tracking: Only topic metadata kept in memory
  • File-based storage: All content persisted to disk immediately
  • Lazy loading: Content loaded only when needed
  • Automatic cleanup: Old in-memory data cleared periodically
  • Scalable to thousands of topics: Can handle extensive research projects

Architecture Documentation

Detailed documentation of all features and architectural decisions is maintained in the work/features/ folder:

  • agent-system.md: Multi-agent architecture and communication
  • orchestration-workflow.md: Seven-phase cyclic workflow
  • research-tracking.md: Duplicate detection and indexing system
  • memory-efficiency.md: Memory management strategies
  • rate-limiting.md: API rate limiting and optimization
  • task-management.md: Task file cleanup and archival
  • content-organization.md: Folder structure and naming conventions

These documents capture all customizations and implementation details for the book orchestration system. Any new prompts or behavior modifications should be documented there for reference.

Quick Start

# Set your Anthropic API key
export ANTHROPIC_API_KEY=your-key-here

# Install dependencies
pnpm install

# Run with default settings (4 agents per type, 1 hour)
pnpm start

# Quick test (1 agent, 36 seconds)
pnpm run quick-test

# Test run (2 agents, 6 minutes)
pnpm run test-run

# Production run with 10 agents
pnpm run agents:10

Command Line Options

tsx orchestrate.ts [options]
  • --agents, -a - Number of agents per type (default: 4)
  • --duration, -d - Duration in hours (default: 1, max: 72)
  • --retries, -r - Number of retries on error (default: 3)
  • --api-key, -k - Anthropic API key (or set ANTHROPIC_API_KEY env var)
  • --max-requests - Max requests per minute (default: 40)
  • --max-input-tokens - Max input tokens per minute (default: 30000)
  • --max-output-tokens - Max output tokens per minute (default: 8000)
  • --model - Claude model to use (default: claude-3-haiku-20240307)
  • --help - Show help

Interactive Commands

While running, you can type:

  • task <prompt> - Add a new task to the queue
  • status - Show orchestrator status and rate limits
  • agents - Show agent status by type
  • tasks - Show current task queue
  • summary - Generate summary of completed work
  • pause - Pause task processing
  • resume - Resume task processing
  • help - Show available commands and folder structure
  • quit/exit - Save state and shutdown gracefully

Output Folder Structure

The orchestrator automatically organizes outputs into appropriate folders:

/work
├── tasks/          # All task records and tracking
├── searches/       # Research agent outputs
├── outlines/       # Outliner agent outputs
├── drafts/         # Writer agent outputs
├── weaves/         # Weaver agent outputs (narrative)
├── finals/         # Editor & Integrator final outputs
├── citations/      # Indexer agent citations and references
├── verifications/  # Verifier agent fact-checks
├── summaries/      # Generated summaries
└── .orchestrator-state.json  # Saved state for resuming

Agent Types and Distribution

With --agents N, the system creates:

  • Orchestrator (1) - Coordinates all other agents
  • Outliner (N) - Creates and maintains book structure
  • Researcher (2N) - Gathers facts and sources
  • Writer (N) - Creates initial drafts
  • Weaver (N) - Transforms facts into narrative
  • Editor (N/2) - Polishes and refines content
  • Verifier (N/2) - Fact-checks all claims
  • Indexer (1) - Manages citations and references
  • Integrator (1) - Merges all work together
  • Summarizer (N/2) - Creates high-level summaries of research batches
  • Curator (1) - Organizes research into subfolders and deduplicates
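
The distribution above can be expressed as a small function of N. The rounding of the N/2 counts (and the floor of 1) is an assumption here; the real code may round differently:

```typescript
// Compute per-type agent counts for --agents N, per the list above.
// N/2 rounding is an assumed floor with a minimum of 1.
function agentCounts(n: number): Record<string, number> {
  const half = Math.max(1, Math.floor(n / 2));
  return {
    orchestrator: 1,
    outliner: n,
    researcher: 2 * n,
    writer: n,
    weaver: n,
    editor: half,
    verifier: half,
    indexer: 1,
    integrator: 1,
    summarizer: half,
    curator: 1,
  };
}
```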

Features

Rate Limiting

  • Respects Anthropic API rate limits automatically
  • Default: 40 requests/min, 30k input tokens/min, 8k output tokens/min
  • All limits are configurable via command-line flags
  • Visual feedback when waiting for rate limits
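
One common way to implement this kind of throttling is a sliding 60-second window. The sketch below tracks request counts only (the real orchestrator also tracks input and output tokens) and is an illustration, not the actual implementation:

```typescript
// Minimal sliding-window rate limiter, e.g. 40 requests per minute.
class RateLimiter {
  private timestamps: number[] = [];

  constructor(private maxPerMinute: number) {}

  // Returns 0 if a request may go now, otherwise the ms to wait.
  waitTime(now: number = Date.now()): number {
    this.timestamps = this.timestamps.filter((t) => now - t < 60_000);
    if (this.timestamps.length < this.maxPerMinute) return 0;
    return 60_000 - (now - this.timestamps[0]);
  }

  record(now: number = Date.now()): void {
    this.timestamps.push(now);
  }
}
```

Before each API call, an agent would check `waitTime()`, sleep that long if needed (this is where the visual feedback would appear), then `record()` and fire the request.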

Smart File Organization

  • Tasks automatically saved to appropriate folders based on agent type
  • Descriptive filenames with dates and task summaries
  • Full task tracking in /tasks folder
  • Content-specific outputs in specialized folders

Agent Memory

  • Each agent maintains memory of recent tasks (last 20 items)
  • Context passed between related tasks
  • Memory persists across saves/loads
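
A bounded memory like this is simple to sketch; the class below is illustrative, keeping only the most recent items and evicting the oldest:

```typescript
// Illustrative bounded agent memory holding the last 20 items.
class AgentMemory {
  private items: string[] = [];

  constructor(private limit = 20) {}

  remember(item: string): void {
    this.items.push(item);
    // Evict the oldest entry once over the limit.
    if (this.items.length > this.limit) this.items.shift();
  }

  recent(): string[] {
    return [...this.items];
  }
}
```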

State Persistence & Resume

  • Automatic State Saving: Every 60 seconds, the orchestrator saves its complete state
  • Crash Recovery: If the process stops unexpectedly, it resumes exactly where it left off
  • Task Queue Preservation: All pending, active, and completed tasks are preserved
  • No Duplicate Tasks: On restart, checks existing task queue before adding new tasks
  • Active Task Reset: Tasks marked as "active" are reset to "pending" on restart (since they weren't completed)
  • Fresh Start Option: Use --reset flag to clear state and start fresh

How Resume Works

  1. Normal Shutdown: State is saved to .orchestrator-state.json
  2. On Restart:
    • Loads previous task queue and agent states
    • Resets any "active" tasks to "pending" (they weren't completed)
    • Continues processing from where it left off
    • Does NOT add duplicate initial tasks
  3. To Start Fresh: Run with --reset flag to clear all state
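
Step 2's reset of interrupted tasks can be sketched as a pure function over the saved queue; the task shape here is an assumption based on the behavior described above:

```typescript
// Reset tasks that were in flight at shutdown back to pending,
// since they were never completed.
interface SavedTask {
  id: string;
  status: "pending" | "active" | "completed" | "failed";
}

function resetActiveTasks(tasks: SavedTask[]): SavedTask[] {
  return tasks.map((t) =>
    t.status === "active" ? { ...t, status: "pending" } : t
  );
}
```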

Example:

# First run - creates initial tasks
pnpm start

# Ctrl+C to stop...

# Resume where you left off (continues with existing queue)
pnpm start

# Or start completely fresh (clears everything)
pnpm start --reset

Comprehensive Logging

  • Real-time task creation and assignment logging
  • Progress updates for each agent
  • File save locations displayed
  • Rate limit notifications
  • Error tracking with retry attempts
  • Agent type labels on all tasks

Models and Costs

Default model is claude-3-haiku-20240307 for cost efficiency:

  • ~$0.25 per million input tokens
  • ~$1.25 per million output tokens

For higher quality, use Opus:

tsx orchestrate.ts --model claude-3-opus-20240229
  • ~$15 per million input tokens
  • ~$75 per million output tokens
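
For a back-of-envelope cost check using the per-million-token prices above (prices change; verify against Anthropic's current pricing page):

```typescript
// Estimate USD cost from token counts and per-million-token prices.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  pricePerMInput: number,
  pricePerMOutput: number
): number {
  return (
    (inputTokens / 1_000_000) * pricePerMInput +
    (outputTokens / 1_000_000) * pricePerMOutput
  );
}
```

For example, a run consuming 1M input and 200k output tokens costs roughly $0.50 on Haiku but about $30 on Opus, which is why Haiku is the default for development.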

Tips

  • Start with fewer agents for testing (--agents 1)
  • Use --model claude-3-haiku-20240307 for cost-effective development
  • Monitor the status command to track rate limit usage
  • Check output folders regularly for generated content
  • Use pause if you need to review outputs before continuing
  • The system will wait automatically when rate limits are reached

Troubleshooting

No API Key

Set the ANTHROPIC_API_KEY environment variable or use the --api-key flag.

Rate Limit Errors

The system automatically handles rate limits, but you can:

  • Reduce number of agents
  • Increase retry delays in code
  • Use --max-requests to set lower limits

Memory Issues

  • Reduce agent count with --agents
  • Shorten duration with --duration
  • Clear old task results periodically

Task Failures

  • Check orchestrator.log for detailed errors
  • Failed tasks are saved with error details
  • Use --retries to increase retry attempts

License

MIT

ClueSurf

Made by ClueSurf, meditating on the universe ¤. Follow the work on YouTube, X, Instagram, Substack, Facebook, and LinkedIn, and browse more of our open-source work here on GitHub.
