
Data Store Interface for Summary Caching #11

@IBJunior

Problem Statement

What problem does this feature solve?

The current SummarizeCompressor implementation has a significant performance and cost issue. Every time compression is triggered, it re-summarizes the entire message history from scratch, even if portions of that history have been previously summarized.

Specific issues:

  • Performance: Lines 97-111 in src/strategies/summarize.ts always process the complete message range, leading to redundant AI model calls
  • Cost: Repeated summarization of the same content increases API costs unnecessarily
  • Inefficiency: The original goals of summarization (reducing hallucination and saving costs) are undermined by this approach

Current behavior:

// Every compression call processes ALL messages in range
const messagesToSummarize = messages.slice(summarizeStart, keepTailStart);
const conversationText = messagesToSummarize
  .map((msg) => `${msg.role}: ${msg.content}`)
  .join('\n---\n');

Proposed Solution

High-level approach to solving the problem

Introduce a data store interface that allows caching of previously generated summaries, extending the library's "Bring Your Own Model" (BYOM) pattern with a "Bring Your Own Store" (BYOS) counterpart.

Key components:

  1. SlimContextStore interface for storage abstraction
  2. Enhanced message identification system with thread/conversation IDs
  3. Intelligent cache key strategy using conversation context + message ranges
  4. Modified SummarizeCompressor to check cache before generating new summaries
  5. Optional InMemoryStore implementation for testing and learning purposes

Technical Details

Implementation considerations

New interfaces needed:

interface SlimContextStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

// Conversation wrapper to avoid repetitive threadId on each message
interface SlimContextConversation {
  threadId: string;
  messages: SlimContextMessage[];
  metadata?: Record<string, unknown>;
}

// Keep SlimContextMessage clean and focused (no repetitive threadId)
interface SlimContextMessage {
  role: 'system' | 'user' | 'assistant' | 'tool' | 'human';
  content: string;
  metadata?: Record<string, unknown>;
  id?: string;        // Optional message identifier
  index?: number;     // Position within conversation
}

interface CacheKey {
  threadId: string;
  type: 'summary' | 'message';
  startIndex: number;
  endIndex?: number;     // For range summaries
}
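
For reference, the optional InMemoryStore mentioned above needs to be little more than a Map wrapper. A minimal sketch, not a committed implementation:

// Map-backed store for tests and local experimentation
class InMemoryStore implements SlimContextStore {
  private store = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }

  async delete(key: string): Promise<void> {
    this.store.delete(key);
  }
}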

Cache key strategy:

  • Format: "thread_{threadId}:summary:{startIndex}-{endIndex}"
  • Example: "thread_123:summary:5-15" (summary of messages 5-15 in thread 123)
  • Avoids using message content as keys (inefficient for long messages)
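
A key builder matching this format could be a small pure function (a sketch; buildSummaryKey is a hypothetical name, reused by later sketches in this issue):

// Hypothetical helper producing keys in the format described above
function buildSummaryKey(key: CacheKey): string {
  const range =
    key.endIndex !== undefined ? `${key.startIndex}-${key.endIndex}` : `${key.startIndex}`;
  return `thread_${key.threadId}:${key.type}:${range}`;
}

// buildSummaryKey({ threadId: '123', type: 'summary', startIndex: 5, endIndex: 15 })
// => "thread_123:summary:5-15"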

Backward Compatibility Strategy:

// Enhanced compressor interface with method overloading
interface SlimContextCompressor {
  // Existing method - maintains full backward compatibility
  compress(messages: SlimContextMessage[]): Promise<SlimContextMessage[]>;

  // New method - accepts conversation wrapper for enhanced functionality
  compress(conversation: SlimContextConversation): Promise<SlimContextConversation>;
}

// Utility functions for format conversion
function wrapMessages(messages: SlimContextMessage[], threadId: string): SlimContextConversation;
function unwrapMessages(conversation: SlimContextConversation): SlimContextMessage[];
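
Possible bodies for the conversion helpers (a sketch; recording each message's position via index is one way to make range-based cache keys work):

// Sketch: tag each message with its position so range keys can be derived
function wrapMessages(messages: SlimContextMessage[], threadId: string): SlimContextConversation {
  return {
    threadId,
    messages: messages.map((msg, index) => ({ ...msg, index })),
  };
}

// Sketch: dropping back to a plain array keeps the existing API working
function unwrapMessages(conversation: SlimContextConversation): SlimContextMessage[] {
  return conversation.messages;
}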

Integration points:

  • Modify SummarizeCompressor.compress() to check the cache before summarizing (see the sketch after this list)
  • Add method overloading to support both message arrays and conversation wrappers
  • Add store configuration to SummarizeConfig
  • Update token estimation to account for cached summaries
  • Cache key generation uses conversation.threadId instead of per-message repetition
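
To illustrate the first point, a cache-first summarization path might look like the following. This is a sketch under the interfaces above; getOrCreateSummary and the injected summarize callback are illustrative names, not the library's API:

// Hypothetical cache-first path; summarize stands in for the existing model call
async function getOrCreateSummary(
  store: SlimContextStore,
  conversation: SlimContextConversation,
  summarizeStart: number,
  keepTailStart: number,
  summarize: (msgs: SlimContextMessage[]) => Promise<string>,
): Promise<string> {
  const key = buildSummaryKey({
    threadId: conversation.threadId,
    type: 'summary',
    startIndex: summarizeStart,
    endIndex: keepTailStart,
  });

  // Cache hit: reuse the stored summary and skip the model entirely
  const cached = await store.get(key);
  if (cached !== null) return cached;

  // Cache miss: summarize as today, then persist for next time
  const summary = await summarize(conversation.messages.slice(summarizeStart, keepTailStart));
  await store.set(key, summary);
  return summary;
}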

Implementation Considerations

Open questions and design decisions

1. Summary Combination Strategy
When extending a cached summary (e.g., a cached summary covers messages 5-15 but a summary of 5-25 is now needed):

Option A: AI-Driven Combination

  • Send the model: existing summary (5-15) + new messages (16-25) → combined summary (5-25)
  • Pros: Intelligent merging, better context preservation, can resolve contradictions
  • Cons: More expensive, potentially slower, risk of AI hallucination

Option B: Client-Side Concatenation (see the sketch after Option C)

  • Send the model: only the new messages (16-25) → new summary (16-25)
  • Concatenate: summary(5-15) + summary(16-25) = combined(5-25)
  • Pros: Cost-effective, faster, predictable behavior
  • Cons: Potential fragmentation, no cross-segment awareness

Option C: Hybrid Configurable

  • Allow users to choose strategy based on cost/quality tradeoffs
  • Default to client-side with option for AI-driven
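
To make Option B concrete, here is a sketch of the concatenation path (extendSummary is a hypothetical helper; it assumes the buildSummaryKey sketch above and treats end indices as exclusive, which is itself a design detail to settle):

// Sketch of Option B: summarize only the uncovered tail, then concatenate
async function extendSummary(
  store: SlimContextStore,
  threadId: string,
  cachedStart: number,
  cachedEnd: number,
  newEnd: number,
  messages: SlimContextMessage[],
  summarize: (msgs: SlimContextMessage[]) => Promise<string>,
): Promise<string> {
  const cached = await store.get(
    buildSummaryKey({ threadId, type: 'summary', startIndex: cachedStart, endIndex: cachedEnd }),
  );

  // Only the messages not covered by the cached summary hit the model
  const tailSummary = await summarize(messages.slice(cachedEnd, newEnd));
  const combined = cached ? `${cached}\n${tailSummary}` : tailSummary;

  // Persist the wider range so the next extension starts from here
  await store.set(
    buildSummaryKey({ threadId, type: 'summary', startIndex: cachedStart, endIndex: newEnd }),
    combined,
  );
  return combined;
}

Option A would differ only in the last step: instead of string concatenation, the cached summary and the tail summary would be sent back to the model for merging.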

2. Cache Invalidation Strategy

  • Should cache entries expire?
  • How to handle message updates/edits?
  • Thread-based vs global cache management

3. Store Interface Scope

  • Keep minimal (get/set/delete) or add advanced features (batch operations, TTL)?
  • Async vs sync interface design?

4. Thread ID Management

  • Who provides the thread ID? User application or library?
  • Default behavior when no thread ID provided?
  • Should we auto-generate thread IDs for backward compatibility?
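
If auto-generation is chosen, one illustrative fallback is to derive a stable ID from the conversation's first message (purely a sketch; note that an edited first message would change the ID, which ties into the invalidation question above):

import { createHash } from 'crypto';

// Illustrative fallback: hash the first message so repeat compressions of the
// same conversation map to the same cache entries without a user-supplied ID
function deriveThreadId(messages: SlimContextMessage[]): string {
  const seed = messages[0] ? `${messages[0].role}:${messages[0].content}` : 'empty';
  return createHash('sha256').update(seed).digest('hex').slice(0, 12);
}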

5. Conversation Wrapper Benefits

  • Eliminates repetition: No threadId duplication across messages
  • Cleaner API: Separates conversation context from individual message data
  • Better performance: Reduces memory usage and serialization overhead
  • Extensible: Easy to add conversation-level metadata without touching messages

Acceptance Criteria

Definition of done

  • SlimContextStore interface defined in src/interfaces.ts
  • SlimContextConversation wrapper interface implemented
  • Method overloading for compressors (both message array and conversation wrapper)
  • Utility functions for format conversion (wrapMessages/unwrapMessages)
  • Modified SummarizeCompressor to use store for caching
  • Cache key generation utilities using conversation context
  • InMemoryStore reference implementation
  • Summary combination strategy implemented (choose one approach initially)
  • Unit tests for caching behavior and backward compatibility
  • Performance benchmarks showing improvement
  • Documentation for store integration and new conversation wrapper
  • Full backward compatibility maintained (existing compress(messages[]) unchanged)

Additional Context

Supporting information

Design principles alignment:

  • Model-agnostic: Store interface doesn't depend on specific storage technology
  • Framework-independent: Works with any storage backend (Redis, DB, filesystem, memory)
  • BYOM/BYOS pattern: Users provide their own store implementation, just as they already provide their own model
  • Zero runtime dependencies: Core library remains dependency-free

Potential store implementations users might provide:

  • Redis for distributed caching (sketched after this list)
  • Database tables for persistence
  • File system for local caching
  • Cloud storage (S3, etc.) for serverless environments
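
As an example of the BYOS side, a user-supplied Redis store could be a thin adapter (a sketch assuming the ioredis client; the core library itself would not depend on it):

import Redis from 'ioredis';

// User-provided store backed by Redis; the library only sees SlimContextStore
class RedisStore implements SlimContextStore {
  constructor(private redis = new Redis()) {}

  async get(key: string): Promise<string | null> {
    return this.redis.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    await this.redis.set(key, value);
  }

  async delete(key: string): Promise<void> {
    await this.redis.del(key);
  }
}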

Performance impact:

  • Should significantly reduce AI model calls for repeated conversation compression
  • Cache hits avoid expensive summarization operations
  • Storage usage grows with cached summaries (an acceptable tradeoff)
