Description
Problem Statement
What problem does this feature solve?
The current SummarizeCompressor implementation has a significant performance and cost issue. Every time compression is triggered, it re-summarizes the entire message history from scratch, even if portions of that history have been previously summarized.
Specific issues:
- Performance: Lines 97-111 in `src/strategies/summarize.ts` always process the complete message range, leading to redundant AI model calls
- Cost: Repeated summarization of the same content increases API costs unnecessarily
- Inefficiency: The original goal of summarization (reduce hallucination and save costs) is undermined by this approach
Current behavior:
```typescript
// Every compression call processes ALL messages in range
const messagesToSummarize = messages.slice(summarizeStart, keepTailStart);
const conversationText = messagesToSummarize
  .map((msg) => `${msg.role}: ${msg.content}`)
  .join('\n---\n');
```
Proposed Solution
High-level approach to solving the problem
Introduce a data store interface that allows caching of previously generated summaries, following the library's "Bring Your Own Model" (BYOM) pattern with "Bring Your Own Store" (BYOS).
Key components:
- `SlimContextStore` interface for storage abstraction
- Enhanced message identification system with thread/conversation IDs
- Intelligent cache key strategy using conversation context + message ranges
- Modified `SummarizeCompressor` to check the cache before generating new summaries
- Optional `InMemoryStore` implementation for testing and learning purposes
Technical Details
Implementation considerations
New interfaces needed:
```typescript
interface SlimContextStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
  delete(key: string): Promise<void>;
}

// Conversation wrapper to avoid repetitive threadId on each message
interface SlimContextConversation {
  threadId: string;
  messages: SlimContextMessage[];
  metadata?: Record<string, unknown>;
}

// Keep SlimContextMessage clean and focused (no repetitive threadId)
interface SlimContextMessage {
  role: 'system' | 'user' | 'assistant' | 'tool' | 'human';
  content: string;
  metadata?: Record<string, unknown>;
  id?: string; // Optional message identifier
  index?: number; // Position within conversation
}

interface CacheKey {
  threadId: string;
  type: 'summary' | 'message';
  startIndex: number;
  endIndex?: number; // For range summaries
}
```
Cache key strategy:
- Format: `"thread_{threadId}:summary:{startIndex}-{endIndex}"`
- Example: `"thread_123:summary:5-15"` (summary of messages 5-15 in thread 123)
- Avoids using message content as keys (inefficient for long messages)
Backward Compatibility Strategy:
```typescript
// Enhanced compressor interface with method overloading
interface SlimContextCompressor {
  // Existing method - maintains full backward compatibility
  compress(messages: SlimContextMessage[]): Promise<SlimContextMessage[]>;

  // New method - accepts conversation wrapper for enhanced functionality
  compress(conversation: SlimContextConversation): Promise<SlimContextConversation>;
}

// Utility functions for format conversion
function wrapMessages(messages: SlimContextMessage[], threadId: string): SlimContextConversation;
function unwrapMessages(conversation: SlimContextConversation): SlimContextMessage[];
```
Integration points:
- Modify `SummarizeCompressor.compress()` to check the cache before summarizing (sketched after this list)
- Add method overloading to support both message arrays and conversation wrappers
- Add store configuration to `SummarizeConfig`
- Update token estimation to account for cached summaries
- Cache key generation uses `conversation.threadId` instead of per-message repetition
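A hedged sketch of the cache-first path the first integration point describes; the free function, its parameters, and `buildSummaryCacheKey` (from the sketch above) are illustrative, not the library's current API:

```typescript
// Sketch: check the store before calling the model; cache keys use inclusive
// indices, so the slice end keepTailStart maps to endIndex keepTailStart - 1.
async function summarizeWithCache(
  store: SlimContextStore,
  conversation: SlimContextConversation,
  summarizeStart: number,
  keepTailStart: number,
  summarize: (text: string) => Promise<string>,
): Promise<string> {
  const key = buildSummaryCacheKey(conversation.threadId, summarizeStart, keepTailStart - 1);

  // Cache hit: skip the AI model call entirely.
  const cached = await store.get(key);
  if (cached !== null) return cached;

  // Cache miss: summarize as before, then persist for future calls.
  const conversationText = conversation.messages
    .slice(summarizeStart, keepTailStart)
    .map((msg) => `${msg.role}: ${msg.content}`)
    .join('\n---\n');
  const summary = await summarize(conversationText);
  await store.set(key, summary);
  return summary;
}
```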
Implementation Considerations
Open questions and design decisions
1. Summary Combination Strategy
When extending a cached summary (e.g., a summary of messages 5-15 exists but a summary of 5-25 is needed):
Option A: AI-Driven Combination
- Send model: existing summary (5-15) + new messages (16-25) → combined summary (5-25)
- Pros: Intelligent merging, better context preservation, can resolve contradictions
- Cons: More expensive, potentially slower, risk of AI hallucination
Option B: Client-Side Concatenation
- Send model: only new messages (16-25) → new summary (16-25)
- Concatenate: summary(5-15) + summary(16-25) = combined(5-25)
- Pros: Cost-effective, faster, predictable behavior
- Cons: Potential fragmentation, no cross-segment awareness
Option C: Hybrid Configurable
- Allow users to choose strategy based on cost/quality tradeoffs
- Default to client-side with option for AI-driven
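Since Option C would default to client-side concatenation, here is a sketch of Option B's combine step under the interfaces above; the `extendSummary` helper and its separator are arbitrary illustrative choices:

```typescript
// Sketch of Option B: reuse the cached summary for [start, cachedEnd],
// summarize only the new messages, and concatenate client-side.
async function extendSummary(
  store: SlimContextStore,
  threadId: string,
  start: number,
  cachedEnd: number,
  newEnd: number,
  summarizeNewMessages: () => Promise<string>, // covers cachedEnd+1..newEnd only
): Promise<string> {
  const cached = await store.get(buildSummaryCacheKey(threadId, start, cachedEnd));
  const newPart = await summarizeNewMessages();

  // No AI call over the already-summarized range: cheap and predictable,
  // but with no cross-segment merging (the stated fragmentation risk).
  const combined = cached !== null ? `${cached}\n---\n${newPart}` : newPart;
  await store.set(buildSummaryCacheKey(threadId, start, newEnd), combined);
  return combined;
}
```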
2. Cache Invalidation Strategy
- Should cache entries expire?
- How to handle message updates/edits?
- Thread-based vs global cache management
3. Store Interface Scope
- Keep minimal (get/set/delete) or add advanced features (batch operations, TTL)?
- Async vs sync interface design?
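For reference against question 3's minimal scope, a Map-backed `InMemoryStore` sketch; the async signatures keep it interchangeable with remote stores:

```typescript
// Minimal in-memory store for tests and local experimentation.
class InMemoryStore implements SlimContextStore {
  private entries = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.entries.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<void> {
    this.entries.set(key, value);
  }

  async delete(key: string): Promise<void> {
    this.entries.delete(key);
  }
}
```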
4. Thread ID Management
- Who provides the thread ID? User application or library?
- Default behavior when no thread ID provided?
- Should we auto-generate thread IDs for backward compatibility?
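One possible answer to the last question, sketched; the `toConversation` helper is hypothetical and assumes `crypto.randomUUID()` is available (Node 19+ or modern runtimes):

```typescript
// Sketch: derive a conversation from either input shape, generating a
// throwaway thread ID when the legacy array form is used.
function toConversation(
  input: SlimContextMessage[] | SlimContextConversation,
): SlimContextConversation {
  if (Array.isArray(input)) {
    // Legacy path: no thread ID provided, so cache entries from this call
    // cannot be reused across calls -- matching today's behavior.
    return wrapMessages(input, crypto.randomUUID());
  }
  return input;
}
```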
5. Conversation Wrapper Benefits
- Eliminates repetition: No `threadId` duplication across messages
- Cleaner API: Separates conversation context from individual message data
- Better performance: Reduces memory usage and serialization overhead
- Extensible: Easy to add conversation-level metadata without touching messages
Acceptance Criteria
Definition of done
- `SlimContextStore` interface defined in `src/interfaces.ts`
- `SlimContextConversation` wrapper interface implemented
- Method overloading for compressors (both message array and conversation wrapper)
- Utility functions for format conversion (`wrapMessages`/`unwrapMessages`)
- Modified `SummarizeCompressor` to use the store for caching
- Cache key generation utilities using conversation context
- `InMemoryStore` reference implementation
- Summary combination strategy implemented (choose one approach initially)
- Unit tests for caching behavior and backward compatibility
- Performance benchmarks showing improvement
- Documentation for store integration and the new conversation wrapper
- Full backward compatibility maintained (existing `compress(messages[])` unchanged)
Additional Context
Supporting information
Design principles alignment:
- Model-agnostic: Store interface doesn't depend on specific storage technology
- Framework-independent: Works with any storage backend (Redis, DB, filesystem, memory)
- BYOM pattern: Users provide their own store implementation
- Zero runtime dependencies: Core library remains dependency-free
Potential store implementations users might provide:
- Redis for distributed caching
- Database tables for persistence
- File system for local caching
- Cloud storage (S3, etc.) for serverless environments
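As one example from this list, a hedged sketch of a Redis-backed store using the third-party `ioredis` client; the dependency lives in the user's application, keeping the core library dependency-free:

```typescript
import Redis from 'ioredis';

// Sketch: Redis-backed SlimContextStore for distributed caching.
class RedisStore implements SlimContextStore {
  constructor(private redis = new Redis()) {}

  async get(key: string): Promise<string | null> {
    return this.redis.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    await this.redis.set(key, value);
  }

  async delete(key: string): Promise<void> {
    await this.redis.del(key);
  }
}
```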
Performance impact:
- Should significantly reduce AI model calls for repeated conversation compression
- Cache hits avoid expensive summarization operations
- Memory usage increases with cached summaries (acceptable tradeoff)