@konard konard commented Sep 13, 2025

Summary

This PR implements a comprehensive MapReduce multithread combined storage system as requested in issue #77, providing lock-free distributed operations across multiple sections.

Key Features Implemented

  • ✅ Sectioned Storage: Links are distributed across multiple sections, each managing a specific address range
  • ✅ Thread-per-Section Architecture: Each section is managed by a dedicated thread for lock-free operations
  • ✅ MapReduce Pattern: Requests are mapped to all relevant sections and the results are reduced to a single output
  • ✅ Lock-Free Queues: Uses ConcurrentQueue-based request/response streaming between threads
  • ✅ Configurable Section Capacity: Supports user-defined section sizes (default 1MB, configurable up to 64MB+)
  • ✅ Memory Management: Implements heap allocation with foundation for mmap/file-based modes

Implementation Architecture

Core Components

  1. MapReduceCombinedLinksStorage<TLinkAddress, TConstants> - Main storage class implementing ILinks interface
  2. IStorageSection<TLinkAddress> - Interface for individual storage sections
  3. InMemoryStorageSection<TLinkAddress> - In-memory section implementation
  4. IRequestQueue<TLinkAddress> & IResultQueue<TLinkAddress> - Lock-free queue interfaces
  5. StorageConfiguration<TLinkAddress> - Configuration with presets for different use cases

Request/Response Flow

Client Request → MapReduce Coordinator → Multiple Section Queues → Section Threads → Results Queue → Reduced Response
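The flow above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `CountRequest`, `MapReduceSketch`, and `CountAcrossSections` are hypothetical names, sections are modelled as plain `ulong[]` arrays, and `BlockingCollection<T>` (which wraps a `ConcurrentQueue<T>` by default but blocks while waiting) is used for brevity, so the sketch is not lock-free in the strict sense.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical request: count the values in a section that match a predicate
// and post the partial count to a shared results queue.
public sealed record CountRequest(Func<ulong, bool> Predicate, BlockingCollection<long> Results);

public static class MapReduceSketch
{
    public static long CountAcrossSections(IReadOnlyList<ulong[]> sections, Func<ulong, bool> predicate)
    {
        // One request queue and one dedicated thread per section.
        var queues = sections.Select(_ => new BlockingCollection<CountRequest>()).ToArray();
        var threads = new Thread[sections.Count];
        for (var i = 0; i < sections.Count; i++)
        {
            var data = sections[i];
            var queue = queues[i];
            threads[i] = new Thread(() =>
            {
                // Only this thread reads `data`, so the section itself needs no lock.
                foreach (var request in queue.GetConsumingEnumerable())
                {
                    request.Results.Add(data.LongCount(request.Predicate));
                }
            });
            threads[i].Start();
        }

        // Map: send the same request to every section queue.
        var results = new BlockingCollection<long>();
        var countRequest = new CountRequest(predicate, results);
        foreach (var queue in queues)
        {
            queue.Add(countRequest);
        }

        // Reduce: take exactly one partial result per section and sum them.
        long total = 0;
        for (var i = 0; i < sections.Count; i++)
        {
            total += results.Take();
        }

        // Shut the section threads down.
        foreach (var queue in queues)
        {
            queue.CompleteAdding();
        }
        foreach (var thread in threads)
        {
            thread.Join();
        }
        return total;
    }
}
```

In this sketch the coordinator knows how many partial results to expect because it sends one request per section; a real implementation would also need to correlate results with requests when several requests are in flight.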

Performance Benefits

  • 🚀 Scalability: Smaller trees in each section improve search/insert performance
  • ⚡ Parallelism: No locks required, optimal CPU/memory channel utilization
  • 💾 Memory Efficiency: Separate heap blocks avoid expensive data copying
  • 🔧 Flexibility: Configurable section sizes and thread counts based on system capabilities

Configuration Options

// Default configuration
var defaultConfig = StorageConfiguration<ulong>.CreateDefault();

// High-throughput configuration
var highThroughputConfig = StorageConfiguration<ulong>.CreateHighThroughput();

// Memory-efficient configuration
var memoryEfficientConfig = StorageConfiguration<ulong>.CreateMemoryEfficient("/data/storage");

Testing Coverage

  • ✅ Basic CRUD operations (Create, Read, Update, Delete)
  • ✅ Concurrent operations with multiple threads
  • ✅ MapReduce request distribution and result aggregation
  • ✅ Configuration validation and edge cases
  • ✅ Queue functionality and thread safety
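As an illustration of what a queue thread-safety check might look like, here is a small sketch built only on `ConcurrentQueue<T>` from the .NET base class library. It is not one of the tests in this PR; `QueueConcurrencyCheck` and `EnqueueInParallelAndDrain` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class QueueConcurrencyCheck
{
    // Enqueues producers * itemsPerProducer distinct values from parallel producers,
    // then drains the queue and returns how many distinct values came out.
    public static int EnqueueInParallelAndDrain(int producers, int itemsPerProducer)
    {
        var queue = new ConcurrentQueue<int>();
        Parallel.For(0, producers, producer =>
        {
            for (var i = 0; i < itemsPerProducer; i++)
            {
                queue.Enqueue(producer * itemsPerProducer + i);
            }
        });

        // Draining happens after all producers have finished.
        var seen = new HashSet<int>();
        while (queue.TryDequeue(out var item))
        {
            seen.Add(item);
        }
        return seen.Count;
    }
}
```

If no item is lost or duplicated, the returned count equals `producers * itemsPerProducer`.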

Version Update

  • Updated version from 0.16.10.17.0 to reflect major new functionality
  • Added new package tags for discoverability
  • Updated release notes with comprehensive feature description

Addresses Issue Requirements

This implementation directly addresses all requirements from issue #77:

  1. Configurable minimum internal references range - Implemented via StorageConfiguration.MinInternalReference
  2. Separate sections for every N links - Configurable via MaxSectionCapacity
  3. Thread-per-section with lock-free operations - Each section managed by dedicated thread
  4. MapReduce request distribution - Requests mapped to all sections, results reduced
  5. Heap/mmap memory allocation support - Foundation implemented with SectionAllocationMode
  6. No more than C+1 threads - Configurable via NumberOfSections (defaults to CPU cores + 1)
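Requirements 2 and 6 can be illustrated with a short sketch. It assumes that link addresses start at the minimum internal reference and that each section holds a contiguous block of `maxSectionCapacity` addresses; the PR text does not confirm this layout. `SectionRouting`, `SectionIndexOf`, and `DefaultNumberOfSections` are hypothetical helpers, not part of the PR's API.

```csharp
using System;

public static class SectionRouting
{
    // Hypothetical: maps a link address to a zero-based section index,
    // assuming contiguous address blocks of maxSectionCapacity links each.
    public static int SectionIndexOf(ulong address, ulong minInternalReference, ulong maxSectionCapacity)
    {
        if (maxSectionCapacity == 0)
        {
            throw new ArgumentOutOfRangeException(nameof(maxSectionCapacity));
        }
        if (address < minInternalReference)
        {
            throw new ArgumentOutOfRangeException(nameof(address));
        }
        return checked((int)((address - minInternalReference) / maxSectionCapacity));
    }

    // "C + 1" from the issue, read here as logical processor count plus one.
    public static int DefaultNumberOfSections() => Environment.ProcessorCount + 1;
}
```

With a mapping like this, a request for a single known address could go to one section only, while searches without a fixed address would still fan out to every section.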

Future Enhancements

The implementation provides a solid foundation for:

  • Memory-mapped file storage sections
  • Separate file-based sections
  • Advanced request routing based on address ranges
  • Performance monitoring and statistics
  • Dynamic section expansion

Test Plan

  • All unit tests pass
  • Concurrent operations work correctly
  • Configuration validation prevents invalid setups
  • Memory usage is efficient
  • No deadlocks or race conditions detected

🤖 Generated with Claude Code


Resolves #77

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: #77
@konard konard self-assigned this Sep 13, 2025
This commit implements a comprehensive MapReduce multithread combined storage system as requested in issue #77:

## Key Features
- **Sectioned Storage**: Distributes links across multiple sections, each managing a specific address range
- **Thread-per-Section**: Each section is managed by a dedicated thread for lock-free operations
- **MapReduce Pattern**: Maps requests to all relevant sections and reduces the results to a single output
- **Lock-Free Queues**: Uses ConcurrentQueue-based request/response streaming between threads
- **Configurable Capacity**: Supports user-defined section sizes (default 64MB worth of links)
- **Memory Management**: Supports heap allocation mode with foundation for mmap/file-based modes

## Implementation Details
- `MapReduceCombinedLinksStorage<TLinkAddress, TConstants>`: Main storage class implementing ILinks interface
- `IStorageSection<TLinkAddress>`: Interface for individual storage sections
- `InMemoryStorageSection<TLinkAddress>`: Basic in-memory section implementation
- `IRequestQueue<TLinkAddress>` & `IResultQueue<TLinkAddress>`: Lock-free queue interfaces
- `LockFreeRequestQueue<TLinkAddress>` & `LockFreeResultQueue<TLinkAddress>`: ConcurrentQueue-based implementations
- `StorageConfiguration<TLinkAddress>`: Configuration class with presets for different use cases
- `StorageRequest<TLinkAddress>` & `StorageResult<TLinkAddress>`: Request/response data structures

## Benefits
- **Scalability**: Smaller trees in each section improve performance
- **Parallelism**: No locks required, optimal thread utilization
- **Memory Efficiency**: Separate heap blocks avoid data copying
- **Flexibility**: Configurable section sizes and thread counts

## Testing
- Comprehensive unit tests covering all major functionality
- Tests for concurrent operations and thread safety
- Configuration validation tests

This implementation provides the foundation for distributed link storage operations while maintaining compatibility with the existing ILinks interface.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@konard konard changed the title from "[WIP] MapReduce multithread combined storage" to "Implement MapReduce multithread combined storage for issue #77" Sep 13, 2025
@konard konard marked this pull request as ready for review September 13, 2025 01:32