| 1 | +# Garbage Collection (LRU) |
| 2 | + |
| 3 | +This document details how the SDK manages local cache size to prevent unbounded growth. It explains the distinction between Eager and LRU collection, the criteria for deletion, and the sequence-number-based algorithm used to identify old data. |
| 4 | + |
| 5 | +## Strategies: Eager vs. LRU |
| 6 | + |
| 7 | +The SDK employs two different strategies depending on the persistence mode: |
| 8 | + |
| 9 | +1. **Eager GC (Memory Persistence)**: |
| 10 | + * Used when persistence is disabled (in-memory only). |
| 11 | + * **Behavior**: When a query is stopped (`unsubscribe()`), the SDK immediately releases the reference to the data. If no other active query references those documents, they are deleted from memory instantly. |
| 12 | + * **Pros/Cons**: Extremely memory efficient, but offers no offline caching across app restarts. |
| 13 | + |
| 14 | +2. **LRU GC (Disk Persistence)**: |
| 15 | + * Used when persistence is enabled (IndexedDB). |
| 16 | + * **Behavior**: When a query is stopped, the data remains on disk. A background process periodically checks the total cache size. If it exceeds a threshold, the "Least Recently Used" data is purged. |
| 17 | + * **Pros/Cons**: Supports offline apps and faster re-querying, but requires complex management of "Sequence Numbers" to track usage. |
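The eager behavior described above amounts to reference counting per document. A minimal sketch, assuming a hypothetical `EagerCache` class (the names `EagerCache`, `addMatch`, and `stopQuery` are illustrative and are not taken from the SDK):

```typescript
// Hypothetical in-memory model of eager collection; not the SDK's actual classes.
type DocKey = string;
type QueryId = number;

class EagerCache {
  // Documents currently held in memory, keyed by document path.
  private docs = new Map<DocKey, unknown>();
  // For each document, the set of active queries that reference it.
  private refs = new Map<DocKey, Set<QueryId>>();

  // Record that an active query matched a document.
  addMatch(query: QueryId, key: DocKey, data: unknown): void {
    this.docs.set(key, data);
    const holders = this.refs.get(key) ?? new Set<QueryId>();
    holders.add(query);
    this.refs.set(key, holders);
  }

  // Called when a listener is stopped: drop its references and delete every
  // document that no other active query still references.
  stopQuery(query: QueryId): DocKey[] {
    const removed: DocKey[] = [];
    this.refs.forEach((holders, key) => {
      if (!holders.delete(query)) return; // this query never referenced the document
      if (holders.size === 0) {
        this.refs.delete(key);
        this.docs.delete(key);
        removed.push(key);
      }
    });
    return removed;
  }

  has(key: DocKey): boolean {
    return this.docs.has(key);
  }
}
```

In this model a document shared by two queries survives until the second query is stopped, which is the "no other active query references those documents" condition.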
| 18 | + |
| 19 | +*The rest of this document focuses on the LRU strategy.* |
| 20 | + |
| 21 | +## What is Collected? |
| 22 | + |
| 23 | +Garbage collection runs in the background and does not delete data indiscriminately. It only considers two kinds of **Eligible** items: |
| 24 | + |
| 25 | +### 1. Inactive Targets |
| 26 | +A `Target` (the internal representation of a query) is eligible for collection once no listener is attached to it. |
| 27 | + |
| 28 | +### 2. Orphaned Documents |
| 29 | +A document is eligible for collection only if it is **Orphaned**, meaning both of the following hold: |
| 30 | +* **No Active Targets**: It does not match *any* currently active query listener. |
| 31 | +* **No Pending Mutations**: There are no local edits (Sets/Patches) waiting to be sent to the backend. |
| 32 | + |
| 33 | +> **Note**: Mutations are *never* garbage collected. They are only removed once the backend accepts or rejects them. |
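The two orphan conditions can be read as a single predicate. The sketch below assumes a hypothetical `CacheState` shape and an `isOrphaned` helper; neither name comes from the SDK.

```typescript
// Hypothetical shapes for illustration only.
interface CacheState {
  // Ids of targets that currently have a listener attached.
  activeTargetIds: Set<number>;
  // Document key -> ids of the targets that reference the document.
  targetsByDoc: Map<string, number[]>;
  // Keys of documents with local edits not yet acknowledged by the backend.
  docsWithPendingMutations: Set<string>;
}

// A document is orphaned only when both conditions hold:
// no active target matches it AND no pending mutation touches it.
function isOrphaned(key: string, state: CacheState): boolean {
  const targets = state.targetsByDoc.get(key) ?? [];
  const matchedByActiveTarget = targets.some(id => state.activeTargetIds.has(id));
  const hasPendingMutation = state.docsWithPendingMutations.has(key);
  return !matchedByActiveTarget && !hasPendingMutation;
}
```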
| 34 | + |
| 35 | +## Key Concepts |
| 36 | + |
| 37 | +### Sequence Numbers (The Logical Clock) |
| 38 | +To determine "recency," the SDK maintains a global `last_sequence_number` in the `target_globals` table. |
| 39 | +* **Tick**: Every transaction (write, query listen, remote update) increments this number. |
| 40 | +* **Tagging**: When a Target is actively listened to or updated, its `last_listen_sequence_number` is updated to the current global tick. |
| 41 | +* **Effect**: Higher numbers = More recently used. |
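A small sketch of the tick-and-tag idea, assuming hypothetical `SequenceClock` and `TargetRow` types (the real counter is persisted; here it is a plain in-memory field):

```typescript
// Hypothetical names; a sketch of the logical clock, not the SDK's persistence layer.
interface TargetRow {
  targetId: number;
  lastListenSequenceNumber: number;
}

class SequenceClock {
  private last = 0; // plays the role of the persisted global counter

  // Tick: each transaction takes the next sequence number.
  tick(): number {
    this.last += 1;
    return this.last;
  }
}

// Tagging: a target that is listened to or updated inside a transaction
// is stamped with that transaction's sequence number.
function touchTarget(target: TargetRow, clock: SequenceClock): void {
  target.lastListenSequenceNumber = clock.tick();
}

// Effect: the target with the lower number was used less recently.
function leastRecentlyUsed(a: TargetRow, b: TargetRow): TargetRow {
  return a.lastListenSequenceNumber <= b.lastListenSequenceNumber ? a : b;
}
```

Because the counter only ever increases, comparing two sequence numbers is enough to order usage; no wall-clock timestamps are needed.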
| 42 | + |
| 43 | +### The Reference Map (`target_documents`) |
| 44 | +This table acts as a reference counter linking Documents to Targets. |
| 45 | +* **Active Association**: If `target_id: 2` matches `doc_key: A`, a row linking the two exists. |
| 46 | +* **Sentinel Rows (`target_id: 0`)**: If a document exists in the cache but is not matched by *any* specific target (perhaps previously downloaded, or part of a target that was deleted), it may have a row with `target_id: 0`. This marks the document as present but potentially orphaned. |
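One way to picture the reference map is as a list of rows. The sketch below uses a hypothetical `TargetDocumentRow` shape and makes one explicit assumption that the text above does not confirm: when a target is removed, any document left without rows receives a sentinel row.

```typescript
// Hypothetical row shape modelled on the description above.
interface TargetDocumentRow {
  targetId: number; // 0 is the sentinel
  docKey: string;
  sequenceNumber: number;
}

const SENTINEL_TARGET_ID = 0;

// True if at least one real (non-sentinel) target references the document.
function hasTargetReference(rows: TargetDocumentRow[], docKey: string): boolean {
  return rows.some(r => r.docKey === docKey && r.targetId !== SENTINEL_TARGET_ID);
}

// Remove every row owned by a target. Assumption for this sketch: a document
// that is left with no rows at all gets a sentinel row so GC can still find it.
function removeTarget(
  rows: TargetDocumentRow[],
  targetId: number,
  sequenceNumber: number
): TargetDocumentRow[] {
  const kept = rows.filter(r => r.targetId !== targetId);
  const affected = rows.filter(r => r.targetId === targetId).map(r => r.docKey);
  for (const docKey of affected) {
    const hasAnyRow = kept.some(r => r.docKey === docKey);
    if (!hasAnyRow) {
      kept.push({ targetId: SENTINEL_TARGET_ID, docKey, sequenceNumber });
    }
  }
  return kept;
}
```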
| 47 | + |
| 48 | +## The Collection Algorithm |
| 49 | + |
| 50 | +The `LruGarbageCollector` runs periodically (e.g., every few minutes). |
| 51 | + |
| 52 | +1. **Threshold Check**: It calculates the byte size of the current cache. If `CurrentSize < CacheSizeBytes` (default 40 MB for the IndexedDB-backed Web SDK), the run ends without collecting anything. |
| 53 | +2. **Calculate Cutoff**: |
| 54 | + * The GC decides what fraction of items to cull (e.g., 10%). |
| 55 | + * It queries the `target_documents` table, ordered by `sequence_number` ASC. |
| 56 | + * It finds the sequence number at that percentile (the 10th, in this example). This becomes the **Upper Bound**. |
| 57 | +3. **Sweep Targets**: |
| 58 | + * Any inactive Target in the `targets` table with a `last_listen_sequence_number` <= **Upper Bound** is deleted. Targets that still have a listener are never swept. |
| 59 | + * This removes the "Active" link for any documents associated with that target. |
| 60 | +4. **Sweep Documents**: |
| 61 | + * The GC scans for documents that have *no* rows in `target_documents` (or only sentinel rows) AND have a sequence number <= **Upper Bound**. |
| 62 | + * These "Orphaned" documents are deleted from the `remote_documents` table. |
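The four steps above can be sketched as a single pure function. Everything here is hypothetical (the `Target`, `CachedDoc`, and `GcResult` shapes and the `collect` function); it models the steps on in-memory arrays and simplifies the cutoff by taking sequence numbers from documents only.

```typescript
// All names below are hypothetical; this models the four steps, not the SDK's code.
interface Target {
  id: number;
  lastListenSequenceNumber: number;
  active: boolean; // true while a listener is attached
}

interface CachedDoc {
  key: string;
  sequenceNumber: number; // sequence number on the document's reference row
  targetIds: number[];    // targets that currently reference the document
  hasPendingMutation: boolean;
  sizeBytes: number;
}

interface GcResult {
  targetsRemoved: number[];
  docsRemoved: string[];
}

function collect(
  targets: Target[],
  docs: CachedDoc[],
  cacheSizeBytes: number,
  percentile: number
): GcResult {
  const result: GcResult = { targetsRemoved: [], docsRemoved: [] };

  // 1. Threshold check: do nothing while the cache is under the limit.
  const currentSize = docs.reduce((sum, d) => sum + d.sizeBytes, 0);
  if (currentSize < cacheSizeBytes) return result;

  // 2. Cutoff: the sequence number at the requested percentile, ascending.
  const sequenceNumbers = docs.map(d => d.sequenceNumber).sort((a, b) => a - b);
  const count = Math.floor((sequenceNumbers.length * percentile) / 100);
  if (count === 0) return result;
  const upperBound = sequenceNumbers[count - 1];

  // 3. Sweep targets: only inactive targets at or below the cutoff.
  for (const t of targets) {
    if (!t.active && t.lastListenSequenceNumber <= upperBound) {
      result.targetsRemoved.push(t.id);
    }
  }
  const removed = new Set<number>(result.targetsRemoved);

  // 4. Sweep documents: no remaining target reference, no pending mutation,
  //    and a sequence number at or below the cutoff.
  for (const d of docs) {
    const stillReferenced = d.targetIds.some(id => !removed.has(id));
    if (!stillReferenced && !d.hasPendingMutation && d.sequenceNumber <= upperBound) {
      result.docsRemoved.push(d.key);
    }
  }
  return result;
}
```

Returning the ids and keys instead of deleting in place keeps the sketch easy to test; a real collector would perform the deletions inside a persistence transaction.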