Commit cb8def1

docs: Add garbage collection documentation and update references
1 parent 157f3ab commit cb8def1

4 files changed: +77 -18 lines changed

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# Garbage Collection (LRU)

This document details how the SDK manages local cache size to prevent unbounded growth. It explains the distinction between Eager and LRU collection, the criteria for deletion, and the sequence-number-based algorithm used to identify old data.

## Strategies: Eager vs. LRU

The SDK employs one of two strategies, depending on the persistence mode:

1. **Eager GC (Memory Persistence)**:
   * Used when persistence is disabled (in-memory only).
   * **Behavior**: When a query is stopped (`unsubscribe()`), the SDK immediately releases that query's references to its documents. Any document that no other active query references is deleted from memory right away.
   * **Pros/Cons**: Extremely memory efficient, but offers no offline caching across app restarts.

2. **LRU GC (Disk Persistence)**:
   * Used when persistence is enabled (IndexedDB).
   * **Behavior**: When a query is stopped, the data remains on disk. A background process periodically checks the total cache size. If it exceeds a threshold, the "Least Recently Used" data is purged.
   * **Pros/Cons**: Supports offline apps and faster re-querying, but requires complex management of "Sequence Numbers" to track usage.
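
To make the eager behavior concrete, here is a minimal sketch of reference-counted, in-memory collection. It is not the SDK's implementation: `EagerCache`, `addToQuery`, and `stopQuery` are hypothetical names chosen for illustration, and the sketch assumes each document simply tracks the set of queries that currently reference it.

```typescript
// Hypothetical in-memory cache illustrating eager collection.
// These names are illustrative and are not the SDK's classes.
class EagerCache {
  private docs = new Map<string, unknown>();
  // Document key -> ids of the active queries that reference it.
  private refs = new Map<string, Set<number>>();

  addToQuery(queryId: number, key: string, data: unknown): void {
    this.docs.set(key, data);
    let owners = this.refs.get(key);
    if (!owners) {
      owners = new Set<number>();
      this.refs.set(key, owners);
    }
    owners.add(queryId);
  }

  // Called when a listener is stopped: drop its references and delete
  // every document that no other active query still references.
  stopQuery(queryId: number): void {
    this.refs.forEach((owners, key) => {
      owners.delete(queryId);
      if (owners.size === 0) {
        this.refs.delete(key);
        this.docs.delete(key);
      }
    });
  }

  has(key: string): boolean {
    return this.docs.has(key);
  }
}
```

In this sketch a document shared by two queries survives until the second query is stopped, while a document referenced by only one query disappears as soon as that query stops. Under LRU collection the same `stopQuery` call would leave the data in place for a later background pass.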

*The rest of this document focuses on the LRU strategy.*

## What is Collected?

Garbage collection runs in the background. It does not indiscriminately delete data. It looks for **Eligible** items:

### 1. Inactive Targets
A `Target` (internal query representation) is eligible for collection if it is no longer being listened to by the user.

### 2. Orphaned Documents
A document is eligible for collection only if it is **Orphaned**. A document is Orphaned when both of the following hold:
* **No Active Targets**: It does not match *any* currently active query listener.
* **No Pending Mutations**: There are no local edits (Sets/Patches) waiting to be sent to the backend.

> **Note**: Mutations are *never* garbage collected. They are only removed once the backend accepts or rejects them.
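
The two conditions above can be read as a single predicate. The sketch below is only an illustration of that reading: `CacheState` and `isOrphaned` are hypothetical names, and the in-memory maps stand in for whatever lookups the SDK actually performs.

```typescript
// Hypothetical shapes for illustration; the SDK's real types differ.
interface CacheState {
  // Document key -> ids of targets whose results include the document.
  targetsByDoc: Map<string, number[]>;
  // Ids of targets that currently have a listener attached.
  activeTargetIds: Set<number>;
  // Keys of documents touched by mutations the backend has not yet
  // accepted or rejected.
  pendingMutationKeys: Set<string>;
}

// A document is orphaned only when BOTH conditions hold:
// no active target matches it, and no pending mutation touches it.
function isOrphaned(state: CacheState, key: string): boolean {
  const targets = state.targetsByDoc.get(key) || [];
  const matchesActiveTarget = targets.some(id => state.activeTargetIds.has(id));
  const hasPendingMutation = state.pendingMutationKeys.has(key);
  return !matchesActiveTarget && !hasPendingMutation;
}
```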

## Key Concepts

### Sequence Numbers (The Logical Clock)
To determine "recency," the SDK maintains a global `last_sequence_number` in the `target_globals` table.
* **Tick**: Every transaction (write, query listen, remote update) increments this number.
* **Tagging**: When a Target is actively listened to or updated, its `last_listen_sequence_number` is updated to the current global tick.
* **Effect**: Higher numbers = More recently used.
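
As a rough illustration of the tick-and-tag idea, the sketch below models the global counter as a small class and tags a target on each use. `SequenceClock`, `ListenedTarget`, and `touchTarget` are hypothetical names, and the sketch assumes one tick per call rather than one per persistence transaction.

```typescript
// Hypothetical logical clock; stands in for the global sequence number.
class SequenceClock {
  constructor(private last: number = 0) {}

  // Tick: advance the clock and return the new value.
  next(): number {
    this.last += 1;
    return this.last;
  }

  current(): number {
    return this.last;
  }
}

// Hypothetical target row carrying its last-listen sequence number.
interface ListenedTarget {
  targetId: number;
  lastListenSequenceNumber: number;
}

// Tagging: stamp the target with a fresh tick whenever it is used.
function touchTarget(target: ListenedTarget, clock: SequenceClock): void {
  target.lastListenSequenceNumber = clock.next();
}
```

Because the counter only moves forward, comparing two targets' `lastListenSequenceNumber` values is enough to tell which one was used more recently; no wall-clock timestamps are involved.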

### The Reference Map (`target_documents`)
This table acts as a reference counter linking Documents to Targets.
* **Active Association**: If `target_id: 2` matches `doc_key: A`, a row exists.
* **Sentinel Rows (`target_id: 0`)**: If a document exists in the cache but is not matched by *any* specific target (perhaps previously downloaded, or part of a target that was deleted), it may have a row with `target_id: 0`. This marks the document as present but potentially orphaned.
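
One way to picture the two kinds of rows is a helper that classifies a document by the rows it has. This is a sketch under assumptions: `TargetDocumentRow` and `referenceState` are hypothetical names, and the rows are held in a plain array rather than an IndexedDB store.

```typescript
// Target id reserved for sentinel rows, as described above.
const SENTINEL_TARGET_ID = 0;

// Hypothetical row shape for the reference map.
interface TargetDocumentRow {
  targetId: number;
  docKey: string;
}

type ReferenceState = "referenced" | "sentinel-only" | "absent";

// Classify a document by the rows that mention it.
function referenceState(rows: TargetDocumentRow[], docKey: string): ReferenceState {
  const mine = rows.filter(r => r.docKey === docKey);
  if (mine.length === 0) {
    return "absent";
  }
  // Any row with a real target id counts as a live reference.
  return mine.some(r => r.targetId !== SENTINEL_TARGET_ID)
    ? "referenced"
    : "sentinel-only";
}
```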

## The Collection Algorithm

The `LruGarbageCollector` runs periodically (e.g., every few minutes).

1. **Threshold Check**: It calculates the byte size of the current cache. If `CurrentSize < CacheSizeBytes` (40 MB by default in the JS SDK; the Android and iOS SDKs default to 100 MB), the run ends without collecting anything.
2. **Calculate Cutoff**:
   * The GC decides what fraction of items to cull (e.g., 10%).
   * It queries the `target_documents` table, ordered by `sequence_number` ASC.
   * It finds the sequence number at that percentile (the 10th, in this example). This becomes the **Upper Bound**.
3. **Sweep Targets**:
   * Any inactive Target in the `targets` table with a `last_listen_sequence_number` <= **Upper Bound** is deleted.
   * This removes the "Active" link for any documents associated with that target.
4. **Sweep Documents**:
   * The GC scans for documents that have *no* rows in `target_documents` (or only sentinel rows) AND have a sequence number <= **Upper Bound**.
   * These "Orphaned" documents are deleted from the `remote_documents` table.
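
The four steps above can be sketched end to end over plain arrays. This is a simplified illustration, not the `LruGarbageCollector` itself: every type and function name below is hypothetical, each document is assumed to carry its own byte size and sequence number, the cutoff is computed over the pooled target and document sequence numbers, and pending mutations are ignored.

```typescript
const SENTINEL_TARGET_ID = 0;

// Hypothetical in-memory stand-ins for the persisted tables.
interface TargetRow { targetId: number; lastListenSequenceNumber: number; }
interface DocRow { key: string; sequenceNumber: number; sizeBytes: number; }
interface TargetDocumentRow { targetId: number; docKey: string; }

interface LocalCache {
  targets: TargetRow[];
  activeTargetIds: number[];
  remoteDocuments: DocRow[];
  targetDocuments: TargetDocumentRow[];
}

interface GcResult { ran: boolean; targetsRemoved: number; documentsRemoved: number; }

function collectLru(cache: LocalCache, cacheSizeBytes: number, percentile: number = 10): GcResult {
  const nothing: GcResult = { ran: false, targetsRemoved: 0, documentsRemoved: 0 };

  // 1. Threshold check: do nothing while the cache is under the limit.
  const currentSize = cache.remoteDocuments.reduce((sum, d) => sum + d.sizeBytes, 0);
  if (currentSize < cacheSizeBytes) return nothing;

  // 2. Calculate cutoff: sort sequence numbers ascending and take the
  //    value at the requested percentile as the upper bound.
  const sequenceNumbers = cache.targets
    .map(t => t.lastListenSequenceNumber)
    .concat(cache.remoteDocuments.map(d => d.sequenceNumber))
    .sort((a, b) => a - b);
  const toCollect = Math.floor((sequenceNumbers.length * percentile) / 100);
  if (toCollect === 0) return nothing;
  const upperBound = sequenceNumbers[toCollect - 1];

  // 3. Sweep targets: remove inactive targets at or below the bound,
  //    together with their rows in the reference map.
  const isActive = (id: number) => cache.activeTargetIds.indexOf(id) !== -1;
  const doomedTargets = cache.targets
    .filter(t => !isActive(t.targetId) && t.lastListenSequenceNumber <= upperBound)
    .map(t => t.targetId);
  cache.targets = cache.targets.filter(t => doomedTargets.indexOf(t.targetId) === -1);
  cache.targetDocuments = cache.targetDocuments.filter(r => doomedTargets.indexOf(r.targetId) === -1);

  // 4. Sweep documents: delete documents with no non-sentinel rows
  //    whose sequence number is at or below the bound.
  const isReferenced = (key: string) =>
    cache.targetDocuments.some(r => r.docKey === key && r.targetId !== SENTINEL_TARGET_ID);
  const deletedKeys = cache.remoteDocuments
    .filter(d => !isReferenced(d.key) && d.sequenceNumber <= upperBound)
    .map(d => d.key);
  cache.remoteDocuments = cache.remoteDocuments.filter(d => deletedKeys.indexOf(d.key) === -1);
  // Drop any leftover (sentinel) rows that point at deleted documents.
  cache.targetDocuments = cache.targetDocuments.filter(r => deletedKeys.indexOf(r.docKey) === -1);

  return { ran: true, targetsRemoved: doomedTargets.length, documentsRemoved: deletedKeys.length };
}
```

Note the ordering: targets are swept before documents, so a document that was referenced only by a just-deleted target becomes eligible in the same pass.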

packages/firestore/devdocs/overview.md

Lines changed: 2 additions & 1 deletion
@@ -44,7 +44,8 @@ To navigate the internals of the SDK, use the following guide:
 * **[Query Lifecycle](./query-lifecycle.md)**: The state machine of a query. **Read this** to understand how querying and offline capabilities work.

 ### Subsystem Deep Dives
-* **[Persistence Schema](./persistence-schema.md)**: A reference guide for the IndexedDB tables (e.g., `remote_documents`, `mutation_queues`).
+* **[Persistence Schema](./persistence-schema.md)**: A reference guide for the IndexedDB tables.
+* **[Garbage Collection](./garbage-collection.md)**: Details the LRU algorithm, Sequence Numbers, and how the SDK manages cache size.
 * **[Query Execution](./query-execution.md)**: Details on the algorithms used by the Local Store to execute queries (Index Scans vs. Full Collection Scans).
 * **[Bundles](./bundles.md)**: How the SDK loads and processes data bundles.

packages/firestore/devdocs/persistence-schema.md

Lines changed: 4 additions & 1 deletion
@@ -33,7 +33,10 @@ While the Android/iOS SDKs use SQLite, the JS SDK uses IndexedDB Object Stores.

 ### `target_documents` (The Index)
 * **Concept**: A reverse index mapping `TargetID` $\leftrightarrow$ `DocumentKey`.
-* **Purpose**: Optimization. When a query is executed locally, the SDK uses this index to quickly identify which documents belong to a specific TargetID without scanning the entire `remote_documents` table.
+* **Purpose**:
+    1. **Query Execution**: Quickly identify documents for a query.
+    2. **Garbage Collection**: Acts as a reference counter. If a document has entries here with active TargetIDs, it cannot be collected.
+* **Sentinel Rows**: A row with `TargetID = 0` indicates the document exists in the cache but may not be attached to any active listener. These are primary candidates for Garbage Collection.
 * **Maintenance**: This is updated whenever a remote snapshot adds/removes a document from a query view.

 ## Metadata & Garbage Collection Stores

packages/firestore/devdocs/query-lifecycle.md

Lines changed: 9 additions & 16 deletions
@@ -79,22 +79,15 @@ When a user calls `unsubscribe()`, data is **not** immediately deleted.

 ## Phase 6: Garbage Collection (The "Death" of a Query)

-Since data persists after `unsubscribe()`, the SDK must actively manage disk usage. This is handled by the **LruGarbageCollector**.
-
-### Sequence Numbers
-Every transaction (write, listen, update) increments a global `last_sequence_number` in the `target_globals` table.
-* **Active Targets**: When a query is listened to, its `target` entry is updated with the current global sequence number.
-* **Inactive Targets**: Old queries retain older sequence numbers.
-
-### The GC Process
-When the SDK detects memory/disk pressure (or periodically):
-1. **Reference Delegate**: Scans the `target_documents` reverse index.
-2. **Orphan Check**: It identifies documents that belong *only* to Targets that are:
-    * Inactive (0 listeners).
-    * Old (Sequence number is below the GC threshold).
-3. **Purge**:
-    * The document is deleted from `remote_documents`.
-    * The `Target` metadata is eventually removed from the `targets` table.
+Since data persists after `unsubscribe()`, the SDK must actively manage disk usage.
+
+* **Eager GC**: If persistence is disabled, data is cleared from memory immediately when the listener count hits 0.
+* **LRU GC**: If persistence is enabled, the data remains on disk for offline availability.
+
+The **LruGarbageCollector** runs periodically to keep the cache within the configured size (40 MB by default in the JS SDK; 100 MB in the Android and iOS SDKs). It uses a "Sequence Number" system to track when data was last used.
+
+For a detailed walkthrough of the algorithm, Sequence Numbers, and Orphaned Documents, see **[Garbage Collection](./garbage-collection.md)**.
+

 ## Debugging Tips
