Conversation

Contributor

@enigbe enigbe commented Nov 1, 2025

What this PR does

This PR introduces TierStore, a tiered (KVStore + KVStoreSync) implementation that manages data across three distinct storage layers based on criticality.

Background

As we have moved towards supporting remote storage with VssStore, we need to recognize that not all data has the same storage requirements. Currently, all data goes to a single store, which creates several problems:

  • Remote storage adds latency on every read/write operation
  • Network graph and scorer data is ephemeral and easily rebuildable from the network
  • Persisting ephemeral data to remote storage wastes bandwidth and adds unnecessary latency
  • Users persisting to remote storage have no local disaster recovery option

This PR proposes tiered storage that provides granular control over where different data types are stored. The tiers include:

  • Ephemeral store (network graph, scorer): optional fast local storage to avoid remote latency
  • Primary store (channels, payments, wallet): preferably remote storage for critical data
  • Backup store: optional (preferably local) backup for disaster recovery, with data written to (and removed from) it lazily to avoid blocking primary store operations

Additionally, Node can be configured with tiered storage, allowing callers to:

  • set retry parameters,
  • set backup and ephemeral stores, and to
  • build the Node with a primary store.
    These configuration options also extend to our foreign interface, allowing bindings targets to build the Node with their own (KVStore + KVStoreSync) implementations. A sample Python implementation is provided and tested.

Concerns

  1. Nested Retry Logic: VssStore has built-in retry logic. Wrapping it in TierStore creates nested retries.
  2. Backup Queue Capacity: Currently hard-coded (100 in prod, 5 in tests). Ideally this would be configurable, but there are concerns about exposing it to callers. What is a sensible default?
  3. Data Consistency Between Primary and Backup: Backup operates asynchronously and may lag behind or miss operations if the queue overflows. Should we consider adding a background sync task to reconcile backup with primary? Expose metrics for backup lag/queue depth? Implement catch-up logic on startup?
  4. Exposing KVStore to the FFI

Related Issues


ldk-reviews-bot commented Nov 1, 2025

I've assigned @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-reviews-bot ldk-reviews-bot requested a review from tnull November 1, 2025 23:18
@ldk-reviews-bot

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch 4 times, most recently from 29f47f3 to 264aa7f on November 4, 2025 at 22:07
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from 264aa7f to 493dd9a on December 2, 2025 at 19:50

Introduces TierStore, a KVStore implementation that manages data
across three storage layers:

- Primary: Main/remote data store
- Ephemeral: Secondary store for non-critical, easily-rebuildable
  data (e.g., network graph) with fast local access
- Backup: Tertiary store for disaster recovery with async/lazy
  operations to avoid blocking primary store

Adds four configuration methods to NodeBuilder:

- set_tier_store_backup: Configure backup data store
- set_tier_store_ephemeral: Configure ephemeral data store
- set_tier_store_retry_config: Configure retry parameters with
  exponential backoff
- build_with_tier_store: Build node with primary data store

These methods are exposed to the foreign interface via additions
in ffi/types.rs:

- ffi::SyncAndAsyncKVStore: Composed of KVStore and KVStoreSync
  methods to handle the types::SyncAndAsyncKVStore supertrait
  across FFI
- ffi::ForeignKVStoreAdapter and ffi::DynStore: Adapt/translate
  between foreign language store and native Rust store
- Conditional compilation for DynStore: ffi::DynStore with uniffi,
  types::DynStore without, with selection aided by the wrap_store!()
  macro
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from 493dd9a to a30cbfb on December 4, 2025 at 23:10
This commit adds unit, integration, and FFI tests
for the TierStore implementation:
- Unit tests for TierStore core functionality
- Integration tests for nodes built with tiered storage
- Python FFI tests for foreign key-value store
@enigbe enigbe force-pushed the 2025-10-tiered-data-storage branch from a30cbfb to 1e7bdbc on December 4, 2025 at 23:30

Collaborator

@tnull tnull left a comment

Thanks for looking into this and excuse the delay here!

I did a first pass; generally this already looks pretty good, but it is a huge PR and could be simplified in some areas. For instance, we should drop the generic Retry logic, as the concrete implementations would already implement that if they need it. Likewise, we shouldn't fall back to the backup for now, as it's really only meant as disaster recovery (KVStore caching is out of scope for this PR now, even though we might want to explore that soon, too). There is also no need to replicate the write-ordering locks in TierStore, but we'll need them in ForeignKVStoreAdapter (which might be mergeable with DynStore?).

Generally, if you find opportunities to reduce the size of the changeset here it would be appreciated. It would also be cool if you could try to break up the PR into more feature commits, but feel free to leave as is if not.


/// Configuration for exponential backoff retry behavior.
#[derive(Debug, Copy, Clone)]
pub struct RetryConfig {
Collaborator

Retrying is pretty specific to the particular KVStore implementation. I don't think we should expose retrying params on top of what we already do for implementations internally.

}

pub struct TierStoreInner {
/// For remote data.
Collaborator

nit: Might also be local.

[Throws=BuildError]
Node build_with_vss_store_and_header_provider(NodeEntropy node_entropy, string vss_url, string store_id, VssHeaderProvider header_provider);
[Throws=BuildError]
Node build_with_tier_store(NodeEntropy node_entropy, DynStore primary_store);
Collaborator

Can we now also expose Builder::build_with_store?

Self { inner: Arc::new(adapter) }
}

pub fn from_ldk_store(store: Arc<dyn LdkSyncAndAsyncKVStore + Send + Sync>) -> Arc<Self> {
Collaborator

If we rather make this an impl From<Arc<dyn LdkSyncAndAsyncKVStore + Send + Sync>> for Arc<..>, we can drop the wrap_store! macro and just use .into() instead.

}

let store = wrap_store!(Arc::new(tier_store));
self.build_with_store(node_entropy, store)
Collaborator

I think we need to use build_with_store_internal here to make sure we're using the same Runtime etc.

Arc::clone(&outer_lock.entry(locking_key).or_default())
}

async fn execute_locked_write<
Collaborator

I'm confused, why are we replicating all the write-ordering logic here? Given the implementations are required to fulfill LDK's requirements already, can't we just call the inner write and be done with it?

) -> Pin<Box<dyn Future<Output = Result<(), lightning::io::Error>> + Send>> {
let inner = self.inner.clone();

let primary_namespace = primary_namespace.to_string();
Collaborator

I think here (and in remove) we need to add the write-ordering logic to ensure we follow LDK's KVStore::write requirements.

/// A type alias for [`SyncAndAsyncKVStore`] with `Sync`/`Send` markers;
pub type DynStore = dyn SyncAndAsyncKVStore + Sync + Send;
#[cfg(feature = "uniffi")]
pub(crate) use crate::DynStore;
Collaborator

This is odd. Why do we need this?

}
}

pub struct DelayedStore {
Collaborator

Not quite sure what coverage we gain with DelayedStore? IMO, we might be better off dropping all this boilerplate.

) -> Result<Vec<String>, IOError>;
}

pub struct ForeignKVStoreAdapter {
Collaborator

Hmm, any reason this needs to be separate from DynStore? Couldn't we merge them?

}
}

#[async_trait]
Contributor

Probably not worth taking a dependency just to desugar impl Future<Output = Result<(), IOError>> + Send + Sync + 'a.

Collaborator

Probably not worth taking a dependency just to desugar impl Future<Output = Result<(), IOError>> + Send + Sync + 'a.

Well, it's 'the official'/supported way to do async traits with Uniffi: https://mozilla.github.io/uniffi-rs/latest/futures.html#exporting-async-trait-methods

Contributor

Does uniffi require that in some way? It's just a comically-overkill way to sugar impl Future which is kinda nuts...

Contributor

Oh, worse, it looks like async_trait desugars to the Pin<Box<dyn ...>> version...which is dumb but at least for uniffi it shouldn't matter cause we'd have to do that anyway.

Collaborator

@tnull tnull left a comment

Unfortunately this will need a substantial rebase now that #696 landed, sorry for that!

Besides some tedious rebase work, we now have DynStoreWrapper which can be used on the ffi and the non-ffi side. I do wonder if we can actually just merge that wrapper with the TierStore, as well as the ForeignKVStoreAdapter/DynStore. Seems all these wrapper structs may not be needed and we could just get away with one trait and one wrapper struct, mostly?

@TheBlueMatt
Contributor

Probably, indeed. To avoid exposing the wrappers in Rust we'll have to put them in a module which is conditionally pub.
