Skip to content

Conversation

@neekolas
Copy link
Contributor

@neekolas neekolas commented Nov 20, 2025

Add cross-process file locking for MLS group commits

TL;DR

Implemented a file-based locking mechanism for MLS group commits that works across processes, preventing race conditions when multiple clients access the same group.

What changed?

  • Created a new commit_lock.rs module with platform-specific implementations:
    • Native: Uses fd-lock for file-based locking that works across processes
    • WASM: Falls back to in-memory mutexes (previous behavior)
  • Modified GroupCommitLock to create lock files in a directory based on the database path and installation ID
  • Updated error handling to properly propagate lock acquisition failures
  • Ensured locks are properly released when guards are dropped

How to test?

  1. Run multiple XMTP clients simultaneously on the same machine accessing the same group
  2. Verify that commits from one client don't interfere with another
  3. Check that lock files are created in the expected directory (next to the database file)
  4. Verify that locks are properly released after operations complete

Why make this change?

Previously, the locking mechanism only worked within a single process. This could lead to race conditions and data corruption when multiple clients (in different processes) tried to modify the same group simultaneously. The file-based locking ensures that only one process can modify a group at a time, preventing potential data corruption and ensuring consistency across clients.

@claude
Copy link

claude bot commented Nov 20, 2025

Claude Code is working…

I'll analyze this and get back to you.

View job run

Copy link
Contributor Author


How to use the Graphite Merge Queue

Add the label mergequeue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@macroscopeapp
Copy link
Contributor

macroscopeapp bot commented Nov 20, 2025

Add experimental cross-process MLS commit locking using fd-lock and initialize lock in xmtp_mls::builder::ClientBuilder::build tied to the database path and installation ID

Introduce native fd-lock-based group commit locks with async/sync acquisition, add WASM in-memory fallback, wire lock initialization in ClientBuilder::build, and propagate structured commit lock errors through group loading and message processing APIs.

📍Where to Start

Start with lock initialization in xmtp_mls::builder::ClientBuilder::build in builder.rs, then review the lock implementation in commit_lock.rs.


📊 Macroscope summarized 3515ff5. 5 files reviewed, 28 issues evaluated, 25 issues filtered, 1 comment posted. View details

}

/// Open or create the lock file for a given group ID.
fn open_lock_file(&self, group_id: &[u8]) -> Result<File, std::io::Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty group_id yields an empty filename, so open_lock_file tries to open the directory and fails. Consider rejecting empty group_id early (e.g., return std::io::ErrorKind::InvalidInput).

-        fn open_lock_file(&self, group_id: &[u8]) -> Result<File, std::io::Error> {
-            let path = self.lock_file_path(group_id);
+        fn open_lock_file(&self, group_id: &[u8]) -> Result<File, std::io::Error> {
+            if group_id.is_empty() {
+                return Err(std::io::Error::new(std::io::ErrorKind::InvalidInput, "empty group_id"));
+            }
+            let path = self.lock_file_path(group_id);
             OpenOptions::new()
                 .read(true)
                 .write(true)
                 .create(true)
                 .open(path)
         }

🚀 Reply to ask Macroscope to explain or update this suggestion.

👍 Helpful? React to give us feedback.

impl MlsGroupGuard {
/// Create a new guard by acquiring a blocking write lock.
fn new(file: File) -> Result<Self, CommitLockError> {
// Box the lock so it has a stable address
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: avoid converting the lock Box to a raw pointer before fallible steps. Defer Box::into_raw until after the guard is boxed and the write lock succeeds, or ensure error paths convert back with Box::from_raw so RwLock/File drop.

-             let lock = Box::into_raw(Box::new(RwLock::new(file)));
+             let lock_box = Box::new(RwLock::new(file));
@@
-                 let guard = (*lock)
+                 let guard = (*lock_box)
                  .write()
                  .map_err(|e| CommitLockError::LockAcquire(e.to_string()))?;
@@
-                 let guard = Box::into_raw(Box::new(guard));
+                 let guard = Box::into_raw(Box::new(guard));
+                 let lock = Box::into_raw(lock_box);
@@
-                 Ok(Self { guard, lock })
+                 Ok(Self { guard, lock })

🚀 Reply to ask Macroscope to explain or update this suggestion.

👍 Helpful? React to give us feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants