Fix replication tests #893
base: main
Conversation
Pull request overview
This PR adds a replication integration test suite to verify that the BanyanDB cluster can survive node failures when data is replicated across multiple nodes. The test creates a cluster with 3 data nodes and a replication factor of 2, then simulates a node failure and verifies data is still accessible.
Key Changes:
- Adds new replication test suite with setup for a 3-node cluster with replication factor of 2
- Introduces the `entity_replicated` test case to verify query functionality after node failure
- Creates the `replicated_group` configuration with the `replicas: 2` setting
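For illustration, a group definition with `replicas: 2` might look like the sketch below. The field names are assumptions based on the PR description and file list, not a copy of the actual `pkg/test/measure/testdata/groups/replicated.json`:

```json
{
  "metadata": {
    "name": "replicated_group"
  },
  "catalog": "CATALOG_MEASURE",
  "resource_opts": {
    "shard_num": 2,
    "replicas": 2
  }
}
```

With a replication factor of 2, each shard's data lives on two of the three data nodes, which is what lets the test stop one node and still serve queries.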
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| test/integration/replication/replication_test.go | Main test file that verifies cluster survives node failure and data remains accessible |
| test/integration/replication/replication_suite_test.go | Test suite setup with 3 data nodes, etcd server, and schema preloading |
| test/cases/measure/data/want/entity_replicated.yaml | Expected query results for the replicated data test case |
| test/cases/measure/data/input/entity_replicated.yaml | Query specification for testing replicated measure data |
| test/cases/measure/data/input/entity_replicated.ql | Query language definition for the replicated data test |
| test/cases/init.go | Adds data initialization for service_traffic in replicated_group |
| pkg/test/measure/testdata/measures/service_traffic_replicated.json | Measure schema definition for service_traffic in replicated_group |
| pkg/test/measure/testdata/groups/replicated.json | Group configuration with replicas=2 for replication testing |
| CHANGES.md | Documents addition of replication integration test |
```go
dataNodeClosers[0]()
dataNodeClosers = dataNodeClosers[1:]
```
Copilot AI commented on Dec 10, 2025:
The `dataNodeClosers` variable is accessed in the test but is not properly shared across test processes. In Ginkgo's `SynchronizedBeforeSuite`, the first function runs once on process 1 and returns serialized data to the other processes via the second function. Package-level variables set in the first function are not accessible in tests running in other processes.
The `dataNodeClosers` slice should be serialized and passed through the suite configuration (like `liaisonAddr`, `etcdEndpoint`, and `now`), but since closers are functions that cannot be serialized, the test architecture needs to be reconsidered.
One solution is to avoid stopping nodes in parallel tests, or ensure this specific test doesn't run in parallel mode. Another approach is to store node information (like node IDs) instead of closers and implement a shutdown mechanism through the test framework.
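The "store node information instead of closers" approach can be sketched as follows. The struct and field names here are hypothetical (not from the PR); the point is only that plain node addresses, unlike closer functions, survive the round trip through the byte slice that `SynchronizedBeforeSuite` shares with the other Ginkgo processes:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// suiteSharedContext holds only serializable node information, so it can be
// returned from SynchronizedBeforeSuite's process-1 function and decoded by
// every other test process. Closer functions cannot travel this way.
type suiteSharedContext struct {
	LiaisonAddr   string   `json:"liaisonAddr"`
	EtcdEndpoint  string   `json:"etcdEndpoint"`
	DataNodeAddrs []string `json:"dataNodeAddrs"`
}

// encodeSharedContext is what the process-1 setup function would return.
func encodeSharedContext(c suiteSharedContext) []byte {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	return b
}

// decodeSharedContext is what every process's second function would run.
func decodeSharedContext(b []byte) suiteSharedContext {
	var c suiteSharedContext
	if err := json.Unmarshal(b, &c); err != nil {
		panic(err)
	}
	return c
}

func main() {
	shared := suiteSharedContext{
		LiaisonAddr:   "127.0.0.1:17912",
		EtcdEndpoint:  "http://127.0.0.1:2379",
		DataNodeAddrs: []string{"127.0.0.1:17920", "127.0.0.1:17921", "127.0.0.1:17922"},
	}
	decoded := decodeSharedContext(encodeSharedContext(shared))
	fmt.Println("data nodes visible to all processes:", len(decoded.DataNodeAddrs))
}
```

A test that needs to kill a node would then look up its address from the decoded context and ask the framework (or the node's admin endpoint) to shut it down, rather than invoking a closure that only exists in process 1.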
```go
for _, closeDataNode := range dataNodeClosers {
	closeDataNode()
}
```
Copilot AI commented on Dec 10, 2025:
The cleanup logic in `deferFunc` will attempt to close data nodes that may have already been stopped during test execution (lines 83-84 in replication_test.go). When a test calls `dataNodeClosers[0]()` and then removes that closer from the slice, `deferFunc` will still iterate over the remaining closers; but if tests fail or multiple tests run, this could lead to closing already-closed nodes or missing nodes that should be closed.
Consider implementing idempotent close operations or tracking which nodes have been closed to avoid double-closure issues. Additionally, ensure that node closures in tests don't interfere with the cleanup process.
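One minimal way to make the close operations idempotent, as the comment suggests, is to wrap each closer in `sync.Once`. This is a sketch, not the PR's implementation; the type name is hypothetical:

```go
package main

import (
	"fmt"
	"sync"
)

// idempotentCloser wraps a close function so that calling Close any number
// of times runs the underlying function exactly once. Both the test body and
// the deferred cleanup can then call Close without double-closing a node.
type idempotentCloser struct {
	once    sync.Once
	closeFn func()
}

func (c *idempotentCloser) Close() {
	c.once.Do(c.closeFn)
}

func main() {
	closed := 0
	node := &idempotentCloser{closeFn: func() { closed++ }}

	node.Close() // simulates the test stopping the node
	node.Close() // simulates deferFunc cleaning up later: a no-op

	fmt.Println("close ran", closed, "time(s)")
}
```

With this wrapper, the test no longer needs to remove entries from the slice after stopping a node, so cleanup can safely iterate over every closer.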
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main     #893      +/-   ##
==========================================
+ Coverage   45.97%   46.25%   +0.27%
==========================================
  Files         328      367      +39
  Lines       55505    56483     +978
==========================================
+ Hits        25520    26124     +604
+ Misses      27909    27854      -55
- Partials     2076     2505     +429
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
`CHANGES` log.