Sentinel scan iterator #3141
Conversation
Hi @harshrai654, thanks for the contribution! On first glance, this looks good, but it looks like there are some failing tests. Can you please resolve those, so we can have a more thorough look at the final state of the code?
Hey @nkaradzhov, I have fixed the tests. A stopped node in a previous test of the describe block was causing issues in the next test, which requires healthy nodes in the sentinel setup. I added cleanup of the test frame's nodes between these two tests.
If a failover occurs during the scan, the iterator will automatically restart from the beginning on the new master to ensure all keys are covered. This may result in duplicate keys being yielded. If your application requires processing each key exactly once, you should implement a deduplication mechanism (like a `Set` or Bloom filter).
nit: Bloom filter is not really a sound way to deduplicate on its own, so I'm not sure about the phrasing "Set or Bloom filter".
I'm not an expert on Sentinel or Redis replication, but is there any chance the two values differ between old and new master?
- I agree, a Bloom filter isn't a perfect deduplication mechanism. The reason I mentioned it is that it provides a very low-memory way to reduce duplicates for large keyspaces, even though it can produce false positives and therefore skip some unseen keys. I'll update the documentation to clarify this tradeoff: using a Set gives strict, 100% deduplication, while a Bloom filter is only suitable when occasional false positives are acceptable in exchange for significantly lower memory usage (see the sketch after this list).
- Yes, the results may differ between the old and new master. Redis replication is asynchronous, so a replica promoted to master may not contain exactly the same keyspace as the previous master at the moment of failover. Some writes may not yet have replicated, and new writes may occur on the promoted master after the failover. Because the iterator restarts from cursor 0 on the new master, any additions or deletions that happened around the failover will be reflected in the new scan.
  Reference: https://redis.io/docs/latest/commands/scan/#scan-guarantees
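For reference, here is a minimal sketch of the Set-based approach (illustrative only; it assumes an already-connected `sentinel` client exposing the `scanIterator()` proposed in this PR, and `processKey` is a hypothetical per-key handler):

```typescript
// Sketch: strict deduplication across failover restarts using a Set.
// Memory grows with the number of distinct keys seen, which is the
// cost a Bloom filter would reduce in exchange for false positives.
const seen = new Set<string>();

for await (const keys of sentinel.scanIterator()) {
  for (const key of keys) {
    if (seen.has(key)) continue; // already handled before a restart
    seen.add(key);
    await processKey(key); // hypothetical application logic
  }
}
```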
- Yeah, I only called it out because of the implication you made, i.e. "application requires processing each key exactly once" -> "use a Set or Bloom filter".
- The difference in semantics between this and a regular scan is probably worth noting in the docs as well.
@harshrai654 thank you so much! I will need to take a thorough look into this before passing it, and I am extremely busy on a big change that needs to happen very soon. So, apologies, but I will have to delay my full review here.
No worries, thanks for the update and for letting me know!
This seems to cause a deadlock when the consumer tries to do another operation during the scan and there aren't enough master clients available.

```typescript
for await (const keys of sentinel.scanIterator()) {
  console.log(keys)

  // Hangs forever
  const values = await sentinel.mGet(keys)
  console.log(values)
}
```
Hey @TheIndra55, thanks for catching this! You are absolutely correct. Acquiring the master connection for the entire lifespan of the iterator causes a resource-starvation deadlock. Even with a larger pool, this implementation unnecessarily holds a connection idle while the user's code executes, reducing overall throughput.

The fix can be to do an "Acquire -> Scan -> Release" loop: we acquire the connection only for the `SCAN` call itself and release it before yielding. Here is the proposed implementation:

```typescript
async *scanIterator(
  this: RedisSentinelType<M, F, S, RESP, TYPE_MAPPING>,
  options?: ScanOptions & ScanIteratorOptions
) {
  let cursor = options?.cursor ?? "0";
  let shouldRestart = false;

  // This listener persists for the entire iterator life to track failovers
  const handleTopologyChange = (event: RedisSentinelEvent) => {
    if (event.type === "MASTER_CHANGE") {
      shouldRestart = true;
    }
  };
  this.on("topology-change", handleTopologyChange);

  try {
    do {
      // 1. Acquire lease JUST for the scan command
      const masterClient = await this.acquire();
      let reply;
      try {
        // If master changed since last loop, reset cursor to 0
        if (shouldRestart) {
          cursor = "0";
          shouldRestart = false;
        }
        reply = await masterClient.scan(cursor, options);
      } finally {
        // 2. Release IMMEDIATELY so the pool is available for the user
        masterClient.release();
      }

      // Handle edge case: topology changed *during* the scan
      if (shouldRestart) {
        // Data might be from the old master, so we discard this batch
        // and restart from cursor 0 on the new master.
        cursor = "0";
        continue;
      }

      // 3. Yield to user
      // The connection is released, so `await sentinel.mGet()` will succeed
      cursor = reply.cursor;
      yield reply.keys;
    } while (cursor !== "0");
  } finally {
    this.removeListener("topology-change", handleTopologyChange);
  }
}
```

This ensures we respect the pool limits and handle topology changes correctly by resetting the cursor if a failover occurs between (or during) iterations. What do you think?
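If that looks right, the original reproduction above should then work without starving the pool. A quick sketch (assuming the acquire/scan/release version lands as proposed):

```typescript
// Sketch: the client is released before each yield, so commands issued
// inside the loop can acquire a connection from the same pool.
for await (const keys of sentinel.scanIterator()) {
  if (keys.length === 0) continue; // MGET requires at least one key
  const values = await sentinel.mGet(keys); // no longer blocks on the pool
  console.log(keys, values);
}
```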
Description
This PR implements the `scanIterator` method for the `RedisSentinel` client. Previously, `scanIterator` was not exposed on the Sentinel client, limiting the ability to iterate over keys on the master node directly through the Sentinel interface. This implementation adds that capability with handling for failovers.
Issue: #2999
Changes:
- Listens for the `MASTER_CHANGE` topology event and automatically restarts the scan from the beginning on the new master. This ensures all keys are eventually covered, even if the topology changes mid-scan.

Checklist
- Does `npm test` pass with this change (including linting)?