Add _planetscale_metadata to each Airbyte record for deterministic event ordering #143
maxenglander
left a comment
can you introduce a new option so that users can opt into this behavior? this seems reasonable but i'm hesitant to make this the default right away.
8b88d96 to 6cfbfe7
Added extracted_at and sequence_number to the metadata. These fields are important because VGTIDs can contain multiple incremental counters, and when a primary switch occurs their ordering may become ambiguous. The new fields provide a reliable fallback for event ordering when VGTID alone is insufficient.
Also made the metadata column optional, as requested.
maxenglander
left a comment
thanks @lahdirakram this is looking good. would it be possible to add a test case for this?
Hello @maxenglander i added tests
thanks @lahdirakram for the contribution 🙇
Summary
This change enhances the PlanetScale Airbyte source by adding additional metadata to every emitted record under the _planetscale_metadata field. The metadata now includes the VGTID position, an extracted_at timestamp (nanoseconds), and a per-sync sequence_number. The metadata field is optional and only included when enabled in the connector settings.
Problem
When syncing large volumes of data, many records can share the same emitted_at timestamp. While the VGTID normally provides ordering, it can contain multiple incremental counters, and ordering becomes ambiguous in scenarios such as primary switches. As a result, downstream systems may be unable to deterministically reconstruct the exact order of events, leading to inconsistencies in CDC replay.
Solution
Including the VGTID position along with two new ordering signals (extracted_at and sequence_number) allows destinations to:
• Reconstruct the exact order of changes even when VGTIDs are ambiguous
• Ensure deterministic application of updates and deletes
• Avoid relying on coarse or duplicated timestamps like emitted_at
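On the destination side, the ordering described above could be sketched roughly as follows. This is only an illustration, not part of the connector: the helper names `ordering_key` and `order_records` are hypothetical, and the choice to sort by `extracted_at` first and break ties with `sequence_number` is an assumption about how the two fields are meant to be combined.

```python
from typing import Any

META_FIELD = "_planetscale_metadata"


def ordering_key(record: dict[str, Any]) -> tuple[int, int]:
    """Build a sort key from the metadata (hypothetical helper).

    Assumption: extracted_at orders records across syncs, and
    sequence_number breaks ties between records of the same sync.
    """
    meta = record[META_FIELD]
    return (meta["extracted_at"], meta["sequence_number"])


def order_records(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return records sorted into their assumed extraction order."""
    return sorted(records, key=ordering_key)


# Three records received out of order; two share the same extracted_at,
# so only sequence_number can tell them apart.
received = [
    {"id": 43, "name": "Bob", META_FIELD: {
        "vgtid_position": "MySQL56/abcdef12:1-107",
        "extracted_at": 1762424365011596274, "sequence_number": 3}},
    {"id": 42, "name": "Alice", META_FIELD: {
        "vgtid_position": "MySQL56/abcdef12:1-105",
        "extracted_at": 1762424365011596273, "sequence_number": 1}},
    {"id": 42, "name": "Alicia", META_FIELD: {
        "vgtid_position": "MySQL56/abcdef12:1-106",
        "extracted_at": 1762424365011596274, "sequence_number": 2}},
]

ordered = order_records(received)
```

Records synced with the metadata option disabled carry no `_planetscale_metadata` field, so this sketch would raise `KeyError` on them; a real destination would need a fallback for that case.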
Implementation Details
• vgtid_position – the Vitess binlog position
• extracted_at – high-precision extraction timestamp (ns)
• sequence_number – strict, per-sync sequence number
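To make the three fields concrete, here is a minimal sketch of how such metadata could be attached to a record. It is not the connector's actual implementation: the `MetadataStamper` class and its `stamp` method are hypothetical names, and starting the counter at 1 is an assumption taken from the example output.

```python
import time
from itertools import count
from typing import Any, Callable


class MetadataStamper:
    """Attach ordering metadata to records for one sync (hypothetical helper)."""

    def __init__(self, clock: Callable[[], int] = time.time_ns) -> None:
        self._clock = clock          # returns nanoseconds since the epoch
        self._sequence = count(1)    # assumption: numbering starts at 1 per sync

    def stamp(self, record: dict[str, Any], vgtid_position: str) -> dict[str, Any]:
        """Return a copy of the record with _planetscale_metadata added."""
        stamped = dict(record)  # shallow copy; the input record is not mutated
        stamped["_planetscale_metadata"] = {
            "vgtid_position": vgtid_position,
            "extracted_at": self._clock(),
            "sequence_number": next(self._sequence),
        }
        return stamped


# One stamper per sync, so sequence numbers restart with each sync.
# A fixed clock is injected here only to make the output reproducible.
stamper = MetadataStamper(clock=lambda: 1762424365011596273)
source_row = {"id": 42, "name": "Alice"}
first = stamper.stamp(source_row, "MySQL56/abcdef12:1-105")
second = stamper.stamp({"id": 43, "name": "Bob"}, "MySQL56/abcdef12:1-106")
```

Injecting the clock keeps the sketch testable; by default it falls back to `time.time_ns`, which returns an integer nanosecond timestamp.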
Example output
{
  "id": 42,
  "name": "Alice",
  "_planetscale_metadata": {
    "vgtid_position": "MySQL56/abcdef12:1-105",
    "extracted_at": 1762424365011596273,
    "sequence_number": 1
  }
}
Impact