@lahdirakram
Contributor

@lahdirakram lahdirakram commented Oct 30, 2025

Summary

This change enhances the PlanetScale Airbyte source by adding metadata to every emitted record under the _planetscale_metadata field. The metadata includes the VGTID position, an extracted_at timestamp (in nanoseconds), and a per-sync sequence_number. The field is optional and is included only when enabled in the connector settings.

Problem

When syncing large volumes of data, many records can share the same emitted_at timestamp. While the VGTID normally provides ordering, it can contain multiple incremental counters, and ordering becomes ambiguous in scenarios such as primary switches. As a result, downstream systems may be unable to deterministically reconstruct the exact order of events, leading to inconsistencies in CDC replay.

Solution

Including the VGTID position along with two new ordering signals (extracted_at and sequence_number) allows destinations to:
• Reconstruct the exact order of changes even when VGTIDs are ambiguous
• Ensure deterministic application of updates and deletes
• Avoid relying on coarse or duplicated timestamps like emitted_at
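As a rough illustration of how a destination might use the two new signals, the sketch below orders changes by extracted_at and breaks ties with sequence_number. The `change` type, the `orderChanges` helper, and the choice of sort key are assumptions made for this example, not something the connector prescribes, and the sample values are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// change is a hypothetical destination-side view of one synced record.
type change struct {
	ID             int
	ExtractedAt    int64 // nanosecond timestamp from _planetscale_metadata
	SequenceNumber int64 // per-sync counter from _planetscale_metadata
}

// orderChanges sorts in place by extracted_at, then by sequence_number.
// This key is an assumption: sequence_number is only defined per sync,
// so the timestamp is compared first and the counter breaks ties.
func orderChanges(changes []change) {
	sort.SliceStable(changes, func(i, j int) bool {
		a, b := changes[i], changes[j]
		if a.ExtractedAt != b.ExtractedAt {
			return a.ExtractedAt < b.ExtractedAt
		}
		return a.SequenceNumber < b.SequenceNumber
	})
}

func main() {
	// Illustrative values only.
	changes := []change{
		{ID: 3, ExtractedAt: 1762424365011596300, SequenceNumber: 3},
		{ID: 1, ExtractedAt: 1762424365011596273, SequenceNumber: 1},
		{ID: 2, ExtractedAt: 1762424365011596273, SequenceNumber: 2},
	}
	orderChanges(changes)
	for _, c := range changes {
		fmt.Println(c.ID, c.ExtractedAt, c.SequenceNumber)
	}
}
```

Records that share a timestamp still get a stable order from the counter, which is the case emitted_at alone cannot resolve.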

Implementation Details

  • Metadata now contains:
    • vgtid_position – the Vitess binlog position
    • extracted_at – high-precision extraction timestamp (ns)
    • sequence_number – strictly increasing, per-sync sequence number
  • Metadata emission is optional and controlled by the include_metadata flag
  • Implementation isolated to printQueryResult to keep the event flow unchanged
  • No breaking schema changes
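The list above can be sketched roughly as follows. This is not the code from the PR: `recordMetadata`, `metadataStamper`, and `stamp` are hypothetical names, and starting the counter at 1 is an assumption based on the example output. Only the field names and the include_metadata flag come from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// recordMetadata mirrors the three fields described above.
// The Go type itself is illustrative, not the connector's actual type.
type recordMetadata struct {
	VGTIDPosition  string `json:"vgtid_position"`
	ExtractedAt    int64  `json:"extracted_at"`
	SequenceNumber int64  `json:"sequence_number"`
}

// metadataStamper is a hypothetical helper holding the opt-in flag
// and the per-sync counter. Create one per sync.
type metadataStamper struct {
	includeMetadata bool
	next            int64
}

func newMetadataStamper(includeMetadata bool) *metadataStamper {
	return &metadataStamper{includeMetadata: includeMetadata}
}

// stamp adds _planetscale_metadata to the record when the flag is set
// and leaves the record untouched otherwise.
func (s *metadataStamper) stamp(record map[string]any, vgtidPosition string) {
	if !s.includeMetadata {
		return
	}
	s.next++ // assumed to start at 1, as in the example output
	record["_planetscale_metadata"] = recordMetadata{
		VGTIDPosition:  vgtidPosition,
		ExtractedAt:    time.Now().UnixNano(),
		SequenceNumber: s.next,
	}
}

func main() {
	s := newMetadataStamper(true)
	rec := map[string]any{"id": 42, "name": "Alice"}
	s.stamp(rec, "MySQL56/abcdef12:1-105")

	out, err := json.Marshal(rec)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Keeping the counter on a per-sync value rather than in a global means each sync starts its own sequence, which matches the "per-sync" wording.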

Example output

{
  "id": 42,
  "name": "Alice",
  "_planetscale_metadata": {
    "vgtid_position": "MySQL56/abcdef12:1-105",
    "extracted_at": 1762424365011596273,
    "sequence_number": 1
  }
}
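A consumer reading this output needs to take some care with extracted_at: the value is larger than 2^53, so decoding it as a floating-point number (the default for untyped JSON in many languages, including Go's `interface{}` decoding) loses precision. A minimal sketch, assuming hypothetical `userRecord` and `planetScaleMetadata` types on the destination side:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// planetScaleMetadata matches the metadata object in the example output.
type planetScaleMetadata struct {
	VGTIDPosition  string `json:"vgtid_position"`
	ExtractedAt    int64  `json:"extracted_at"` // int64 keeps all nanosecond digits
	SequenceNumber int64  `json:"sequence_number"`
}

// userRecord is a hypothetical row type. Metadata is a pointer so that
// records emitted with the option disabled decode to nil.
type userRecord struct {
	ID       int64                `json:"id"`
	Name     string               `json:"name"`
	Metadata *planetScaleMetadata `json:"_planetscale_metadata,omitempty"`
}

const sample = `{
  "id": 42,
  "name": "Alice",
  "_planetscale_metadata": {
    "vgtid_position": "MySQL56/abcdef12:1-105",
    "extracted_at": 1762424365011596273,
    "sequence_number": 1
  }
}`

// decodeRecord parses one emitted record into the typed struct.
func decodeRecord(raw string) (userRecord, error) {
	var rec userRecord
	err := json.Unmarshal([]byte(raw), &rec)
	return rec, err
}

func main() {
	rec, err := decodeRecord(sample)
	if err != nil {
		panic(err)
	}
	if rec.Metadata == nil {
		fmt.Println("metadata not enabled for this record")
		return
	}
	fmt.Println(rec.Metadata.VGTIDPosition, rec.Metadata.ExtractedAt, rec.Metadata.SequenceNumber)
}
```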

Impact

• ✅ More reliable downstream event ordering
• ✅ Handles ambiguous ordering cases (e.g., primary switches)
• ✅ Backward-compatible and optional

@lahdirakram lahdirakram changed the title Add _metadata.vgtid_position to each Airbyte record for deterministic event ordering Add _planetscale_metadata.vgtid_position to each Airbyte record for deterministic event ordering Oct 30, 2025
Collaborator

@maxenglander maxenglander left a comment

can you introduce a new option so that users can opt into this behavior? this seems reasonable but i'm hesitant to make this the default right away.

@lahdirakram lahdirakram force-pushed the feature/include_planetscale_metadata_field branch from 8b88d96 to 6cfbfe7 Compare November 6, 2025 10:15
@lahdirakram
Contributor Author

Added extracted_at and sequence_number to the metadata. These fields are important because VGTIDs can contain multiple incremental counters, and when a primary switch occurs their ordering may become ambiguous. The new fields provide a reliable fallback for event ordering when the VGTID alone is insufficient.

@lahdirakram
Contributor Author

Also made the metadata column optional, as requested.

@lahdirakram lahdirakram changed the title Add _planetscale_metadata.vgtid_position to each Airbyte record for deterministic event ordering Add _planetscale_metadata to each Airbyte record for deterministic event ordering Nov 6, 2025
Collaborator

@maxenglander maxenglander left a comment

thanks @lahdirakram this is looking good. would it be possible to add a test case for this?

@lahdirakram
Contributor Author

Hello @maxenglander, I added tests.

@maxenglander
Collaborator

thanks @lahdirakram for the contribution 🙇

@maxenglander maxenglander merged commit 1eb88ec into planetscale:main Nov 17, 2025
3 checks passed