Skip to content

Conversation

@maxenglander
Copy link
Collaborator

@maxenglander maxenglander commented Apr 5, 2025

Add two new options.

Specifying timeout in minutes

Make the current hardcoded 5 minute read duration configurable. Keep 5 minutes as the default and minimum. This is how it will look on connectors that upgrade to a version containing this change:

image

GTID position with last known PK

Currently when airbyte-source syncs it wipes out gtid information from the shard gtid before passing it along in the VStream request. This means that, upon retry, it's possible for the connector to lose rows that are inserted between the gtid of the last vgtid event it received in one attempt and the @@gtid_executed of the next sync attempt.

This new option changes the current behavior so that gtid is always passed along in the VStream request, if it is available. This makes for a safer retry. However, it does mean that it is possible for the connector to received duplicates of the same row.

How it will look in Airbyte UI after upgrading to a version with this change.

image

@maxenglander maxenglander force-pushed the maxeng-source-opt-read-duration branch from b1c28a4 to d55dac4 Compare April 5, 2025 20:20
Signed-off-by: Max Englander <max@planetscale.com>
@maxenglander maxenglander force-pushed the maxeng-source-opt-read-duration branch from d55dac4 to dae88cc Compare April 5, 2025 20:38
Signed-off-by: Max Englander <max@planetscale.com>
@maxenglander maxenglander requested a review from Copilot April 5, 2025 20:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • cmd/airbyte-source/spec.json: Language not supported

Signed-off-by: Max Englander <max@planetscale.com>
@maxenglander maxenglander requested a review from Copilot April 5, 2025 21:18
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • cmd/airbyte-source/spec.json: Language not supported

table := s.Stream
readDuration := 5 * time.Minute
timeout := 5 * time.Minute
if timeoutSeconds := ps.TimeoutSeconds; timeoutSeconds != nil {
Copy link

Copilot AI Apr 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation does not enforce the minimum timeout of 5 minutes as required. Consider adding a condition to set the timeout to 5 minutes if the provided value is less than that.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no thanks, it's useful to be able to use values less than 5m in testing

@maxenglander maxenglander marked this pull request as ready for review April 5, 2025 22:05
Currently when airbyte-source syncs it wipes out `gtid` information from
the shard gtid before passing it along in the VStream request. This
means that, upon retry, it's possible for the connector to lose rows
that are inserted between the `gtid` of the last `vgtid` event it
received in one attempt and the `@@gtid_executed` of the next sync
attempt.

This new option changes the current behavior so that `gtid` is always
passed along in the VStream request, if it is available. This makes for
a safer retry. However, it does mean that it is possible for the
connector to received duplicates of the same row.

How it will look in Airbyte UI after upgrading to a version with this
change.

<img width="849" alt="image"
src="https://github.com/user-attachments/assets/5d699e82-3e37-4459-a833-365f717cea15"
/>

Signed-off-by: Max Englander <max@planetscale.com>
@maxenglander maxenglander changed the title add option for specifying timeout in minutes add option for specifying timeout in minutes and gtid position together with last known pk Apr 7, 2025
@maxenglander maxenglander merged commit 53e32d8 into main Apr 7, 2025
3 checks passed
@maxenglander maxenglander deleted the maxeng-source-opt-read-duration branch April 7, 2025 14:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants