-
Notifications
You must be signed in to change notification settings - Fork 1.3k
fix(sync): send audio-bytes webhook during the offline sync #3930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(sync): send audio-bytes webhook during the offline sync #3930
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request successfully adds the functionality to trigger webhooks for offline audio sync. The new utility functions are well-defined. However, there is a significant performance issue: the asynchronous webhook function uses a synchronous HTTP library (requests), which will block the event loop. My review includes a comment with details on how to fix this by using an asynchronous library like httpx, which was conveniently added in this PR.
d569556 to
511fce6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enhances the offline audio sync workflow by ensuring that developer webhooks are triggered with audio bytes even for files uploaded and processed offline. The changes correctly replace the synchronous requests library with httpx for asynchronous webhook calls. However, a critical performance issue is identified due to performing blocking file I/O within an async endpoint, which will block the server's event loop. Suggestions have been provided to refactor this using asyncio.to_thread to prevent performance degradation, aligning with best practices for asynchronous applications. Additionally, a suggestion to narrow a broad exception catch has been included to improve error handling.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enhances the offline audio sync workflow by triggering webhooks with audio bytes. The implementation is solid, correctly using asyncio.to_thread for blocking I/O and httpx for asynchronous network requests. I've provided one suggestion in backend/routers/sync.py to centralize error handling, which will improve code clarity and maintainability. After addressing this, the PR should be ready to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request successfully adds the functionality to send audio bytes via webhook for offline-synced files. It correctly refactors a synchronous network call to be asynchronous using httpx, which is a crucial improvement. My main feedback is to enhance performance by parallelizing the file reading operations. The current implementation reads files one by one, which can be slow. By processing them concurrently, the overall sync time can be significantly reduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a valuable feature to send audio-bytes webhooks for offline-synced files. The implementation is well-structured, using asyncio for concurrent processing. My review includes a few suggestions to improve performance and memory usage. Specifically, I recommend changing the webhook data type from bytearray to bytes to avoid unnecessary data copying, and running the webhook triggers as a background task to prevent blocking the main API response. These changes will make the new feature more efficient and robust.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request enhances the offline audio sync workflow by ensuring that developer webhooks are triggered with audio bytes for offline-processed files. It correctly introduces a timestamp parameter to provide better context for these webhooks. A key improvement is the switch to httpx for asynchronous HTTP requests in send_audio_bytes_developer_webhook, which prevents blocking the event loop and aligns with best practices for async applications. The documentation has also been clearly updated. I have one high-severity suggestion to prevent a potential runtime error.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request successfully enhances the offline audio sync workflow by triggering developer webhooks with audio bytes for offline-processed files. The addition of a timestamp query parameter is a valuable enhancement for providing accurate recording times. The implementation correctly leverages asyncio for concurrent processing of audio files and non-blocking webhook notifications. A significant improvement is the migration from the synchronous requests library to the asynchronous httpx for sending webhooks, which is crucial for the performance of an async application. My review includes a suggestion to extend this asynchronous pattern to another webhook function within the same file to ensure consistency and further improve performance.
| async def send_audio_bytes_developer_webhook( | ||
| uid: str, sample_rate: int, data: bytes, timestamp: int | None = None | ||
| ): | ||
| print("send_audio_bytes_developer_webhook", uid) | ||
| # TODO: add a lock, send shorter segments, validate regex. | ||
| toggled = user_webhook_status_db(uid, WebhookType.audio_bytes) | ||
| if toggled: | ||
| webhook_url = get_user_webhook_db(uid, WebhookType.audio_bytes) | ||
| webhook_url = webhook_url.split(',')[0] | ||
| if not webhook_url: | ||
| return | ||
| webhook_url += f'?sample_rate={sample_rate}&uid={uid}' | ||
| try: | ||
| response = requests.post( | ||
| webhook_url, data=data, headers={'Content-Type': 'application/octet-stream'}, timeout=15 | ||
| ) | ||
| print('send_audio_bytes_developer_webhook:', webhook_url, response.status_code) | ||
| except Exception as e: | ||
| print(f"Error sending audio bytes to developer webhook: {e}") | ||
| else: | ||
| if not toggled: | ||
| return | ||
|
|
||
| webhook_url = get_user_webhook_db(uid, WebhookType.audio_bytes) | ||
| webhook_url = (webhook_url or '').split(',')[0] | ||
| if not webhook_url: | ||
| return | ||
|
|
||
| webhook_url += f'?sample_rate={sample_rate}&uid={uid}' | ||
| if timestamp is not None: | ||
| webhook_url += f'×tamp={timestamp}' | ||
| try: | ||
| async with httpx.AsyncClient() as client: | ||
| response = await client.post( | ||
| webhook_url, content=data, headers={'Content-Type': 'application/octet-stream'}, timeout=15 | ||
| ) | ||
| print('send_audio_bytes_developer_webhook:', webhook_url, response.status_code) | ||
| except httpx.RequestError as e: | ||
| print(f"Error sending audio bytes to developer webhook: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great improvement, changing requests to httpx to avoid blocking the event loop in this async function. To maintain consistency and improve performance across the module, consider applying the same pattern to the realtime_transcript_webhook function in this file, which currently uses a synchronous requests.post call and can block the event loop.
Here's how you could refactor realtime_transcript_webhook to use httpx:
async def realtime_transcript_webhook(uid, segments: List[dict]):
print("realtime_transcript_webhook", uid)
toggled = user_webhook_status_db(uid, WebhookType.realtime_transcript)
if toggled:
webhook_url = get_user_webhook_db(uid, WebhookType.realtime_transcript)
if not webhook_url:
return
webhook_url += f'?uid={uid}'
try:
async with httpx.AsyncClient() as client:
response = await client.post(
webhook_url,
json={'segments': segments, 'session_id': uid},
headers={'Content-Type': 'application/json'},
timeout=15,
)
print('realtime_transcript_webhook:', webhook_url, response.status_code)
if response.status_code == 200:
response_data = response.json()
if not response_data:
return
message = response_data.get('message', '')
if len(message) > 5:
send_webhook_notification(uid, message)
except httpx.RequestError as e:
print(f"Error sending realtime transcript to developer webhook: {e}")
else:
returnReferences
- In an async application, use asynchronous HTTP clients like
httpxinstead of synchronous ones likerequeststo avoid blocking the event loop and degrading performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can be for another pull request honnestly
This pull request enhances the offline audio sync workflow by ensuring that developer webhooks are triggered with audio bytes even for files uploaded and processed offline.
I also add a new query params to send the
timestampof the file to provide the right "when is recorded" to webhooksIt introduces utility functions to read WAV files as PCM data and schedules asynchronous webhook notifications for each synced audio file.
Resolve #3846