Voice Streaming #183
Greptile Overview
Summary
This PR implements comprehensive voice streaming functionality for the Cedar OS product roadmap example. The implementation enables end-to-end voice interaction: audio recording → transcription → LLM processing → text-to-speech synthesis → streaming audio response.
Key Changes:
- Backend Integration: Added `@mastra/voice-openai` dependency and created voice stream handler with SSE-based real-time communication
- Frontend Voice State: Enhanced voice slice with streaming support, proper error handling, and audio playback capabilities
- Provider Updates: Extended Mastra provider with voice streaming endpoints and event parsing
- Workflow Enhancement: Modified chat workflow to accumulate text for voice synthesis instead of streaming individual chunks
- Configuration: Updated provider config to include voice routing and enabled streaming voice settings
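The workflow change listed above (accumulating text for synthesis instead of streaming individual chunks) can be sketched roughly as follows. This is a minimal illustration and not the PR's code: `createTextAccumulator`, `push`, `flush`, and the `voiceMode` flag are hypothetical names, and only `pendingText` is borrowed from the review's sequence diagram.

```typescript
// Hypothetical sketch: buffer text deltas in voice mode, forward them otherwise.
type EmitFn = (chunk: string) => void;

interface TextAccumulator {
  push(delta: string): void;
  flush(): string | null;
}

function createTextAccumulator(voiceMode: boolean, emit: EmitFn): TextAccumulator {
  let pendingText = "";

  return {
    // Called once per text-delta chunk from the LLM stream.
    push(delta: string): void {
      if (voiceMode) {
        pendingText += delta; // hold the text until the full reply is known
      } else {
        emit(delta); // text mode: stream each chunk immediately
      }
    },

    // Called when the LLM stream ends; returns the text to synthesize, if any.
    flush(): string | null {
      if (!voiceMode || pendingText.length === 0) return null;
      const text = pendingText;
      pendingText = "";
      return text;
    },
  };
}
```

Under this design a single `speak` call would receive the value returned by `flush()`, which trades time-to-first-audio for one synthesis request per reply.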
Technical Implementation:
- Uses WebRTC for audio capture, OpenAI Whisper for transcription, and OpenAI TTS for synthesis
- Implements proper stream handling with both Node.js Readable and Web ReadableStream compatibility
- Includes comprehensive error handling and resource cleanup
- Supports both streaming and non-streaming voice modes
Confidence Score: 3/5
- This PR is moderately safe to merge with some implementation concerns that should be addressed
- The implementation is architecturally sound with proper separation of concerns, but has several technical issues: missing environment variable validation could cause runtime errors, hardcoded audio format assumptions, and fragile stream type detection logic that could fail with certain stream implementations
- Pay special attention to voiceStreamHandler.ts for environment variable validation and streamUtils.ts for stream compatibility detection
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| examples-backend/product-roadmap-backend/src/mastra/voiceStreamHandler.ts | 3/5 | New voice streaming handler with transcription and LLM integration. Has some potential null handling issues and hardcoded transcription format. |
| examples-backend/product-roadmap-backend/src/utils/streamUtils.ts | 3/5 | Enhanced streaming utilities with voice support. Buffer handling logic may have edge cases with stream compatibility detection. |
| examples-backend/product-roadmap-backend/src/mastra/workflows/chatWorkflow.ts | 4/5 | Updated workflow to support voice mode with text accumulation for TTS synthesis. Clean integration of voice handling logic. |
| packages/cedar-os/src/store/voice/voiceSlice.ts | 4/5 | Comprehensive voice state management with streaming support. Well-structured with proper error handling and resource cleanup. |
| packages/cedar-os/src/store/agentConnection/providers/mastra.ts | 4/5 | Enhanced Mastra provider with voice streaming capabilities. Robust event parsing and proper stream handling. |
Sequence Diagram
sequenceDiagram
    participant User
    participant CedarOS as Cedar OS (Frontend)
    participant MastraProvider as Mastra Provider
    participant VoiceHandler as Voice Stream Handler
    participant OpenAIVoice as @mastra/voice-openai
    participant ChatWorkflow as Chat Workflow
    participant LLM as OpenAI LLM
    User->>CedarOS: Record audio and submit
    CedarOS->>MastraProvider: voiceStreamLLM(audioData, settings)
    MastraProvider->>VoiceHandler: POST /voice/stream
    VoiceHandler->>OpenAIVoice: listen(audioBuffer, {filetype: 'webm'})
    OpenAIVoice->>VoiceHandler: transcription text
    VoiceHandler->>CedarOS: SSE: {type: 'transcription', transcription}
    VoiceHandler->>ChatWorkflow: start workflow with transcription
    ChatWorkflow->>LLM: streamVNext(transcription + context)
    loop Text streaming chunks
        LLM->>ChatWorkflow: text-delta chunks
        ChatWorkflow->>ChatWorkflow: accumulate pendingText (for voice mode)
    end
    ChatWorkflow->>OpenAIVoice: speak(pendingText)
    OpenAIVoice->>ChatWorkflow: audio stream
    ChatWorkflow->>VoiceHandler: audio data
    VoiceHandler->>CedarOS: SSE: {type: 'audio', audioData, content}
    VoiceHandler->>CedarOS: SSE: {type: 'done'}
    CedarOS->>User: Play audio response & show text
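For readers following the SSE events in the diagram, the sketch below shows one way a client could turn buffered SSE text into typed events. The three event types and their field names come from the diagram; everything else is an assumption, including the `data: <JSON>` framing, `audioData` being a string (for example base64), and the hypothetical names `VoiceStreamEvent` and `parseSseBuffer`. It is not the Mastra provider's actual parser.

```typescript
// Event shapes follow the sequence diagram; the field types are assumptions.
type VoiceStreamEvent =
  | { type: "transcription"; transcription: string }
  | { type: "audio"; audioData: string; content: string }
  | { type: "done" };

interface ParseResult {
  events: VoiceStreamEvent[];
  rest: string; // incomplete trailing data to prepend to the next chunk
}

// Split buffered SSE text into complete events (events end with a blank line).
function parseSseBuffer(buffer: string): ParseResult {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // the last block may still be incomplete
  const events: VoiceStreamEvent[] = [];

  for (const block of blocks) {
    // Join all "data:" lines of one event, as the SSE format allows.
    const data = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data.length === 0) continue; // comment or keep-alive block

    try {
      const parsed: unknown = JSON.parse(data);
      if (typeof parsed === "object" && parsed !== null && "type" in parsed) {
        events.push(parsed as VoiceStreamEvent);
      }
    } catch {
      // Skip malformed payloads rather than aborting the whole stream.
    }
  }
  return { events, rest };
}
```

A caller would keep `rest` between network reads and pass `rest + nextChunk` on the following call, so an event split across two chunks is still parsed once it completes.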
12 files reviewed, 3 comments
const transcription = await voiceProvider.listen(Readable.from(buf), {
  filetype: 'webm',
});
style: hardcoded `filetype: 'webm'` assumes input format - consider making dynamic
- const transcription = await voiceProvider.listen(Readable.from(buf), {
-   filetype: 'webm',
- });
+ const transcription = await voiceProvider.listen(Readable.from(buf), {
+   filetype: audioFile.type.includes('webm') ? 'webm' : 'wav',
+ });
Prompt To Fix With AI
This is a comment left during a code review.
Path: examples-backend/product-roadmap-backend/src/mastra/voiceStreamHandler.ts
Line: 85:87
Comment:
style: hardcoded `filetype: 'webm'` assumes input format - consider making dynamic
```suggestion
const transcription = await voiceProvider.listen(Readable.from(buf), {
filetype: audioFile.type.includes('webm') ? 'webm' : 'wav',
});
```
How can I resolve this? If you propose a fix, please make it concise.
if (typeof (speechStream as ReadableStream).getReader === 'function') {
  // Web ReadableStream
logic: stream type detection is fragile - checking for `getReader` method may miss other ReadableStream-like objects
- if (typeof (speechStream as ReadableStream).getReader === 'function') {
-   // Web ReadableStream
+ if ('getReader' in speechStream && typeof speechStream.getReader === 'function') {
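Note that the suggested `in` check still keys on `getReader`, so it mainly removes the type cast rather than widening what is detected. If the concern is streams that expose neither shape cleanly, one option is to test for both a reader and async iteration and fail loudly otherwise. The sketch below assumes chunks are `Uint8Array` values. `collectChunks`, `isWebStream`, `isAsyncIterable`, and the two structural interfaces are hypothetical names, not code from `streamUtils.ts`.

```typescript
// Minimal structural types so the sketch does not depend on DOM or Node typings.
interface ReaderLike {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
  releaseLock(): void;
}
interface WebStreamLike {
  getReader(): ReaderLike;
}

function isWebStream(s: unknown): s is WebStreamLike {
  return typeof s === "object" && s !== null &&
    typeof (s as { getReader?: unknown }).getReader === "function";
}

function isAsyncIterable(s: unknown): s is AsyncIterable<Uint8Array> {
  return typeof s === "object" && s !== null &&
    typeof (s as { [Symbol.asyncIterator]?: unknown })[Symbol.asyncIterator] === "function";
}

// Collect every chunk from either stream flavour; throw on anything else.
async function collectChunks(stream: unknown): Promise<Uint8Array[]> {
  const chunks: Uint8Array[] = [];

  if (isWebStream(stream)) {
    // Web ReadableStream: pull chunks through a reader.
    const reader = stream.getReader();
    try {
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        if (value !== undefined) chunks.push(value);
      }
    } finally {
      reader.releaseLock(); // always release, even if read() throws
    }
    return chunks;
  }

  if (isAsyncIterable(stream)) {
    // Node.js Readable streams are async iterable.
    for await (const chunk of stream) chunks.push(chunk);
    return chunks;
  }

  throw new TypeError("Unsupported speech stream type");
}
```

Throwing on an unrecognised stream surfaces the problem at the call site instead of silently producing an empty audio payload.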
Prompt To Fix With AI
This is a comment left during a code review.
Path: examples-backend/product-roadmap-backend/src/utils/streamUtils.ts
Line: 118:119
Comment:
logic: stream type detection is fragile - checking for `getReader` method may miss other ReadableStream-like objects
```suggestion
if ('getReader' in speechStream && typeof speechStream.getReader === 'function') {
```
How can I resolve this? If you propose a fix, please make it concise.
speechModel: { apiKey: process.env.OPENAI_API_KEY!, name: 'tts-1' },
listeningModel: {
  apiKey: process.env.OPENAI_API_KEY!,
  name: 'whisper-1',
},
logic: missing environment variable validation will cause runtime errors if API key is not set
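A small guard along the following lines would turn the non-null assertions into an explicit startup check. This is a sketch under assumptions: `requireEnv` and the `Env` type are hypothetical, and the environment is passed in as a plain object so the helper can be tested without touching `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Return the variable's value, or throw a descriptive error if it is unset or blank.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Hypothetical usage at module load, replacing the `!` assertions:
//   const apiKey = requireEnv(process.env, "OPENAI_API_KEY");
//   speechModel: { apiKey, name: "tts-1" },
//   listeningModel: { apiKey, name: "whisper-1" },
```

Failing at module load gives one clear message naming the missing variable, rather than an authentication error on the first voice request.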
Prompt To Fix With AI
This is a comment left during a code review.
Path: examples-backend/product-roadmap-backend/src/mastra/voiceStreamHandler.ts
Line: 8:12
Comment:
logic: missing environment variable validation will cause runtime errors if API key is not set
How can I resolve this? If you propose a fix, please make it concise.
No description provided.