@isabelle-cedar
Contributor

No description provided.

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Summary

This PR implements comprehensive voice streaming functionality for the Cedar OS product roadmap example. The implementation enables end-to-end voice interaction: audio recording → transcription → LLM processing → text-to-speech synthesis → streaming audio response.

Key Changes:

  • Backend Integration: Added @mastra/voice-openai dependency and created voice stream handler with SSE-based real-time communication
  • Frontend Voice State: Enhanced voice slice with streaming support, proper error handling, and audio playback capabilities
  • Provider Updates: Extended Mastra provider with voice streaming endpoints and event parsing
  • Workflow Enhancement: Modified chat workflow to accumulate text for voice synthesis instead of streaming individual chunks
  • Configuration: Updated provider config to include voice routing and enabled streaming voice settings
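
The workflow enhancement above (accumulating text for synthesis rather than streaming each chunk) can be sketched roughly as follows. This is an illustration only, not the PR's code: `StreamChunk`, `createTextAccumulator`, and the `emit` callback are hypothetical names, and the real chunk types may differ. Only the idea of collecting `text-delta` chunks into a pending string in voice mode comes from the review.

```typescript
// Hypothetical chunk shape; the actual workflow's chunk union may be richer.
type StreamChunk =
  | { type: "text-delta"; text: string }
  | { type: "tool-call"; name: string };

// In voice mode, text deltas are buffered so one TTS call can speak the
// whole reply; in text mode, each delta is forwarded immediately.
function createTextAccumulator(
  voiceMode: boolean,
  emit: (text: string) => void,
) {
  let pendingText = "";
  return {
    push(chunk: StreamChunk): void {
      if (chunk.type !== "text-delta") return; // ignore non-text chunks
      if (voiceMode) {
        pendingText += chunk.text;
      } else {
        emit(chunk.text);
      }
    },
    // Returns the buffered text and resets the buffer.
    flush(): string {
      const text = pendingText;
      pendingText = "";
      return text;
    },
  };
}
```

In voice mode the caller would pass the result of `flush()` to the synthesis step once the LLM stream ends.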

Technical Implementation:

  • Uses WebRTC for audio capture, OpenAI Whisper for transcription, and OpenAI TTS for synthesis
  • Implements proper stream handling with both Node.js Readable and Web ReadableStream compatibility
  • Includes comprehensive error handling and resource cleanup
  • Supports both streaming and non-streaming voice modes

Confidence Score: 3/5

  • This PR is moderately safe to merge with some implementation concerns that should be addressed
  • The implementation is architecturally sound, with proper separation of concerns, but has three technical issues: missing environment variable validation could cause runtime errors, the audio format is hardcoded, and the stream type detection logic is fragile and could fail with certain stream implementations
  • Pay special attention to voiceStreamHandler.ts for environment variable validation and streamUtils.ts for stream compatibility detection

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| examples-backend/product-roadmap-backend/src/mastra/voiceStreamHandler.ts | 3/5 | New voice streaming handler with transcription and LLM integration. Has some potential null handling issues and hardcoded transcription format. |
| examples-backend/product-roadmap-backend/src/utils/streamUtils.ts | 3/5 | Enhanced streaming utilities with voice support. Buffer handling logic may have edge cases with stream compatibility detection. |
| examples-backend/product-roadmap-backend/src/mastra/workflows/chatWorkflow.ts | 4/5 | Updated workflow to support voice mode with text accumulation for TTS synthesis. Clean integration of voice handling logic. |
| packages/cedar-os/src/store/voice/voiceSlice.ts | 4/5 | Comprehensive voice state management with streaming support. Well-structured with proper error handling and resource cleanup. |
| packages/cedar-os/src/store/agentConnection/providers/mastra.ts | 4/5 | Enhanced Mastra provider with voice streaming capabilities. Robust event parsing and proper stream handling. |

Sequence Diagram

sequenceDiagram
    participant User
    participant CedarOS as Cedar OS (Frontend)
    participant MastraProvider as Mastra Provider
    participant VoiceHandler as Voice Stream Handler
    participant OpenAIVoice as @mastra/voice-openai
    participant ChatWorkflow as Chat Workflow
    participant LLM as OpenAI LLM

    User->>CedarOS: Record audio and submit
    CedarOS->>MastraProvider: voiceStreamLLM(audioData, settings)
    MastraProvider->>VoiceHandler: POST /voice/stream
    
    VoiceHandler->>OpenAIVoice: listen(audioBuffer, {filetype: 'webm'})
    OpenAIVoice->>VoiceHandler: transcription text
    VoiceHandler->>CedarOS: SSE: {type: 'transcription', transcription}
    
    VoiceHandler->>ChatWorkflow: start workflow with transcription
    ChatWorkflow->>LLM: streamVNext(transcription + context)
    
    loop Text streaming chunks
        LLM->>ChatWorkflow: text-delta chunks
        ChatWorkflow->>ChatWorkflow: accumulate pendingText (for voice mode)
    end
    
    ChatWorkflow->>OpenAIVoice: speak(pendingText)
    OpenAIVoice->>ChatWorkflow: audio stream
    ChatWorkflow->>VoiceHandler: audio data
    VoiceHandler->>CedarOS: SSE: {type: 'audio', audioData, content}
    
    VoiceHandler->>CedarOS: SSE: {type: 'done'}
    CedarOS->>User: Play audio response & show text
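
The three SSE payloads in the diagram (`transcription`, `audio`, `done`) suggest a small discriminated union on the client. The sketch below shows one way the frontend might parse them. `VoiceStreamEvent` and `parseSseFrames` are hypothetical names, and treating `audioData` as a string (for example base64) is an assumption; the diagram does not state its encoding.

```typescript
// Event shapes taken from the sequence diagram; field types are assumed.
type VoiceStreamEvent =
  | { type: "transcription"; transcription: string }
  | { type: "audio"; audioData: string; content: string }
  | { type: "done" };

// Parses a chunk of SSE text into typed events. Frames are separated by a
// blank line, and each payload line starts with "data:".
function parseSseFrames(raw: string): VoiceStreamEvent[] {
  const events: VoiceStreamEvent[] = [];
  for (const frame of raw.split(/\r?\n\r?\n/)) {
    const data = frame
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (!data) continue;
    try {
      const parsed = JSON.parse(data) as { type?: unknown };
      if (
        parsed.type === "transcription" ||
        parsed.type === "audio" ||
        parsed.type === "done"
      ) {
        events.push(parsed as VoiceStreamEvent);
      }
    } catch {
      // Skip frames whose payload is not valid JSON.
    }
  }
  return events;
}
```

A real client would also need to buffer partial frames across network chunks; that is omitted here for brevity.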

12 files reviewed, 3 comments

Comment on lines +85 to +87
const transcription = await voiceProvider.listen(Readable.from(buf), {
	filetype: 'webm',
});

style: hardcoded `filetype: 'webm'` assumes input format - consider making dynamic

Path: examples-backend/product-roadmap-backend/src/mastra/voiceStreamHandler.ts
Line: 85:87

```suggestion
		const transcription = await voiceProvider.listen(Readable.from(buf), {
			filetype: audioFile.type.includes('webm') ? 'webm' : 'wav',
		});
```
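
Note that the suggested ternary treats every non-webm upload as `wav`. One alternative is a small lookup from MIME type to `filetype`, sketched below. `MIME_TO_FILETYPE` and `resolveFiletype` are hypothetical helpers, and the set of values that `listen` accepts for `filetype` is an assumption that should be checked against the voice provider's documentation.

```typescript
// Assumed mapping; verify the accepted `filetype` values before relying on it.
const MIME_TO_FILETYPE: Record<string, string> = {
  "audio/webm": "webm",
  "audio/wav": "wav",
  "audio/x-wav": "wav",
  "audio/mpeg": "mp3",
  "audio/mp4": "m4a",
};

// Resolves a filetype from a MIME type such as "audio/webm;codecs=opus".
// The default fallback of "webm" mirrors the behaviour of the reviewed code.
function resolveFiletype(mimeType: string, fallback = "webm"): string {
  // Drop parameters after ";" and normalise case before the lookup.
  const base = (mimeType.split(";")[0] ?? "").trim().toLowerCase();
  return MIME_TO_FILETYPE[base] ?? fallback;
}
```

The handler could then pass `resolveFiletype(audioFile.type)` as the `filetype` option, assuming `audioFile.type` holds the upload's MIME type.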

Comment on lines +118 to +119
if (typeof (speechStream as ReadableStream).getReader === 'function') {
	// Web ReadableStream

logic: stream type detection is fragile - checking for `getReader` method may miss other ReadableStream-like objects

Path: examples-backend/product-roadmap-backend/src/utils/streamUtils.ts
Line: 118:119

```suggestion
	if ('getReader' in speechStream && typeof speechStream.getReader === 'function') {
```
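
Both the original check and the suggested `in` form test for the same `getReader` method at runtime; the main differences are type narrowing and the fact that `in` throws on non-object values. A more defensive option is to handle each supported stream type explicitly and reject anything else, as in the sketch below. `isWebReadableStream` and `streamToBuffer` are hypothetical helpers, not functions from `streamUtils.ts`.

```typescript
import { Readable } from "node:stream";
import { ReadableStream } from "node:stream/web";

// Type guard that is safe to call on any value, including null and primitives.
function isWebReadableStream(
  value: unknown,
): value is ReadableStream<Uint8Array> {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { getReader?: unknown }).getReader === "function"
  );
}

// Collects either stream flavour into a single Buffer.
async function streamToBuffer(stream: unknown): Promise<Buffer> {
  const chunks: Buffer[] = [];
  if (isWebReadableStream(stream)) {
    const reader = stream.getReader();
    try {
      for (;;) {
        const result = await reader.read();
        if (result.done) break;
        chunks.push(Buffer.from(result.value));
      }
    } finally {
      reader.releaseLock();
    }
  } else if (stream instanceof Readable) {
    // Node.js Readable streams are async iterable.
    for await (const chunk of stream) {
      chunks.push(Buffer.from(chunk));
    }
  } else {
    throw new TypeError("Unsupported stream type");
  }
  return Buffer.concat(chunks);
}
```

Throwing on unknown input makes an unexpected stream implementation fail loudly instead of silently producing empty audio.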

Comment on lines +8 to +12
speechModel: { apiKey: process.env.OPENAI_API_KEY!, name: 'tts-1' },
listeningModel: {
	apiKey: process.env.OPENAI_API_KEY!,
	name: 'whisper-1',
},

logic: missing environment variable validation will cause runtime errors if API key is not set

Path: examples-backend/product-roadmap-backend/src/mastra/voiceStreamHandler.ts
Line: 8:12
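
One way to address this comment is to fail fast when the key is absent instead of relying on the non-null assertion. The sketch below is an assumption about how that might look: `requireEnv` and `buildVoiceConfig` are hypothetical helpers, while the model names `tts-1` and `whisper-1` are taken from the reviewed lines.

```typescript
type Env = Record<string, string | undefined>;

// Returns the variable's value or throws a descriptive error if it is
// unset or blank. The env map is injectable to simplify testing.
function requireEnv(name: string, env: Env = process.env): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Builds the two model configs from a single validated API key.
function buildVoiceConfig(env: Env = process.env) {
  const apiKey = requireEnv("OPENAI_API_KEY", env);
  return {
    speechModel: { apiKey, name: "tts-1" },
    listeningModel: { apiKey, name: "whisper-1" },
  };
}
```

Calling `buildVoiceConfig()` at module load would surface a missing key at startup rather than on the first voice request.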

@isabelle-cedar isabelle-cedar merged commit a12d6e7 into main Sep 27, 2025
4 of 6 checks passed
@isabelle-cedar isabelle-cedar deleted the feat/voice_streaming branch September 27, 2025 19:31

2 participants