Speechmatics Python SDK provides convenient access to enterprise-grade speech-to-text APIs from Python applications.
Fully typed with type definitions for all request params and response fields. Modern Python with async/await patterns, type hints, and context managers for production-ready code.
55+ Languages • Realtime & Batch • Custom vocabularies • Speaker diarization • Speaker ID
Get API Key • Documentation • Academy Examples
- Quickstart
- Why Speechmatics?
- Key Features
- Use Cases
- Documentation
- Authentication
- Advanced Configuration
- Deployment Options
- Community & Support
# Choose the package for your use case:
# Batch transcription
pip install speechmatics-batch
# Realtime streaming
pip install speechmatics-rt
# Voice agents
pip install speechmatics-voice
# Text-to-speech
pip install speechmatics-tts
Package details: what's included in each package
speechmatics-batch - Async batch transcription API
- Upload audio files for processing
- Get transcripts with timestamps, speakers, entities
- Supports all audio intelligence features
speechmatics-rt - Realtime WebSocket streaming
- Stream audio for live transcription
- Ultra-low latency (150ms p95)
- Partial and final transcripts
speechmatics-voice - Voice agent SDK
- Build conversational AI applications
- Speaker diarization and turn detection
- Optional ML-based smart turn:
pip install speechmatics-voice[smart]
speechmatics-tts - Text-to-speech
- Convert text to natural-sounding speech
- Multiple voices and languages
- Streaming and batch modes
git clone https://github.com/speechmatics/speechmatics-python-sdk.git
cd speechmatics-python-sdk
python -m venv .venv
.venv\Scripts\activate
# On Mac/Linux: source .venv/bin/activate
# Install development dependencies for all SDKs
make install-dev
# Install pre-commit hooks
pre-commit install
Simple and Pythonic! Get your API key at portal.speechmatics.com
Note
All examples use load_dotenv() to load your API key from a .env file. Create a .env file with SPEECHMATICS_API_KEY=your_key_here.
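The examples pass `os.getenv("SPEECHMATICS_API_KEY")` straight to the client, so a missing key only shows up later as an authentication failure. A minimal sketch of an early check follows; `require_api_key` is a hypothetical helper, not part of the SDK, and it uses only the standard library.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or fail with a clear message."""
    env = os.environ if env is None else env
    key = env.get("SPEECHMATICS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SPEECHMATICS_API_KEY is not set; add it to your .env file or export it"
        )
    return key
```

Call it after `load_dotenv()` and pass the result as `api_key=...`.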
There are several different methods of generating your first transcription:
- Batch Transcription - transcribe audio files
- Realtime Streaming - live microphone transcription
- Text-to-Speech - convert text to audio
- Voice Agent - real-time transcription with speaker diarization and turn detection
Transcribe audio files:
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import AsyncClient

load_dotenv()

async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    result = await client.transcribe("audio.wav")
    print(result.transcript_text)
    await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-batch python-dotenv

Live microphone transcription:
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.rt import (
    AsyncClient,
    ServerMessageType,
    TranscriptionConfig,
    TranscriptResult,
    AudioFormat,
    AudioEncoding,
    Microphone,
)

load_dotenv()

CHUNK_SIZE = 4096

async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    mic = Microphone(sample_rate=16000, chunk_size=CHUNK_SIZE)

    @client.on(ServerMessageType.ADD_TRANSCRIPT)
    def on_final(message):
        result = TranscriptResult.from_message(message)
        if result.metadata.transcript:
            print(f"[final]: {result.metadata.transcript}")

    @client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT)
    def on_partial(message):
        result = TranscriptResult.from_message(message)
        if result.metadata.transcript:
            print(f"[partial]: {result.metadata.transcript}")

    mic.start()
    try:
        await client.start_session(
            transcription_config=TranscriptionConfig(language="en", enable_partials=True),
            audio_format=AudioFormat(encoding=AudioEncoding.PCM_S16LE, sample_rate=16000),
        )
        print("Speak now...")
        while True:
            await client.send_audio(await mic.read(CHUNK_SIZE))
    finally:
        mic.stop()
        await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-rt python-dotenv pyaudio

Convert text to audio:
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.tts import AsyncClient, Voice, OutputFormat

load_dotenv()

async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    response = await client.generate(
        text="Hello! Welcome to Speechmatics Text-to-Speech",
        voice=Voice.SARAH,
        output_format=OutputFormat.WAV_16000
    )
    audio_data = await response.read()
    with open("output.wav", "wb") as f:
        f.write(audio_data)
    print("Audio saved to output.wav")
    await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-tts python-dotenv

Real-time transcription with speaker diarization and turn detection:
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, VoiceAgentConfigPreset, AgentServerMessageType

load_dotenv()

async def main():
    client = VoiceAgentClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        config=VoiceAgentConfigPreset.load("adaptive")
    )

    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message.get("segments", []):
            print(f"[{segment.get('speaker_id', 'S1')}]: {segment.get('text', '')}")

    @client.on(AgentServerMessageType.END_OF_TURN)
    def on_turn_end(message):
        print("[END OF TURN]")

    mic = Microphone(sample_rate=16000, chunk_size=320)
    mic.start()
    try:
        await client.connect()
        print("Voice agent ready. Speak now...")
        while True:
            await client.send_audio(await mic.read(320))
    finally:
        mic.stop()
        await client.disconnect()

asyncio.run(main())

Installation:
pip install speechmatics-voice speechmatics-rt python-dotenv pyaudio

Tip
Ready for more? Explore 20+ working examples at Speechmatics Academy: voice agents, integrations, use cases, and migration guides.
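The microphone examples in this README read 4096 or 320 units per chunk at a 16 kHz sample rate. As a rough guide to what those numbers mean for latency, the sketch below converts a chunk size to a duration and a payload size. It assumes the chunk size counts mono samples (frames) rather than bytes, which this README does not confirm; both helper names are hypothetical.

```python
def chunk_duration_ms(frames: int, sample_rate: int) -> float:
    """Milliseconds of audio held in one chunk of `frames` samples."""
    return 1000.0 * frames / sample_rate


def chunk_size_bytes(frames: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Payload size of one chunk; 16-bit PCM uses 2 bytes per sample."""
    return frames * bytes_per_sample * channels


for frames in (4096, 320):
    print(frames, chunk_duration_ms(frames, 16000), chunk_size_bytes(frames))
```

Under that assumption a 320-sample chunk carries 20 ms of audio and a 4096-sample chunk carries 256 ms, so smaller chunks send audio to the server sooner.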
- 99.9% Uptime SLA - Enterprise-grade reliability
- SOC 2 Type II Certified - Your data is secure
- Flexible Deployment - SaaS, on-premises, or air-gapped
When 1% WER improvement translates to millions in revenue, you need the best.
| Metric | Speechmatics | Deepgram |
|---|---|---|
| Word Error Rate (WER) | 6.8% | 16.5% |
| Languages Supported | 55+ | 30+ |
| Custom dictionary | 1,000 words | 100 words |
| Speaker diarization | Included | Extra charge |
| Realtime translation | 30+ languages | β |
| Sentiment analysis | β | β |
| On-premises | β | Limited |
| On-device | β | β |
| Air-gapped deployment | β | β |
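For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. The sketch below implements that standard definition with a word-level edit distance. It is not the benchmark methodology behind the figures in the table, which would also depend on text normalization and the test set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)
```

One wrong word in a five-word reference gives a WER of 0.2, or 20%.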
Stream audio and get instant transcriptions with ultra-low latency. Perfect for voice agents, live captioning, and conversational AI.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, VoiceAgentConfigPreset, AgentServerMessageType

load_dotenv()

async def main():
    # Voice SDK with adaptive turn detection - optimised for conversational AI
    client = VoiceAgentClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        config=VoiceAgentConfigPreset.load("adaptive")
    )

    # Handle transcription segments with speaker labels
    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message.get("segments", []):
            print(f"[{segment.get('speaker_id', 'S1')}]: {segment.get('text', '')}")

    # Detect when speaker finishes their turn
    @client.on(AgentServerMessageType.END_OF_TURN)
    def on_turn_end(message):
        print("[END OF TURN]")

    mic = Microphone(sample_rate=16000, chunk_size=320)
    mic.start()
    try:
        await client.connect()
        print("Voice agent ready. Speak now...")
        while True:
            await client.send_audio(await mic.read(320))
    finally:
        mic.stop()
        await client.disconnect()

asyncio.run(main())

Installation:
pip install speechmatics-voice speechmatics-rt python-dotenv pyaudio

Upload audio files and get accurate transcripts with speaker labels, timestamps, and more.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import AsyncClient, TranscriptionConfig, FormatType

load_dotenv()

async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    # Submit job with advanced features
    job = await client.submit_job(
        "example.wav",
        transcription_config=TranscriptionConfig(
            language="en",
            diarization="speaker",
            enable_entities=True,
            punctuation_overrides={
                "permitted_marks": [".", "?", "!"]
            }
        )
    )
    # Wait for completion
    result = await client.wait_for_completion(job.id, format_type=FormatType.JSON)
    # Access results
    print(f"Transcript: {result.transcript_text}")
    await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-batch python-dotenv

Automatically detect and label different speakers in your audio.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import AsyncClient, TranscriptionConfig

load_dotenv()

async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    job = await client.submit_job(
        "example.wav",
        transcription_config=TranscriptionConfig(
            language="en",
            diarization="speaker",
            speaker_diarization_config={
                "prefer_current_speaker": True
            }
        )
    )
    result = await client.wait_for_completion(job.id)
    # Access full transcript with speaker labels
    print(f"Full transcript:\n{result.transcript_text}\n")
    # Access individual results with speaker information
    for result_item in result.results:
        if result_item.alternatives:
            alt = result_item.alternatives[0]
            speaker = alt.speaker or "Unknown"
            content = alt.content
            print(f"Speaker {speaker}: {content}")
    await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-batch python-dotenv

Add domain-specific terms, names, and acronyms for perfect accuracy.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.rt import (
    AsyncClient,
    ServerMessageType,
    TranscriptionConfig,
    TranscriptResult,
    AudioFormat,
    AudioEncoding,
    Microphone,
    ConversationConfig,
)

load_dotenv()

async def main():
    api_key = os.getenv("SPEECHMATICS_API_KEY")
    if not api_key:
        print("Error: SPEECHMATICS_API_KEY not set")
        return
    transcript_parts = []
    audio_format = AudioFormat(
        encoding=AudioEncoding.PCM_S16LE,
        chunk_size=4096,
        sample_rate=16000,
    )
    transcription_config = TranscriptionConfig(
        language="en",
        enable_partials=True,
        additional_vocab=[
            {"content": "Speechmatics", "sounds_like": ["speech mat ics"]},
            {"content": "API", "sounds_like": ["A P I", "A. P. I."]},
            {"content": "kubernetes", "sounds_like": ["koo ber net ees"]},
            {"content": "Anthropic", "sounds_like": ["an throp ik", "an throw pick"]},
            {"content": "OAuth", "sounds_like": ["oh auth", "O auth", "O. Auth"]},
            {"content": "PostgreSQL", "sounds_like": ["post gres Q L", "post gres sequel"]},
            {"content": "Nginx", "sounds_like": ["engine X", "N jinx"]},
            {"content": "GraphQL", "sounds_like": ["graph Q L", "graph quel"]},
        ],
        conversation_config=ConversationConfig(
            end_of_utterance_silence_trigger=0.5,  # seconds of silence to trigger end of utterance
        ),
    )
    mic = Microphone(sample_rate=16000, chunk_size=4096)
    if not mic.start():
        print("PyAudio not installed")
        return
    client = AsyncClient(api_key=api_key)

    @client.on(ServerMessageType.ADD_TRANSCRIPT)
    def on_final(message):
        result = TranscriptResult.from_message(message)
        if result.metadata.transcript:
            print(f"[final]: {result.metadata.transcript}")
            transcript_parts.append(result.metadata.transcript)

    @client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT)
    def on_partial(message):
        result = TranscriptResult.from_message(message)
        if result.metadata.transcript:
            print(f"[partial]: {result.metadata.transcript}")

    @client.on(ServerMessageType.END_OF_UTTERANCE)
    def on_utterance_end(message):
        print("[END OF UTTERANCE]\n")

    try:
        await client.start_session(
            transcription_config=transcription_config,
            audio_format=audio_format,
        )
        print("Speak now...")
        while True:
            await client.send_audio(await mic.read(4096))
    except KeyboardInterrupt:
        pass
    finally:
        mic.stop()
        await client.close()
        print(f"\nFull transcript: {' '.join(transcript_parts)}")

asyncio.run(main())

Installation:
pip install speechmatics-rt python-dotenv pyaudio

Native models for major languages, not just multilingual Whisper.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import AsyncClient, TranscriptionConfig

load_dotenv()

async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    # Automatic language detection
    job = await client.submit_job(
        "audio.wav",
        transcription_config=TranscriptionConfig(language="auto")
    )
    result = await client.wait_for_completion(job.id)
    print(f"Detected language transcript: {result.transcript_text}")
    await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-batch python-dotenv

Get sentiment, topics, summaries, and chapters from your audio.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import (
    AsyncClient,
    JobConfig,
    JobType,
    TranscriptionConfig,
    SentimentAnalysisConfig,
    TopicDetectionConfig,
    SummarizationConfig,
    AutoChaptersConfig
)

load_dotenv()

async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    # Configure job with all audio intelligence features
    config = JobConfig(
        type=JobType.TRANSCRIPTION,
        transcription_config=TranscriptionConfig(language="en"),
        sentiment_analysis_config=SentimentAnalysisConfig(),
        topic_detection_config=TopicDetectionConfig(),
        summarization_config=SummarizationConfig(),
        auto_chapters_config=AutoChaptersConfig()
    )
    job = await client.submit_job("example.wav", config=config)
    result = await client.wait_for_completion(job.id)
    # Access all results
    print(f"Transcript: {result.transcript_text}")
    if result.sentiment_analysis:
        print(f"Sentiment: {result.sentiment_analysis}")
    if result.topics:
        print(f"Topics: {result.topics}")
    if result.summary:
        print(f"Summary: {result.summary}")
    if result.chapters:
        print(f"Chapters: {result.chapters}")
    await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-batch python-dotenv

Transcribe and translate simultaneously to multiple languages.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import (
    AsyncClient,
    JobConfig,
    JobType,
    TranscriptionConfig,
    TranslationConfig
)

load_dotenv()

async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    config = JobConfig(
        type=JobType.TRANSCRIPTION,
        transcription_config=TranscriptionConfig(language="en"),
        translation_config=TranslationConfig(target_languages=["es", "fr", "de"])
    )
    job = await client.submit_job("sample.mp4", config=config)
    result = await client.wait_for_completion(job.id)
    # Access original transcript
    print(f"Original (English): {result.transcript_text}\n")
    # Access translations
    if result.translations:
        for lang_code, segments in result.translations.items():
            translated_text = " ".join(seg.get("content", "") for seg in segments)
            print(f"Translated ({lang_code}): {translated_text}")
    await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-batch python-dotenv

For more integration examples including Django, Next.js, and production patterns, visit the Speechmatics Academy.
Build real-time voice assistants with LiveKit Agents - a framework for building voice AI applications with WebRTC.
from dotenv import load_dotenv
from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import openai, silero, speechmatics, elevenlabs

load_dotenv()

class VoiceAssistant(Agent):
    """Voice assistant agent with Speechmatics STT."""

    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice assistant. Be concise and friendly.")

async def entrypoint(ctx: agents.JobContext):
    """
    Main entrypoint for the voice assistant.
    Pipeline: LiveKit Room - Speechmatics STT - OpenAI LLM - ElevenLabs TTS - LiveKit Room
    """
    await ctx.connect()
    # Speech to text: Speechmatics with speaker diarization
    stt = speechmatics.STT(
        language="en",
        speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
        focus_speakers=["S1"],
    )
    # Language Model: OpenAI
    llm = openai.LLM(model="gpt-4o-mini")
    # Text-to-Speech: ElevenLabs
    tts = elevenlabs.TTS(voice_id="21m00Tcm4TlvDq8ikWAM")
    # Voice Activity Detection: Silero
    vad = silero.VAD.load()
    # Create and start Agent Session
    session = AgentSession(stt=stt, llm=llm, tts=tts, vad=vad)
    await session.start(
        room=ctx.room,
        agent=VoiceAssistant(),
        room_input_options=RoomInputOptions(),
    )
    # Send initial greeting
    await session.generate_reply(instructions="Say a short hello and ask how you can help.")

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

Installation:
pip install livekit-agents livekit-plugins-speechmatics livekit-plugins-openai livekit-plugins-elevenlabs livekit-plugins-silero

Key Features:
- Realtime WebRTC audio streaming
- Speechmatics STT with speaker diarization
- Configurable LLM and TTS providers
- Voice Activity Detection (VAD)
Build Realtime voice bots with Pipecat - a framework for voice and multimodal conversational AI.
import asyncio
import os

from dotenv import load_dotenv
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.services.openai.llm import OpenAILLMService, OpenAILLMContext
from pipecat.services.speechmatics.stt import SpeechmaticsSTTService, Language
from pipecat.services.speechmatics.tts import SpeechmaticsTTSService
from pipecat.transports.local.audio import LocalAudioTransport

load_dotenv()

async def main():
    # Configure Speechmatics STT with speaker diarization
    stt = SpeechmaticsSTTService(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        params=SpeechmaticsSTTService.InputParams(
            language=Language.EN,
            speaker_active_format="@{speaker_id}: {text}"
        )
    )
    # Configure Speechmatics TTS
    tts = SpeechmaticsTTSService(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        voice_id="sarah"
    )
    # Configure LLM (OpenAI, Anthropic, etc.)
    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o"
    )
    # Set up conversation context
    context = OpenAILLMContext([
        {"role": "system", "content": "You are a helpful AI assistant."}
    ])
    context_aggregator = llm.create_context_aggregator(context)
    # Build pipeline: Audio Input -> STT -> LLM -> TTS -> Audio Output
    transport = LocalAudioTransport()
    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,
        transport.output(),
        context_aggregator.assistant(),
    ])
    # Run the voice bot
    runner = PipelineRunner()
    task = PipelineTask(pipeline)
    print("Voice bot ready! Speak into your microphone...")
    await runner.run(task)

asyncio.run(main())

Installation:
# Quote the extras and drop the space so the shell passes them as one argument
pip install "pipecat-ai[speechmatics,openai]" pyaudio

Key Features:
- Real-time STT with speaker diarization
- Natural-sounding TTS with multiple voices
- Interruption handling (users can interrupt bot responses)
- Works with any LLM provider (OpenAI, Anthropic, etc.)
Each SDK package includes detailed documentation:
| Package | Documentation | Description |
|---|---|---|
| speechmatics-batch | README • Migration Guide | Async batch transcription |
| speechmatics-rt | README • Migration Guide | Realtime Streaming |
| speechmatics-voice | README | Voice agent SDK |
| speechmatics-tts | README | Text-to-speech |
Comprehensive collection of working examples, integrations, and templates: github.com/speechmatics/speechmatics-academy
| Example | Description | Package |
|---|---|---|
| Hello World | Simplest transcription example | Batch |
| Batch vs Realtime | Learn the difference between API modes | Batch, RT |
| Configuration Guide | Common configuration options | Batch |
| Audio Intelligence | Sentiment, topics, and summaries | Batch |
| Multilingual & Translation | 50+ languages and real-time translation | RT |
| Text-to-Speech | Convert text to natural-sounding speech | TTS |
| Turn Detection | Silence-based turn detection | RT |
| Voice Agent Turn Detection | Smart turn detection with presets | Voice |
| Speaker ID & Focus | Speaker identification and focus control | Voice |
| Channel Diarization | Multi-channel transcription | Voice, RT |
| Integration | Example | Features |
|---|---|---|
| LiveKit | Simple Voice Assistant | WebRTC, VAD, diarization, LLM, TTS |
| LiveKit | Telephony with Twilio | Phone calls via SIP, Krisp noise cancellation |
| Pipecat | Simple Voice Bot | Local audio, VAD, LLM, TTS |
| Pipecat | Voice Bot (Web) | Browser-based WebRTC |
| Twilio | Outbound Dialer | Media Streams, ElevenLabs TTS |
| VAPI | Voice Assistant | Voice AI platform integration |
| Industry | Example | Features |
|---|---|---|
| Healthcare | Medical Transcription | Realtime, custom medical vocabulary |
| Media | Video Captioning | SRT generation, batch processing |
| Contact Center | Call Analytics | Channel diarization, sentiment, topics |
| Business | AI Receptionist | LiveKit, Twilio SIP, Google Calendar |
| Seasonal | Santa Voice Agent | LiveKit, Twilio SIP, ElevenLabs TTS, custom voice |
| From | Guide | Status |
|---|---|---|
| Deepgram | Migration Guide | Available |
- API Reference - Complete API documentation
- SDK Repository - Python SDK source code
- Developer Portal - Get your API key
The legacy speechmatics-python package has been deprecated. This new SDK offers:
- Cleaner API - More Pythonic, better type hints
- More features - Sentiment, translation, summarization
- Better docs - Comprehensive examples and guides
speechmatics-python:
from speechmatics.models import BatchTranscriptionConfig
from speechmatics.batch_client import BatchClient

with BatchClient("API_KEY") as client:
    job_id = client.submit_job("audio.wav", BatchTranscriptionConfig("en"))
    transcript = client.wait_for_completion(job_id, transcription_format='txt')
    print(transcript)

speechmatics-python-sdk:
import asyncio
from speechmatics.batch import AsyncClient, TranscriptionConfig, FormatType

async def main():
    client = AsyncClient(api_key="API_KEY")
    job = await client.submit_job(
        "audio.wav",
        transcription_config=TranscriptionConfig(language="en")
    )
    result = await client.wait_for_completion(job.id, format_type=FormatType.TXT)
    print(result)
    await client.close()

asyncio.run(main())

Full Migration Guides: Batch Migration Guide • Realtime Migration Guide
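One difference worth noting when migrating: the legacy snippet relies on `with` to release the client, while the new snippet calls `await client.close()` explicitly, so an exception raised before that line would skip the cleanup. The sketch below shows one way to keep the `with`-style guarantee for any object that has an async `close()` method. `closing_client` is a hypothetical helper built on the standard library, and `FakeClient` is a stand-in used only to make the example runnable; the SDK clients may already support `async with` directly, which this sketch does not assume.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def closing_client(client):
    """Yield the client and always await client.close() on exit."""
    try:
        yield client
    finally:
        await client.close()


class FakeClient:
    """Stand-in for an SDK client; only the async close() method matters here."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def demo() -> bool:
    fake = FakeClient()
    try:
        async with closing_client(fake):
            raise RuntimeError("simulated failure during transcription")
    except RuntimeError:
        pass
    # close() ran even though the body raised
    return fake.closed
```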
HIPAA-compliant transcription for clinical notes, patient interviews, and telemedicine.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import AsyncClient, TranscriptionConfig

load_dotenv()

async def main():
    api_key = os.getenv("SPEECHMATICS_API_KEY")
    client = AsyncClient(api_key=api_key)
    # Use medical domain for better accuracy with clinical terminology
    job = await client.submit_job(
        "patient_interview.wav",
        transcription_config=TranscriptionConfig(
            language="en",
            domain="medical",
            additional_vocab=[
                {"content": "hypertension"},
                {"content": "metformin"},
                {"content": "echocardiogram"},
                {"content": "MRI", "sounds_like": ["M R I"]},
                {"content": "CT scan", "sounds_like": ["C T scan"]}
            ]
        )
    )
    result = await client.wait_for_completion(job.id)
    print(f"Transcript:\n{result.transcript_text}")
    await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-batch python-dotenv

Build Alexa-like experiences with real-time transcription and speaker detection.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.rt import Microphone
from speechmatics.voice import (
    VoiceAgentClient,
    VoiceAgentConfigPreset,
    AgentServerMessageType,
)

load_dotenv()

async def main():
    api_key = os.getenv("SPEECHMATICS_API_KEY")
    # Load a preset configuration (options: adaptive, scribe, captions, external, fast)
    config = VoiceAgentConfigPreset.load("adaptive")
    # Initialize microphone
    mic = Microphone(sample_rate=16000, chunk_size=320)
    if not mic.start():
        print("PyAudio not available - install with: pip install pyaudio")
        return
    # Create voice agent client
    client = VoiceAgentClient(api_key=api_key, config=config)

    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message.get("segments", []):
            speaker_id = segment.get("speaker_id", "S1")
            text = segment.get("text", "")
            print(f"[{speaker_id}]: {text}")

    @client.on(AgentServerMessageType.END_OF_TURN)
    def on_turn_end(message):
        print("[END OF TURN]")

    try:
        await client.connect()
        print("Voice agent started. Speak into your microphone (Ctrl+C to stop)...")
        while True:
            audio_chunk = await mic.read(320)
            await client.send_audio(audio_chunk)
    except KeyboardInterrupt:
        print("\nStopping...")
    finally:
        mic.stop()
        await client.disconnect()

asyncio.run(main())

Installation:
pip install speechmatics-voice speechmatics-rt python-dotenv pyaudio

More use cases: Call Center, Healthcare, Media & Entertainment, Education, and Meetings examples
Transcribe calls with speaker diarization, sentiment analysis, and topic detection.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import (
    AsyncClient,
    JobConfig,
    JobType,
    TranscriptionConfig,
    SummarizationConfig,
    SentimentAnalysisConfig,
    TopicDetectionConfig
)

load_dotenv()

async def main():
    api_key = os.getenv("SPEECHMATICS_API_KEY")
    client = AsyncClient(api_key=api_key)
    config = JobConfig(
        type=JobType.TRANSCRIPTION,
        transcription_config=TranscriptionConfig(
            language="en",
            diarization="speaker"
        ),
        sentiment_analysis_config=SentimentAnalysisConfig(),
        topic_detection_config=TopicDetectionConfig(),
        summarization_config=SummarizationConfig(
            content_type="conversational",
            summary_length="brief"
        )
    )
    job = await client.submit_job("call_recording.wav", config=config)
    result = await client.wait_for_completion(job.id)
    print(f"Transcript:\n{result.transcript_text}\n")
    if result.sentiment_analysis:
        segments = result.sentiment_analysis.get("segments", [])
        counts = {"positive": 0, "negative": 0, "neutral": 0}
        for seg in segments:
            sentiment = seg.get("sentiment", "").lower()
            if sentiment in counts:
                counts[sentiment] += 1
        overall = max(counts, key=counts.get)
        print(f"Sentiment: {overall.capitalize()}")
        print(f"Breakdown: {counts['positive']} positive, {counts['neutral']} neutral, {counts['negative']} negative")
    if result.topics and 'summary' in result.topics:
        overall = result.topics['summary']['overall']
        topics = [topic for topic, count in overall.items() if count > 0]
        print(f"Topics: {', '.join(topics)}")
    if result.summary:
        print(f"Summary: {result.summary.get('content')}")
    await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-batch python-dotenv

Add captions, create searchable archives, generate clips from keywords.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import AsyncClient, TranscriptionConfig, FormatType

load_dotenv()

async def main():
    api_key = os.getenv("SPEECHMATICS_API_KEY")
    client = AsyncClient(api_key=api_key)
    job = await client.submit_job(
        "movie.mp4",
        transcription_config=TranscriptionConfig(language="en")
    )
    # Get SRT captions
    captions = await client.wait_for_completion(job.id, format_type=FormatType.SRT)
    # Save captions
    with open("movie.srt", "w", encoding="utf-8") as f:
        f.write(captions)
    print("Captions saved to movie.srt")
    await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-batch python-dotenv

Auto-generate lecture transcripts, searchable course content, and accessibility captions.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import AsyncClient, TranscriptionConfig, FormatType

load_dotenv()

async def main():
    api_key = os.getenv("SPEECHMATICS_API_KEY")
    client = AsyncClient(api_key=api_key)
    job = await client.submit_job(
        "lecture_recording.wav",
        transcription_config=TranscriptionConfig(
            language="en",
            diarization="speaker",
            enable_entities=True
        )
    )
    result = await client.wait_for_completion(job.id)
    # Save transcript
    with open("lecture_transcript.txt", "w", encoding="utf-8") as f:
        f.write(result.transcript_text)
    # Save SRT captions for accessibility
    captions = await client.wait_for_completion(job.id, format_type=FormatType.SRT)
    with open("lecture_captions.srt", "w", encoding="utf-8") as f:
        f.write(captions)
    print("Transcript and captions saved")
    await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-batch python-dotenv

Turn meetings into searchable, actionable summaries with action items and key decisions.
import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import (
    AsyncClient,
    JobConfig,
    JobType,
    TranscriptionConfig,
    SummarizationConfig,
    AutoChaptersConfig
)

load_dotenv()

async def main():
    api_key = os.getenv("SPEECHMATICS_API_KEY")
    client = AsyncClient(api_key=api_key)
    config = JobConfig(
        type=JobType.TRANSCRIPTION,
        transcription_config=TranscriptionConfig(
            language="en",
            diarization="speaker"
        ),
        summarization_config=SummarizationConfig(),
        auto_chapters_config=AutoChaptersConfig()
    )
    job = await client.submit_job("board_meeting.mp4", config=config)
    result = await client.wait_for_completion(job.id)
    print(f"Transcript:\n{result.transcript_text}\n")
    if result.summary:
        summary = result.summary.get('content', 'N/A')
        print(f"Summary:\n{summary}\n")
    if result.chapters:
        print("Chapters:")
        for i, chapter in enumerate(result.chapters, 1):
            print(f"{i}. {chapter}")
    await client.close()

asyncio.run(main())

Installation:
pip install speechmatics-batch python-dotenv

sequenceDiagram
participant App as Your App
participant SM as Speechmatics RT
App->>SM: Connect WebSocket (WSS)
App->>SM: StartRecognition (config, audio format)
SM->>App: RecognitionStarted
loop Stream Audio
App->>SM: Audio Chunks (binary)
SM->>App: AudioAdded (ack)
SM->>App: AddPartialTranscript (JSON)
SM->>App: AddTranscript (JSON, final)
end
App->>SM: EndOfStream
SM->>App: EndOfTranscript
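To make the realtime exchange more concrete, here is a small sketch of the JSON text frames a client might build and how it might sort incoming server frames into the steps of the diagram. The message names come from the diagram; the field layout (`audio_format`, `transcription_config`, `last_seq_no`) is an assumption about the wire format and should be checked against the API reference. The SDK clients handle all of this for you, so the helper names here are purely illustrative.

```python
import json


def start_recognition(language: str = "en", sample_rate: int = 16000) -> str:
    """Build the frame that opens a session (field layout is assumed)."""
    return json.dumps({
        "message": "StartRecognition",
        "audio_format": {"type": "raw", "encoding": "pcm_s16le", "sample_rate": sample_rate},
        "transcription_config": {"language": language, "enable_partials": True},
    })


def end_of_stream(last_seq_no: int) -> str:
    """Build the closing frame; last_seq_no is the count of audio chunks sent (assumed field)."""
    return json.dumps({"message": "EndOfStream", "last_seq_no": last_seq_no})


def classify(raw: str) -> str:
    """Map a server frame to the step it represents in the sequence diagram."""
    kind = json.loads(raw).get("message", "")
    steps = {
        "RecognitionStarted": "session open",
        "AudioAdded": "chunk acknowledged",
        "AddPartialTranscript": "partial result",
        "AddTranscript": "final result",
        "EndOfTranscript": "session finished",
    }
    return steps.get(kind, "unhandled")
```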
sequenceDiagram
participant App as Your App
participant API as Batch API
participant Queue as Job Queue
participant Engine as Transcription Engine
App->>API: POST /jobs (upload audio)
API->>Queue: Enqueue job
API->>App: Return job_id
Queue->>Engine: Process audio
Engine->>Queue: Store results
loop Poll Status
App->>API: GET /jobs/{id}
API->>App: Status: running/done
end
App->>API: GET /jobs/{id}/transcript
API->>App: Return transcript (JSON/TXT/SRT)
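The polling loop in the batch flow can be sketched without any network calls. The function below takes a status-fetching coroutine as a parameter and keeps calling it until the job is no longer "running" or a timeout passes. `poll_until_done` and `fake_get_status` are hypothetical names for illustration; in the SDK, `wait_for_completion` presumably covers this step for you.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def poll_until_done(
    get_status: Callable[[], Awaitable[str]],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> str:
    """Call get_status until it stops returning "running" or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = await get_status()
        if status != "running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still running after {timeout} seconds")
        await asyncio.sleep(interval)


async def demo() -> str:
    # Hypothetical stand-in for GET /jobs/{id}: two "running" replies, then "done".
    replies = iter(["running", "running", "done"])

    async def fake_get_status() -> str:
        return next(replies)

    return await poll_until_done(fake_get_status, interval=0.01)
```

Passing the fetcher in as an argument keeps the loop testable and leaves the HTTP details to the caller.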
Caution
Security Best Practice: Never hardcode API keys in your source code. Always use environment variables or secure secret management systems.
export SPEECHMATICS_API_KEY="your_api_key_here"

import asyncio
import os

from dotenv import load_dotenv
from speechmatics.batch import AsyncClient

load_dotenv()

async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    # Use client here
    # ...
    await client.close()

asyncio.run(main())

Warning
Browser Security: For browser-based transcription, always use temporary JWT tokens to avoid exposing your long-lived API key. Pass the token as a query parameter: wss://eu2.rt.speechmatics.com/v2?jwt=<token>
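As a small illustration of the query-parameter form mentioned above, the sketch below appends a token to a realtime endpoint URL using only the standard library. `with_jwt` is a hypothetical helper, and the token value is a placeholder; obtaining a real temporary token is a separate step that this sketch does not cover.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_jwt(base_url: str, token: str) -> str:
    """Return base_url with a jwt=<token> query parameter added or replaced."""
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    query["jwt"] = token
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_jwt("wss://eu2.rt.speechmatics.com/v2", "abc.def.ghi"))
```

Building the URL this way keeps any existing query parameters and percent-encodes the token if it contains reserved characters.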
import asyncio
from speechmatics.batch import AsyncClient, JWTAuth

async def main():
    # Generate temporary token (expires after ttl seconds)
    auth = JWTAuth(api_key="your_api_key", ttl=3600)
    client = AsyncClient(auth=auth)
    # Use client here
    # ...
    await client.close()

asyncio.run(main())

import asyncio
from speechmatics.rt import AsyncClient, ConnectionConfig

async def main():
    # Configure WebSocket connection parameters
    conn_config = ConnectionConfig(
        ping_timeout=60.0,   # Timeout waiting for pong response (seconds)
        ping_interval=20.0,  # Interval for WebSocket ping frames (seconds)
        open_timeout=30.0,   # Timeout for establishing connection (seconds)
        close_timeout=10.0   # Timeout for closing connection (seconds)
    )
    client = AsyncClient(
        api_key="KEY",
        url="wss://eu2.rt.speechmatics.com/v2",
        conn_config=conn_config
    )
    # Use client here
    # ...
    await client.close()

asyncio.run(main())

import asyncio
from speechmatics.batch import AsyncClient, TranscriptionConfig
from speechmatics.batch import BatchError, JobError, AuthenticationError
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
async def transcribe_with_retry(audio_file):
    client = AsyncClient(api_key="YOUR_API_KEY")
    try:
        job = await client.submit_job(
            audio_file,
            transcription_config=TranscriptionConfig(language="en")
        )
        return await client.wait_for_completion(job.id)
    except AuthenticationError:
        print("Authentication failed")
        raise
    except (BatchError, JobError) as e:
        print(f"Transcription failed: {e}")
        raise
    finally:
        await client.close()

asyncio.run(transcribe_with_retry("audio.wav"))

import asyncio
from speechmatics.batch import AsyncClient, ConnectionConfig

async def main():
    # Configure HTTP connection settings for batch API
    conn_config = ConnectionConfig(
        connect_timeout=30.0,    # Timeout for connection establishment
        operation_timeout=300.0  # Default timeout for API operations
    )
    client = AsyncClient(
        api_key="KEY",
        conn_config=conn_config
    )
    # Use client here
    # ...
    await client.close()

asyncio.run(main())

Zero infrastructure - just sign up and start transcribing.
import asyncio
from speechmatics.batch import AsyncClient

async def main():
    client = AsyncClient(api_key="YOUR_API_KEY")
    # Uses global SaaS endpoints automatically
    await client.close()

asyncio.run(main())

Run Speechmatics on your own hardware.
docker pull speechmatics/transcription-engine:latest
docker run -p 9000:9000 speechmatics/transcription-engine

import asyncio
from speechmatics.batch import AsyncClient

async def main():
    client = AsyncClient(
        api_key="YOUR_LICENSE_KEY",
        url="http://localhost:9000/v2"
    )
    # Use on-premises instance
    await client.close()

asyncio.run(main())

Scale transcription with k8s orchestration.
# Install the sm-realtime chart
helm upgrade --install speechmatics-realtime \
  oci://speechmaticspublic.azurecr.io/sm-charts/sm-realtime \
  --version 0.7.0 \
  --set proxy.ingress.url="speechmatics.example.com"

The 5-Minute Test: Can you install, authenticate, and run a successful transcription in under 5 minutes?
# 1. Install (30 seconds)
pip install speechmatics-batch python-dotenv
# 2. Set API key (30 seconds)
export SPEECHMATICS_API_KEY="your_key_here"
# 3. Run test (4 minutes)
python3 << 'EOF'
import asyncio
import os
from speechmatics.batch import AsyncClient, TranscriptionConfig, AuthenticationError
from dotenv import load_dotenv

load_dotenv()

async def test():
    api_key = os.getenv("SPEECHMATICS_API_KEY")
    # Replace with your audio file path
    audio_file = "your_audio_file.wav"
    client = AsyncClient(api_key=api_key)
    try:
        print("Submitting transcription job...")
        job = await client.submit_job(audio_file, transcription_config=TranscriptionConfig(language="en"))
        print(f"Job submitted: {job.id}")
        print("Waiting for completion...")
        result = await client.wait_for_completion(job.id)
        print(f"\nTranscript: {result.transcript_text}")
        print("\nTest completed successfully!")
    except AuthenticationError as e:
        print(f"\nAuthentication Error: {e}")
    finally:
        await client.close()

asyncio.run(test())
EOF

If this fails, open an issue - we prioritize developer experience.
- GitHub Discussions: Ask questions, share projects
- Stack Overflow: Tag with speechmatics
- Email Support: devrel@speechmatics.com
- Status Page: status.speechmatics.com
Share what you built:
- Tweet with @Speechmatics
- Post in Show & Tell
This project is licensed under the MIT License - see the LICENSE file for details.
- Website: speechmatics.com
- Documentation: docs.speechmatics.com
- Portal: portal.speechmatics.com
- Status Page: status.speechmatics.com
- Blog: speechmatics.com/blog
- GitHub: @speechmatics