GitHub - speechmatics/speechmatics-python-sdk: Python SDKs for Speechmatics APIs

Speechmatics Python SDK provides convenient access to enterprise-grade speech-to-text APIs from Python applications.

Fully typed with type definitions for all request params and response fields. Modern Python with async/await patterns, type hints, and context managers for production-ready code.

55+ Languages • Realtime & Batch • Custom vocabularies • Speaker diarization • Speaker ID

Get API Key • Documentation • Academy Examples

⚡ Quick Start

Installation

# Choose the package for your use case:

# Batch transcription
pip install speechmatics-batch

# Realtime streaming
pip install speechmatics-rt

# Voice agents
pip install speechmatics-voice

# Text-to-speech
pip install speechmatics-tts

📦 Package Details

speechmatics-batch - Async batch transcription API

Upload audio files for processing
Get transcripts with timestamps, speakers, entities
Supports all audio intelligence features

speechmatics-rt - Realtime WebSocket streaming

Stream audio for live transcription
Ultra-low latency (150ms p95)
Partial and final transcripts

speechmatics-voice - Voice agent SDK

Build conversational AI applications
Speaker diarization and turn detection
Optional ML-based smart turn: pip install speechmatics-voice[smart]

speechmatics-tts - Text-to-speech

Convert text to natural-sounding speech
Multiple voices and languages
Streaming and batch modes

Setting Up Development Environment

git clone https://github.com/speechmatics/speechmatics-python-sdk.git
cd speechmatics-python-sdk

# Create and activate a virtual environment
python -m venv .venv
# On Windows:
.venv\Scripts\activate
# On Mac/Linux: source .venv/bin/activate

# Install development dependencies for all SDKs
make install-dev

# Install pre-commit hooks
pre-commit install

Simple and Pythonic! Get your API key at portal.speechmatics.com

Your First Transcription

Note

All examples use load_dotenv() to load your API key from a .env file. Create a .env file with SPEECHMATICS_API_KEY=your_key_here.
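Because `os.getenv` returns `None` when a variable is unset, it can help to check for the key before creating a client. The sketch below is a minimal fail-fast check using only the standard library; `require_api_key` is a hypothetical helper name for illustration, not part of the SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return SPEECHMATICS_API_KEY, or raise a clear error if it is unset.

    Hypothetical helper for illustration; not part of the Speechmatics SDK.
    """
    # Default to the process environment; a mapping can be passed for testing.
    source = os.environ if env is None else env
    key = source.get("SPEECHMATICS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SPEECHMATICS_API_KEY is not set; add it to your .env file "
            "or export it in your shell."
        )
    return key
```

Call `load_dotenv()` first, then pass `require_api_key()` wherever the examples pass `os.getenv("SPEECHMATICS_API_KEY")`.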

Choose the quick start that matches your use case:

Batch Transcription - transcribe audio files
Realtime Streaming - live microphone transcription
Text-to-Speech - convert text to audio
Voice Agent - real-time transcription with speaker diarization and turn detection

Batch Transcription

Transcribe audio files:

import asyncio
import os
from dotenv import load_dotenv
from speechmatics.batch import AsyncClient

load_dotenv()

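async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    try:
        # Transcribe a local audio file and print the plain-text transcript
        result = await client.transcribe("audio.wav")
        print(result.transcript_text)
    finally:
        # Always close the client, even if transcription fails
        await client.close()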

asyncio.run(main())

Installation:

pip install speechmatics-batch python-dotenv
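To transcribe several files, one option is to run the calls concurrently with a cap on how many are in flight. The sketch below is a generic asyncio pattern, not an SDK feature: `transcribe_many` and `max_concurrent` are hypothetical names, and the `transcribe` argument stands in for a coroutine function such as `client.transcribe` from the example above.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence


async def transcribe_many(
    transcribe: Callable[[str], Awaitable[Any]],
    paths: Sequence[str],
    max_concurrent: int = 4,
) -> List[Any]:
    """Run `transcribe(path)` for each path, at most `max_concurrent` at once.

    Results come back in the same order as `paths`.
    Hypothetical helper for illustration; not part of the Speechmatics SDK.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(path: str) -> Any:
        # The semaphore limits how many transcriptions run at the same time.
        async with semaphore:
            return await transcribe(path)

    return await asyncio.gather(*(run_one(p) for p in paths))
```

Inside `main()`, this could be called as `results = await transcribe_many(client.transcribe, ["a.wav", "b.wav"])`. A sensible value for `max_concurrent` depends on your account limits, which this README does not specify.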

Realtime Streaming

Live microphone transcription:

import asyncio
import os
from dotenv import load_dotenv
from speechmatics.rt import (
    AsyncClient,
    ServerMessageType,
    TranscriptionConfig,
    TranscriptResult,
    AudioFormat,
    AudioEncoding,
    Microphone,
)

load_dotenv()

CHUNK_SIZE = 4096

async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))
    mic = Microphone(sample_rate=16000, chunk_size=CHUNK_SIZE)

    @client.on(ServerMessageType.ADD_TRANSCRIPT)
    def on_final(message):
        result = TranscriptResult.from_message(message)
        if result.metadata.transcript:
            print(f"[final]: {result.metadata.transcript}")

    @client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT)
    def on_partial(message):
        result = TranscriptResult.from_message(message)
        if result.metadata.transcript:
            print(f"[partial]: {result.metadata.transcript}")

    mic.start()

    try:
        await client.start_session(
            transcription_config=TranscriptionConfig(language="en", enable_partials=True),
            audio_format=AudioFormat(encoding=AudioEncoding.PCM_S16LE, sample_rate=16000),
        )
        print("Speak now...")

        while True:
            await client.send_audio(await mic.read(CHUNK_SIZE))
    finally:
        mic.stop()
        await client.close()


asyncio.run(main())

Installation:

pip install speechmatics-rt python-dotenv pyaudio
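The example pairs `CHUNK_SIZE = 4096` with 16 kHz, 16-bit PCM audio. How much audio one chunk holds depends on whether the size counts samples or bytes, which this README does not state, so the sketch below works out both readings. The helper names are hypothetical and mono audio is assumed.

```python
BYTES_PER_SAMPLE_S16LE = 2  # 16-bit signed PCM; mono audio is an assumption here


def duration_ms_from_samples(num_samples: int, sample_rate: int) -> float:
    """Milliseconds of audio covered by `num_samples` samples."""
    return 1000.0 * num_samples / sample_rate


def duration_ms_from_bytes(
    num_bytes: int,
    sample_rate: int,
    bytes_per_sample: int = BYTES_PER_SAMPLE_S16LE,
) -> float:
    """Milliseconds of audio covered by `num_bytes` bytes of PCM data."""
    return duration_ms_from_samples(num_bytes // bytes_per_sample, sample_rate)


# If CHUNK_SIZE counts samples: 4096 / 16000 s per chunk
print(duration_ms_from_samples(4096, 16000))  # 256.0
# If CHUNK_SIZE counts bytes: 2048 samples per chunk
print(duration_ms_from_bytes(4096, 16000))    # 128.0
```

Smaller chunks are sent more often and larger chunks less often, so this arithmetic is a quick way to reason about how frequently `send_audio` is called.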

Text-to-Speech

Convert text to audio:

import asyncio
import os
from dotenv import load_dotenv
from speechmatics.tts import AsyncClient, Voice, OutputFormat

load_dotenv()

async def main():
    client = AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY"))

    try:
        response = await client.generate(
            text="Hello! Welcome to Speechmatics Text-to-Speech",
            voice=Voice.SARAH,
            output_format=OutputFormat.WAV_16000,
        )

        # Read the full audio payload and write it to disk
        audio_data = await response.read()
        with open("output.wav", "wb") as f:
            f.write(audio_data)
        print("Audio saved to output.wav")
    finally:
        # Always close the client, even if generation fails
        await client.close()

asyncio.run(main())

Installation:

pip install speechmatics-tts python-dotenv
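To sanity-check the saved file, the standard-library `wave` module can read its header. This is a minimal sketch that assumes `output.wav` is a standard PCM WAV file that `wave` can parse; `describe_wav` is a hypothetical helper, not part of the SDK.

```python
import wave
from typing import Tuple


def describe_wav(path: str) -> Tuple[int, int, float]:
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file.

    Hypothetical helper for illustration; not part of the Speechmatics SDK.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        # Duration is the frame count divided by frames per second.
        duration = wav.getnframes() / rate
    return rate, channels, duration
```

For example, `describe_wav("output.wav")` after running the script above; with `OutputFormat.WAV_16000` a sample rate of 16000 would be the expected reading, though that is inferred from the name rather than stated here.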

Voice Agent

Real-time transcription with speaker diarization and turn detection:

import asyncio
import os
from dotenv import load_dotenv
from speechmatics.rt import Microphone
from speechmatics.voice import VoiceAgentClient, VoiceAgentConfigPreset, AgentServerMessageType

load_dotenv()

async def main():
    client = VoiceAgentClient(
        api_key=os.getenv("SPEECHMATICS_API_KEY"),
        config=VoiceAgentConfigPreset.load("adaptive")
    )

    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message.get("segments", []):
            print(f"[{segment.get('speaker_id', 'S1')}]: {segment.get('text', '')}")

    @client.on(AgentServerMessageType.END_OF_TURN)
    def on_turn_end(message):
        print("[END OF TURN]")

    mic = Microphone(sample_rate=16000, chunk_size=320)
    mic.start()

    try:
        await client.connect()
        print("Voice agent ready. Speak now...")

        while True:
            await client.send_audio(await mic.read(320))
    finally:
        mic.stop()
        await client.disconnect()

asyncio.run(main())

Installation:

pip install speechmatics-voice speechmatics-rt python-dotenv pyaudio
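Pulling the formatting out of the event handler makes it possible to test without a microphone or a live connection. The sketch below assumes only the message shape used by the handler above (a `segments` list whose items may carry `speaker_id` and `text`); `format_segments` is a hypothetical helper, not part of the SDK.

```python
from typing import Any, Dict, List


def format_segments(message: Dict[str, Any], default_speaker: str = "S1") -> List[str]:
    """Turn a segment message into '[speaker]: text' lines, skipping empty text.

    Hypothetical helper for illustration; not part of the Speechmatics SDK.
    """
    lines: List[str] = []
    for segment in message.get("segments", []):
        text = segment.get("text", "").strip()
        if text:
            speaker = segment.get("speaker_id", default_speaker)
            lines.append(f"[{speaker}]: {text}")
    return lines
```

Inside `on_segment`, the loop could then become `for line in format_segments(message): print(line)`.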

Tip

Ready for more? Explore 20+ working examples at Speechmatics Academy: voice agents, integrations, use cases, and migration guides.

🏆 Why Speechmatics?

Built for Production

99.9% Uptime SLA - Enterprise-grade reliability
SOC 2 Type II Certified - Your data is secure
Flexible Deployment - SaaS, on-premises, or air-gapped

Accuracy That Matters

When a 1% WER improvement translates to millions in revenue, you need the best.

Metric	Speechmatics	Deepgram
Word Error Rate (WER)	6.8%	16.5%
Languages Supported	55+	30+
Custom dictionary	1,000 words	100 words
Speaker diarization	Included	Extra charge
Realtime translation	30+ languages	❌
Sentiment analysis	✅	❌
On-premises	✅	Limited
On-device	✅	❌
Air-gapped deployment	✅	❌
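For readers new to the metric, word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a word-level edit distance. It applies no text normalization (case, punctuation, numerals) and is not necessarily how the figures in the table were measured; `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current[j] = min(
                previous[j] + 1,         # deletion (reference word missing)
                current[j - 1] + 1,      # insertion (extra hypothesis word)
                previous[j - 1] + cost,  # substitution or exact match
            )
        previous = current
    return previous[-1] / len(ref)
```

As a worked case, scoring "the cat sit on mat" against the reference "the cat sat on the mat" gives one substitution and one deletion over six reference words, so the WER is 2/6, about 33%.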

🚀 Key Features

Realtime Transcription

Stream audio and get instant transcriptions with ultra-low latency. Perfect for voice agents, live captioning, and conversational AI.