Merged

audio #929

133 changes: 116 additions & 17 deletions docs/features/audio/speech-to-text/env-variables.md
@@ -11,20 +11,119 @@ For a complete list of all Open WebUI environment variables, see the [Environmen

:::

The following is a summary of the environment variables for speech to text (STT).

# Environment Variables For Speech To Text (STT)

| Variable | Description |
|----------|-------------|
| `WHISPER_MODEL` | Sets the Whisper model to use for local Speech-to-Text |
| `WHISPER_MODEL_DIR` | Specifies the directory to store Whisper model files |
| `WHISPER_COMPUTE_TYPE` | Sets the compute type for Whisper model inference (e.g., `int8`, `float16`) |
| `WHISPER_LANGUAGE` | Specifies the ISO 639-1 (ISO 639-2 for Hawaiian and Cantonese) Speech-to-Text language to use for Whisper (language is predicted unless set) |
| `AUDIO_STT_ENGINE` | Specifies the Speech-to-Text engine to use (empty for local Whisper, or `openai`) |
| `AUDIO_STT_MODEL` | Specifies the Speech-to-Text model for OpenAI-compatible endpoints |
| `AUDIO_STT_OPENAI_API_BASE_URL` | Sets the OpenAI-compatible base URL for Speech-to-Text |
| `AUDIO_STT_OPENAI_API_KEY` | Sets the OpenAI API key for Speech-to-Text |
| `AUDIO_STT_AZURE_API_KEY` | Sets the Azure API key for Speech-to-Text |
| `AUDIO_STT_AZURE_REGION` | Sets the Azure region for Speech-to-Text |
| `AUDIO_STT_AZURE_LOCALES` | Sets the Azure locales for Speech-to-Text |
The following is a summary of the environment variables for speech-to-text (STT) and text-to-speech (TTS).

:::tip UI Configuration
Most of these settings can also be configured in the **Admin Panel → Settings → Audio** tab. These are persistent settings: environment variables set the initial values on startup, and any changes saved in the UI persist and take precedence afterwards.
:::

## Speech To Text (STT) Environment Variables

### Local Whisper

| Variable | Description | Default |
|----------|-------------|---------|
| `WHISPER_MODEL` | Whisper model size | `base` |
| `WHISPER_MODEL_DIR` | Directory to store Whisper model files | `{CACHE_DIR}/whisper/models` |
| `WHISPER_COMPUTE_TYPE` | Compute type for inference (see note below) | `int8` |
| `WHISPER_LANGUAGE` | ISO 639-1 language code (empty = auto-detect) | empty |
| `WHISPER_MODEL_AUTO_UPDATE` | Auto-download model updates | `false` |
| `WHISPER_VAD_FILTER` | Enable Voice Activity Detection filter | `false` |

:::info WHISPER_COMPUTE_TYPE Options
- `int8` — CPU default, fastest but may not work on older GPUs
- `float16` — **Recommended for CUDA/GPU**
- `int8_float16` — Hybrid mode (int8 weights, float16 computation)
- `float32` — Maximum compatibility, slowest

If using the `:cuda` Docker image with an older GPU, set `WHISPER_COMPUTE_TYPE=float16` to avoid errors.
:::
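
The Local Whisper variables above can be combined in a Docker Compose file. A minimal sketch for a CPU-only deployment (the model size and language are illustrative choices, not required values):

```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # Use the small model with the CPU-friendly compute type
      - WHISPER_MODEL=small
      - WHISPER_COMPUTE_TYPE=int8
      # Force English instead of auto-detection
      - WHISPER_LANGUAGE=en
      # Filter out non-speech segments before transcription
      - WHISPER_VAD_FILTER=true
```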

### OpenAI-Compatible STT

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_ENGINE` | STT engine: empty (local Whisper), `openai`, `azure`, `deepgram`, `mistral` | empty |
| `AUDIO_STT_MODEL` | STT model for external providers | empty |
| `AUDIO_STT_OPENAI_API_BASE_URL` | OpenAI-compatible API base URL | `https://api.openai.com/v1` |
| `AUDIO_STT_OPENAI_API_KEY` | OpenAI API key | empty |
| `AUDIO_STT_SUPPORTED_CONTENT_TYPES` | Comma-separated list of supported audio MIME types | empty (falls back to `audio/*,video/webm`) |
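
Because the base URL is configurable, any OpenAI-compatible transcription server can be used, not just OpenAI itself. A sketch pointing Open WebUI at a self-hosted server on the Docker host (the URL, port, and key below are placeholders):

```yaml
services:
  open-webui:
    environment:
      - AUDIO_STT_ENGINE=openai
      # Any OpenAI-compatible /v1 endpoint works; this URL is illustrative
      - AUDIO_STT_OPENAI_API_BASE_URL=http://host.docker.internal:8000/v1
      - AUDIO_STT_OPENAI_API_KEY=sk-placeholder
      - AUDIO_STT_MODEL=whisper-1
```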

### Azure STT

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_AZURE_API_KEY` | Azure Cognitive Services API key | empty |
| `AUDIO_STT_AZURE_REGION` | Azure region | `eastus` |
| `AUDIO_STT_AZURE_LOCALES` | Comma-separated locales (e.g., `en-US,de-DE`) | auto |
| `AUDIO_STT_AZURE_BASE_URL` | Custom Azure base URL (optional) | empty |
| `AUDIO_STT_AZURE_MAX_SPEAKERS` | Max speakers for diarization | `3` |
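
A compose sketch for Azure STT (the region and locales are illustrative; substitute your own key):

```yaml
services:
  open-webui:
    environment:
      - AUDIO_STT_ENGINE=azure
      - AUDIO_STT_AZURE_API_KEY=your-azure-key
      - AUDIO_STT_AZURE_REGION=westeurope
      # Restrict recognition to two locales instead of auto-detection
      - AUDIO_STT_AZURE_LOCALES=en-US,de-DE
```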

### Deepgram STT

| Variable | Description | Default |
|----------|-------------|---------|
| `DEEPGRAM_API_KEY` | Deepgram API key | empty |
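
Deepgram only needs the engine selector and a key, for example:

```yaml
services:
  open-webui:
    environment:
      - AUDIO_STT_ENGINE=deepgram
      - DEEPGRAM_API_KEY=your-deepgram-key
```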

### Mistral STT

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_MISTRAL_API_KEY` | Mistral API key | empty |
| `AUDIO_STT_MISTRAL_API_BASE_URL` | Mistral API base URL | `https://api.mistral.ai/v1` |
| `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS` | Use chat completions endpoint | `false` |

## Text To Speech (TTS) Environment Variables

### General TTS

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_TTS_ENGINE` | TTS engine: empty (disabled), `openai`, `elevenlabs`, `azure`, `transformers` | empty |
| `AUDIO_TTS_MODEL` | TTS model | `tts-1` |
| `AUDIO_TTS_VOICE` | Default voice | `alloy` |
| `AUDIO_TTS_SPLIT_ON` | Split text on: `punctuation` or `none` | `punctuation` |
| `AUDIO_TTS_API_KEY` | API key for ElevenLabs or Azure TTS | empty |

### OpenAI-Compatible TTS

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_TTS_OPENAI_API_BASE_URL` | OpenAI-compatible TTS API base URL | `https://api.openai.com/v1` |
| `AUDIO_TTS_OPENAI_API_KEY` | OpenAI TTS API key | empty |
| `AUDIO_TTS_OPENAI_PARAMS` | Additional JSON params for OpenAI TTS | empty |
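
`AUDIO_TTS_OPENAI_PARAMS` takes a JSON object that is merged into the TTS request. As a sketch (whether a given parameter has any effect depends on the backend; `speed` is one parameter OpenAI's speech endpoint accepts, and the key below is a placeholder):

```yaml
services:
  open-webui:
    environment:
      - AUDIO_TTS_ENGINE=openai
      - AUDIO_TTS_OPENAI_API_KEY=sk-placeholder
      # Quoted so the JSON value survives YAML parsing
      - 'AUDIO_TTS_OPENAI_PARAMS={"speed": 1.1}'
```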

### Azure TTS

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_TTS_AZURE_SPEECH_REGION` | Azure Speech region | `eastus` |
| `AUDIO_TTS_AZURE_SPEECH_BASE_URL` | Custom Azure Speech base URL (optional) | empty |
| `AUDIO_TTS_AZURE_SPEECH_OUTPUT_FORMAT` | Audio output format | `audio-24khz-160kbitrate-mono-mp3` |
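
A compose sketch for Azure TTS (the region and voice name are illustrative; note that the generic `AUDIO_TTS_API_KEY` carries the Azure Speech key):

```yaml
services:
  open-webui:
    environment:
      - AUDIO_TTS_ENGINE=azure
      - AUDIO_TTS_API_KEY=your-azure-speech-key
      - AUDIO_TTS_AZURE_SPEECH_REGION=westeurope
      - AUDIO_TTS_VOICE=en-US-AvaMultilingualNeural
```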

## Tips for Configuring Audio

### Using Local Whisper STT

For GPU acceleration issues or older GPUs, try setting:
```yaml
environment:
- WHISPER_COMPUTE_TYPE=float16
```

### Using External TTS Services

When running Open WebUI in Docker with an external TTS service:

```yaml
environment:
- AUDIO_TTS_ENGINE=openai
- AUDIO_TTS_OPENAI_API_BASE_URL=http://host.docker.internal:5050/v1
- AUDIO_TTS_OPENAI_API_KEY=your-api-key
```

:::tip
Use `host.docker.internal` on Docker Desktop (Windows/Mac) to access services on the host. On Linux, use the host IP or container networking.
:::

For troubleshooting audio issues, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
125 changes: 125 additions & 0 deletions docs/features/audio/speech-to-text/mistral-voxtral-integration.md
@@ -0,0 +1,125 @@
---
sidebar_position: 2
title: "Mistral Voxtral STT"
---

# Using Mistral Voxtral for Speech-to-Text

This guide covers how to use Mistral's Voxtral model for Speech-to-Text with Open WebUI. Voxtral is Mistral's audio model family, providing cloud-based transcription without local GPU resources.

## Requirements

- A Mistral API key
- Open WebUI installed and running

## Quick Setup (UI)

1. Click your **profile icon** (bottom-left corner)
2. Select **Admin Panel**
3. Click **Settings** → **Audio** tab
4. Configure the following:

| Setting | Value |
|---------|-------|
| **Speech-to-Text Engine** | `MistralAI` |
| **API Key** | Your Mistral API key |
| **STT Model** | `voxtral-mini-latest` (or leave empty for default) |

5. Click **Save**

## Available Models

| Model | Description |
|-------|-------------|
| `voxtral-mini-latest` | Default transcription model (recommended) |

## Environment Variables Setup

If you prefer to configure via environment variables:

```yaml
services:
open-webui:
image: ghcr.io/open-webui/open-webui:main
environment:
- AUDIO_STT_ENGINE=mistral
- AUDIO_STT_MISTRAL_API_KEY=your-mistral-api-key
- AUDIO_STT_MODEL=voxtral-mini-latest
# ... other configuration
```

### All Mistral STT Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_ENGINE` | Set to `mistral` | empty (uses local Whisper) |
| `AUDIO_STT_MISTRAL_API_KEY` | Your Mistral API key | empty |
| `AUDIO_STT_MISTRAL_API_BASE_URL` | Mistral API base URL | `https://api.mistral.ai/v1` |
| `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS` | Use chat completions endpoint | `false` |
| `AUDIO_STT_MODEL` | STT model | `voxtral-mini-latest` |

## Transcription Methods

Mistral supports two transcription methods:

### Standard Transcription (Default)
Uses the dedicated transcription endpoint. This is the recommended method.

### Chat Completions Method
Set `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS=true` to use Mistral's chat completions API for transcription. This method:
- Requires audio in mp3 or wav format (automatic conversion is attempted)
- May provide different results than the standard endpoint

## Using STT

1. Click the **microphone icon** in the chat input
2. Speak your message
3. Click the microphone again or wait for silence detection
4. Your speech will be transcribed and appear in the input box

## Supported Audio Formats

Voxtral accepts common audio formats. The system defaults to accepting `audio/*` and `video/webm`.

If using the chat completions method, audio is automatically converted to mp3.
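
If you want to limit uploads to formats that need no conversion, the accepted MIME types can be narrowed with `AUDIO_STT_SUPPORTED_CONTENT_TYPES` (the list below is illustrative):

```yaml
environment:
  - AUDIO_STT_SUPPORTED_CONTENT_TYPES=audio/wav,audio/mpeg
```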

## Troubleshooting

### API Key Errors

If you see "Mistral API key is required":
1. Verify your API key is entered correctly
2. Check the API key hasn't expired
3. Ensure your Mistral account has API access

### Transcription Not Working

1. Check container logs: `docker logs open-webui -f`
2. Verify the STT Engine is set to `MistralAI`
3. Try the standard transcription method (disable chat completions)

### Audio Format Issues

If using chat completions method and audio conversion fails:
- Ensure FFmpeg is available in the container
- Try recording in a different format (wav or mp3)
- Switch to the standard transcription method

For more troubleshooting, see the [Audio Troubleshooting Guide](/troubleshooting/audio).

## Comparison with Other STT Options

| Feature | Mistral Voxtral | OpenAI Whisper | Local Whisper |
|---------|-----------------|----------------|---------------|
| **Cost** | Per-minute pricing | Per-minute pricing | Free |
| **Privacy** | Audio sent to Mistral | Audio sent to OpenAI | Audio stays local |
| **Model Options** | voxtral-mini-latest | whisper-1 | tiny → large |
| **GPU Required** | No | No | Recommended |

## Cost Considerations

Mistral charges per minute of audio for STT. Check [Mistral's pricing page](https://mistral.ai/products/la-plateforme#pricing) for current rates.

:::tip
For free STT, use **Local Whisper** (the default) or the browser's **Web API** for basic transcription.
:::
136 changes: 136 additions & 0 deletions docs/features/audio/speech-to-text/openai-stt-integration.md
@@ -0,0 +1,136 @@
---
sidebar_position: 0
title: "OpenAI STT Integration"
---

# Using OpenAI for Speech-to-Text

This guide covers how to use OpenAI's Whisper API for Speech-to-Text with Open WebUI. This provides cloud-based transcription without needing local GPU resources.

:::tip Looking for TTS?
See the companion guide: [Using OpenAI for Text-to-Speech](/features/audio/text-to-speech/openai-tts-integration)
:::

## Requirements

- An OpenAI API key with access to the Audio API
- Open WebUI installed and running

## Quick Setup (UI)

1. Click your **profile icon** (bottom-left corner)
2. Select **Admin Panel**
3. Click **Settings** → **Audio** tab
4. Configure the following:

| Setting | Value |
|---------|-------|
| **Speech-to-Text Engine** | `OpenAI` |
| **API Base URL** | `https://api.openai.com/v1` |
| **API Key** | Your OpenAI API key |
| **STT Model** | `whisper-1` |
| **Supported Content Types** | Leave empty for defaults, or set `audio/wav,audio/mpeg,audio/webm` |

5. Click **Save**

## Available Models

| Model | Description |
|-------|-------------|
| `whisper-1` | OpenAI's Whisper large-v2 model, hosted in the cloud |

:::info
OpenAI currently only offers `whisper-1`. For more model options, use Local Whisper (built into Open WebUI) or other providers like Deepgram.
:::

## Environment Variables Setup

If you prefer to configure via environment variables:

```yaml
services:
open-webui:
image: ghcr.io/open-webui/open-webui:main
environment:
- AUDIO_STT_ENGINE=openai
- AUDIO_STT_OPENAI_API_BASE_URL=https://api.openai.com/v1
- AUDIO_STT_OPENAI_API_KEY=sk-...
- AUDIO_STT_MODEL=whisper-1
# ... other configuration
```

### All STT Environment Variables (OpenAI)

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_ENGINE` | Set to `openai` | empty (uses local Whisper) |
| `AUDIO_STT_OPENAI_API_BASE_URL` | OpenAI API base URL | `https://api.openai.com/v1` |
| `AUDIO_STT_OPENAI_API_KEY` | Your OpenAI API key | empty |
| `AUDIO_STT_MODEL` | STT model | `whisper-1` |
| `AUDIO_STT_SUPPORTED_CONTENT_TYPES` | Allowed audio MIME types | `audio/*,video/webm` |

### Supported Audio Formats

By default, Open WebUI accepts `audio/*` and `video/webm` for transcription. If you need to restrict or expand supported formats, set `AUDIO_STT_SUPPORTED_CONTENT_TYPES`:

```yaml
environment:
- AUDIO_STT_SUPPORTED_CONTENT_TYPES=audio/wav,audio/mpeg,audio/webm
```

OpenAI's Whisper API supports: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`

## Using STT

1. Click the **microphone icon** in the chat input
2. Speak your message
3. Click the microphone again or wait for silence detection
4. Your speech will be transcribed and appear in the input box

## OpenAI vs Local Whisper

| Feature | OpenAI Whisper API | Local Whisper |
|---------|-------------------|---------------|
| **Latency** | Network dependent | Hardware dependent; can be faster for short clips |
| **Cost** | Per-minute pricing | Free (uses your hardware) |
| **Privacy** | Audio sent to OpenAI | Audio stays local |
| **GPU Required** | No | Recommended for speed |
| **Model Options** | `whisper-1` only | tiny, base, small, medium, large |

Choose **OpenAI** if:
- You don't have a GPU
- You want consistent performance
- Privacy isn't a concern

Choose **Local Whisper** if:
- You want free transcription
- You need audio to stay private
- You have a GPU for acceleration

## Troubleshooting

### Microphone Not Working

1. Ensure you're using HTTPS or localhost
2. Check browser microphone permissions
3. See [Microphone Access Issues](/troubleshooting/audio#microphone-access-issues)

### Transcription Errors

1. Check your OpenAI API key is valid
2. Verify the API Base URL is correct
3. Check container logs for error messages

### Language Issues

OpenAI's Whisper API automatically detects language. If you need to force a specific language, consider using Local Whisper with the `WHISPER_LANGUAGE` environment variable.
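
For example, to switch back to the built-in engine and pin transcription to German (the language code is illustrative):

```yaml
environment:
  # Empty value selects the built-in Local Whisper engine
  - AUDIO_STT_ENGINE=
  # ISO 639-1 code; skips Whisper's language auto-detection
  - WHISPER_LANGUAGE=de
```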

For more troubleshooting, see the [Audio Troubleshooting Guide](/troubleshooting/audio).

## Cost Considerations

OpenAI charges per minute of audio for STT. See [OpenAI Pricing](https://platform.openai.com/docs/pricing) for current rates.

:::tip
For free STT, use **Local Whisper** (the default) or the browser's **Web API** for basic transcription.
:::