A meeting transcription app that runs entirely offline. It converts speech to text, separates speakers (diarization), and generates summaries of discussions. You can also configure custom prompts and connect it to your local LLM server. Use it for meetings, interviews, lectures, or any audio where you need a clear transcript and summary.
The sample-files/ directory contains example outputs to showcase MeetMemo's capabilities:
- `GenAI Week - AI x SaaS.wav` - Sample audio file for testing (36MB, ~1 hour)
- `GenAI_Week_-_AI_x_SaaS_Transcript_2026-01-04_23-47.pdf` - Professional PDF transcript with speaker labels
- `GenAI_Week_-_AI_x_SaaS_Transcript_2026-01-04_23-47.md` - Markdown transcript export
- `GenAI_Week_-_AI_x_SaaS_2026-01-04_23-46.md` - AI-generated summary with key insights and action items
- `MeetMemo-DEMO_v2.0.0.gif` - Application demonstration GIF showcasing new UI
- `genai-week---ai-x-saas_transcript_2025-08-31_v1.0.0.json` - Complete diarized transcript with speaker identification
- `genai-week---ai-x-saas_summary_2025-08-31_v1.0.0.pdf` - Professional PDF summary
- `genai-week---ai-x-saas_summary_2025-08-31_v1.0.0.markdown` - Markdown version of summary
- `MeetMemo_Demo_v1.0.0.gif` - Legacy demonstration GIF
These files demonstrate the complete workflow from audio upload to final deliverables, showing the quality and format of MeetMemo's output.
- Audio Recording & Upload: Record meetings directly in the browser or upload existing audio files (MP3, WAV, M4A, FLAC, etc.)
- Advanced Speech Recognition: Powered by OpenAI's Whisper for high-accuracy transcription in 99+ languages
- Speaker Diarization: Automatically identify and label different speakers using PyAnnote.audio v3.1
- AI-Powered Summarization: Generate concise summaries with key points, action items, and next steps using custom LLMs
- Real-time Processing: Monitor transcription progress with live status updates and job management
- Speaker Management: Edit and customize speaker names with persistent storage across sessions
- Export Options: Download transcripts as TXT/PDF and summaries with professional formatting
- HTTPS Support: Secure SSL setup with auto-generated certificates for production deployment
- Dark/Light Mode: Toggle between themes for comfortable viewing with responsive design
- Multi-language Support: Automatic language detection or specify target language for better accuracy
MeetMemo is a containerized application with three main services:
- Backend: FastAPI server with Whisper, PyAnnote, and LLM integration
- Frontend: React 19 application with modern UI components
- Nginx: Reverse proxy with SSL termination and request routing
- Docker: Install Docker
- Docker Compose: Install Docker Compose
- NVIDIA GPU: Required for optimal performance (CUDA-compatible)
- RAM: Minimum 8GB recommended (16GB+ for large files)
- Storage: At least 10GB free space for models and audio files
- Hugging Face Account: Required for PyAnnote model access
- LLM API: External LLM service for summarization (OpenAI, Anthropic, etc.)
- Clone the repository:

      git clone https://github.com/notyusheng/MeetMemo.git
      cd MeetMemo

- Accept Hugging Face model licenses: visit the Hugging Face pages of the PyAnnote models that MeetMemo uses and accept the licenses (fill in any required fields).

- Create a Hugging Face access token:
  - Go to the Hugging Face tokens page
  - Click "New token", choose the `Read` scope, and copy the token

- Set up the environment file:

      cp example.env .env

  Edit `.env` and update the required variables:

      HF_TOKEN=your_huggingface_token_here
      LLM_API_URL=your_llm_url_here
      LLM_MODEL_NAME=your_llm_model_name_here
      LLM_API_KEY=your_llm_api_key_here
      TIMEZONE_OFFSET=+8

- Build and run:

      docker compose build
      docker compose up

- Access the application: open your browser and navigate to `https://localhost`
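
Before the first build, it can help to confirm that `.env` no longer contains the placeholder values copied from `example.env`. The following is a minimal sketch and not part of MeetMemo: the helper names `parse_env` and `missing_settings` are hypothetical, and treating any value that starts with `your_` as unset is an assumption based on the placeholders shown above.

```python
from pathlib import Path
from typing import Dict, List

# Variables the setup step asks you to fill in. LLM_API_KEY is left out because
# the configuration table marks it as optional.
REQUIRED = ("HF_TOKEN", "LLM_API_URL", "LLM_MODEL_NAME")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(values: Dict[str, str]) -> List[str]:
    """Return required variables that are absent, empty, or still placeholders."""
    return [
        name
        for name in REQUIRED
        # Assumption: placeholder values start with "your_", as in example.env.
        if not values.get(name) or values[name].startswith("your_")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env found; run `cp example.env .env` first.")
    else:
        problems = missing_settings(parse_env(env_path.read_text()))
        print("Still to configure:", ", ".join(problems) if problems else "nothing")
```

Run it from the repository root; it only reads `.env` and never prints the values themselves.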
- Upload/Record: Upload an audio file or record directly in the browser
- Transcribe: Click "Start Transcription" to begin processing
- Review: View the diarized transcript with speaker labels
- Customize: Edit speaker names for better identification
- Summarize: Generate AI-powered summaries with key insights
- Export: Download transcripts and summaries for future reference
- Speaker Management: Click speaker labels to rename them with persistent storage
- Custom Prompts: Use custom prompts for tailored summarization (technical analysis, action items only, etc.)
- Job Management: Track multiple transcription jobs with unique UUIDs and status monitoring
- Export Options: Multiple format support (TXT, PDF) for transcripts and summaries
- Batch Processing: Handle multiple audio files simultaneously
- Language Selection: Choose specific Whisper models for target languages (.en for English-only)
- Quality Control: Automatic audio preprocessing (mono conversion, 16kHz resampling)
- Progress Tracking: Real-time status updates with detailed processing logs
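
The preprocessing mentioned above (mono conversion, 16kHz resampling) can be reproduced outside the app if you want to inspect what the models receive. This is a sketch under the assumption that `ffmpeg` is installed; the source does not say which tool MeetMemo uses internally, and the function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def preprocess_command(source: str, target: str) -> List[str]:
    """Build an ffmpeg call that downmixes to mono and resamples to 16 kHz."""
    return [
        "ffmpeg",
        "-y",            # overwrite the target if it already exists
        "-i", source,    # input file in any format ffmpeg can decode
        "-ac", "1",      # one audio channel (mono)
        "-ar", "16000",  # 16 kHz sample rate
        target,
    ]


def preprocess(source: str, target: str) -> None:
    """Run the conversion; raises if ffmpeg is missing or the conversion fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")
    subprocess.run(preprocess_command(source, target), check=True)
```

For example, `preprocess("meeting.m4a", "meeting_16k.wav")` would write a mono 16 kHz WAV next to the original file.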

Frontend:

    cd frontend
    npm install
    npm start        # Start development server
    npm run build    # Build for production
    npm test         # Run tests

Backend:

    cd backend
    pip install -r requirements.txt
    python main.py   # Run FastAPI server directly

Docker:

    docker compose build                      # Build containers
    docker compose up -d                      # Run in detached mode
    docker compose logs -f meetmemo-backend   # View backend logs
    docker compose logs -f meetmemo-frontend  # View frontend logs
    docker compose down                       # Stop all services
    docker compose restart meetmemo-backend   # Restart specific service

Linting:

    # Frontend
    cd frontend
    npm run lint:css      # Lint CSS files
    npm run lint:css:fix  # Fix CSS linting issues

    # Backend
    cd backend
    ruff check            # Check Python code style
    ruff format           # Format Python code

- GPU not detected: Verify NVIDIA Docker runtime is installed
- Model download fails: Check Hugging Face token and license acceptance
- Audio upload issues: Ensure supported file format (WAV recommended)
- PyTorch Lightning warning: If you see checkpoint upgrade warnings, run the suggested upgrade command in the container
- Faster processing: Use smaller Whisper models (base, small)
- Higher accuracy: Use larger models (medium, large) with quality audio input
- GPU optimization: Ensure NVIDIA drivers and Docker GPU support are properly configured
- Memory management: Restart backend service after processing large files to free memory
- Audio quality: Use high-quality audio input (16kHz+) for better diarization results
MeetMemo provides a comprehensive REST API for programmatic access:
| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Health check and API status |
| POST | `/upload` | Upload audio file for transcription |
| GET | `/jobs` | List all transcription jobs |
| DELETE | `/jobs/{uuid}` | Delete specific job |
| GET | `/jobs/{uuid}/status` | Get job processing status |
| GET | `/jobs/{uuid}/transcript` | Get diarized transcript |
| POST | `/jobs/{uuid}/summarise` | Generate AI summary |
| PATCH | `/jobs/{uuid}/speakers` | Update speaker names |
| GET | `/logs` | View application logs |
- Real-time job status updates
- Progress notifications for long-running tasks
- Error handling and retry mechanisms
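
As an illustration of programmatic access, the sketch below polls `GET /jobs/{uuid}/status` until a job finishes. Only the endpoint paths come from the table above. The base URL, the `status` key in the response, and the state names `completed` and `failed` are assumptions, so check a real response before relying on them. With a self-signed certificate, `urlopen` will reject `https://localhost` unless you supply a suitable SSL context.

```python
import json
import time
from typing import Callable, Sequence
from urllib.request import urlopen

BASE_URL = "https://localhost"            # assumption: API served from the same origin as the UI
DONE_STATES = ("completed", "failed")     # assumption: hypothetical terminal state names


def job_url(job_uuid: str, resource: str, base_url: str = BASE_URL) -> str:
    """Build a URL such as https://localhost/jobs/<uuid>/status."""
    return f"{base_url.rstrip('/')}/jobs/{job_uuid}/{resource}"


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def wait_for_job(
    job_uuid: str,
    fetch: Callable[[str], dict] = fetch_json,
    base_url: str = BASE_URL,
    done_states: Sequence[str] = DONE_STATES,
    interval: float = 2.0,
    max_polls: int = 300,
) -> dict:
    """Poll the status endpoint until the job reaches a terminal state."""
    for _ in range(max_polls):
        payload = fetch(job_url(job_uuid, "status", base_url))
        # Assumption: the response carries its state under a "status" key.
        if payload.get("status") in done_states:
            return payload
        time.sleep(interval)
    raise TimeoutError(f"job {job_uuid} did not finish after {max_polls} polls")
```

The `fetch` parameter is injectable so the polling logic can be exercised without a running server. Once `wait_for_job` returns, `fetch_json(job_url(job_uuid, "transcript"))` would retrieve the transcript.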
| Variable | Description | Default/Example |
|---|---|---|
| `HF_TOKEN` | Hugging Face API token for model access | Required |
| `LLM_API_URL` | External LLM service endpoint | `http://localhost:8000/v1/chat/completions` |
| `LLM_MODEL_NAME` | LLM model identifier | `qwen2.5-32b-instruct` |
| `LLM_API_KEY` | Authentication key for LLM service | Optional |
| `TIMEZONE_OFFSET` | Timezone offset in hours for timestamps | `+8` (GMT+8/Singapore) |
MeetMemo uses a configurable timezone for all timestamps in exported PDFs and Markdown files. The timezone is set via the TIMEZONE_OFFSET environment variable in your .env file.
Default: Singapore Time (GMT+8)
To change the timezone:
- Edit your `.env` file
- Update the `TIMEZONE_OFFSET` value with your desired offset:
  - UTC: `TIMEZONE_OFFSET=0`
  - EST (GMT-5): `TIMEZONE_OFFSET=-5`
  - JST (GMT+9): `TIMEZONE_OFFSET=+9`
  - GMT+8 (Singapore/Default): `TIMEZONE_OFFSET=+8`
- Restart the backend service: `docker compose restart meetmemo-backend`
All export functions (PDF and Markdown) will use this configured timezone when generating timestamps.
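
To see how an hour offset such as `+8` shifts a timestamp, here is a small sketch using Python's standard library. It illustrates the idea only and is not MeetMemo's export code; the function names and the output format are hypothetical.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Optional


def configured_timezone(raw_offset: Optional[str] = None) -> timezone:
    """Turn an hour offset such as "+8", "-5", or "0" into a tzinfo object."""
    if raw_offset is None:
        # Fall back to the documented default of GMT+8.
        raw_offset = os.environ.get("TIMEZONE_OFFSET", "+8")
    return timezone(timedelta(hours=float(raw_offset)))


def format_timestamp(moment_utc: datetime, raw_offset: Optional[str] = None) -> str:
    """Render a timezone-aware UTC datetime in the configured timezone."""
    local = moment_utc.astimezone(configured_timezone(raw_offset))
    return local.strftime("%Y-%m-%d %H:%M")


if __name__ == "__main__":
    exported_at = datetime(2026, 1, 4, 15, 47, tzinfo=timezone.utc)
    print(format_timestamp(exported_at, "+8"))  # 2026-01-04 23:47
    print(format_timestamp(exported_at, "-5"))  # 2026-01-04 10:47
```

The same UTC instant therefore appears as 23:47 in Singapore time and 10:47 in EST.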
Available models (size/speed trade-off):
- `tiny` - Fastest, least accurate (~1GB VRAM)
- `base` - Good balance (~1GB VRAM)
- `small` - Better accuracy (~2GB VRAM)
- `medium` - High accuracy (~5GB VRAM)
- `large` - Best accuracy (~10GB VRAM)
- `turbo` - Latest optimized model (default)
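
The trade-off above can be turned into a quick rule of thumb: pick the most accurate model whose approximate VRAM requirement fits your GPU. The sketch below uses only the figures listed above; `turbo` is omitted because no VRAM figure is given for it, and the function name is hypothetical.

```python
from typing import Optional

# Approximate VRAM needs in GB, in ascending order of accuracy (figures from the list above).
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_that_fits(available_gb: float) -> Optional[str]:
    """Return the most accurate listed model that fits, or None if none does."""
    fitting = [name for name, needed in MODEL_VRAM_GB if needed <= available_gb]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for vram in (0.5, 4, 12):
        print(vram, "GB ->", largest_model_that_fits(vram))
```

Treat the result as a starting point: the figures are approximate, and diarization needs GPU memory too.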
| Volume | Purpose | Location |
|---|---|---|
| `audiofiles/` | Uploaded audio files | `/app/audiofiles` |
| `transcripts/` | Generated transcriptions | `/app/transcripts` |
| `summary/` | AI-generated summaries | `/app/summary` |
| `logs/` | Application logs | `/app/logs` |
| `whisper_cache/` | Model cache | `/app/whisper_cache` |
- Local Processing: All audio transcription and diarization happens locally
- Data Privacy: Audio files never leave your infrastructure except for LLM summarization
- Secure Storage: Files stored in Docker volumes with proper permissions
- HTTPS Support: SSL certificates auto-generated for secure connections
- No Authentication: Designed for local deployment - add authentication layer for production
- API Security: CORS configured for frontend-backend communication
- File Validation: Audio file type and size validation on upload
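
The file validation item above can be pictured as a check on extension and size before any processing starts. This is a sketch under stated assumptions, not the backend's actual code: the extension set covers only the formats named in the feature list (the real list may be longer), and `MAX_BYTES` is a hypothetical limit.

```python
from pathlib import Path
from typing import List

# Formats named in the feature list; the application may accept more.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac"}
MAX_BYTES = 500 * 1024 * 1024  # hypothetical upload limit (500 MB)


def validate_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the upload looks acceptable."""
    errors: List[str] = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported file type: {suffix or 'none'}")
    if size_bytes <= 0:
        errors.append("file is empty")
    elif size_bytes > MAX_BYTES:
        errors.append("file exceeds the size limit")
    return errors
```

Checking the extension alone does not prove the content is audio, so a real service would typically also inspect the decoded file.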
HTTPS is required for the recording feature to work (browsers require secure context for microphone access). For local development on localhost, HTTP works fine. For production deployments, choose one of the options below.
For local testing, HTTP works on localhost:

    docker compose up --build

Access at http://localhost. Recording will work because browsers allow microphone access on localhost.
Free HTTPS with zero certificate management:

    # Install cloudflared
    curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o cloudflared
    chmod +x cloudflared
    sudo mv cloudflared /usr/local/bin/

    # Authenticate and create tunnel
    cloudflared tunnel login
    cloudflared tunnel create meetmemo

    # Configure tunnel (~/.cloudflared/config.yml)
    tunnel: <your-tunnel-id>
    credentials-file: /home/user/.cloudflared/<tunnel-id>.json
    ingress:
      - hostname: meetmemo.yourdomain.com
        service: http://localhost:80
      - service: http_status:404

    # Run tunnel
    cloudflared tunnel run meetmemo

Perfect for internal/team use:

    # Install Tailscale
    curl -fsSL https://tailscale.com/install.sh | sh
    sudo tailscale up

    # Enable HTTPS serving
    # (syntax differs between Tailscale versions; check `tailscale serve --help`)
    tailscale serve https / proxy http://127.0.0.1:80

Access via `https://<machine-name>.<tailnet-name>.ts.net` with automatic HTTPS.
Simple production deployment with automatic Let's Encrypt certificates:

    # Install Caddy
    sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https
    curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
    curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
    sudo apt update && sudo apt install caddy

    # Create Caddyfile
    sudo nano /etc/caddy/Caddyfile

Add to Caddyfile:

    meetmemo.yourdomain.com {
        reverse_proxy localhost:80
    }

Then restart Caddy:

    sudo systemctl restart caddy

Traditional setup for existing nginx infrastructure:

    # Install Certbot
    sudo apt install certbot python3-certbot-nginx

    # Get certificate
    sudo certbot --nginx -d meetmemo.yourdomain.com

    # Update docker-compose.yml
    # Change frontend port to avoid conflict
    ports:
      - "8080:80"

Certbot automatically configures nginx and handles certificate renewal.

    # Generate certificate
    mkdir -p nginx/ssl
    openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
      -keyout nginx/ssl/nginx-selfsigned.key \
      -out nginx/ssl/nginx-selfsigned.crt \
      -subj "/C=US/ST=State/L=City/O=MeetMemo/CN=localhost"

    # Update nginx config and docker-compose to use HTTPS

- SSL Certificates: Use proper SSL certificates in production (not self-signed)
- Authentication: Add authentication layer for multi-user deployments
- Resource Limits: Configure appropriate memory and CPU limits for containers
- Monitoring: Set up logging and monitoring for production workloads
- Backup: Regular backup of transcription data and speaker mappings
- Firewall: Configure firewall rules appropriately based on your HTTPS option
- Ensure cloud instance has GPU support for optimal performance
- Configure persistent volumes for data retention
- Set up load balancing if scaling horizontally
Before deploying to production:
- Choose and configure HTTPS option above
- Set strong `POSTGRES_PASSWORD` in `.env`
- Configure `HF_TOKEN` for speaker diarization
- Set up `LLM_API_URL` and `LLM_MODEL_NAME`
- Test recording feature works with HTTPS
- Set up backup for PostgreSQL data volume
- Configure firewall rules as needed
- Set up log rotation for application logs
This project is licensed under the MIT License.
