Transform text into American Sign Language (ASL) videos through an OpenAI-compatible API. GestureGPT combines LLM intelligence with ASL video generation to make sign language accessible through a simple REST interface.
Quick Start • Documentation • API Reference • Contributing
- Overview
- Features
- Architecture
- Technology Stack
- Quick Start
- Configuration
- API Reference
- Usage Examples
- Documentation
- Project Structure
- Development
- Deployment
- Troubleshooting
- Contributing
- License
GestureGPT is a FastAPI-based service that bridges the gap between text communication and American Sign Language. It provides an OpenAI-compatible API that responds with ASL video sequences instead of plain text, making sign language accessible through standard API calls.
User Text Input
      ↓
LLM generates ASL-friendly text (OpenAI/Claude/Ollama/Local)
      ↓
Text Normalizer converts the text to ASL grammar tokens
      ↓
Video Repository looks up the corresponding sign videos
      ↓
Returns: { video_urls: [...], text_transcript, missing_videos: [...] }
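The Text Normalizer step is where the ASL-specific work happens: ASL gloss is written as uppercase tokens and typically drops articles and most "to be" verbs. A minimal sketch of the idea, assuming a simplified drop list (the real app/services/text_normalizer.py uses NLTK and its own rules):

```python
import re

# Simplified assumption: the real normalizer uses NLTK and fuller grammar rules.
DROPPED_WORDS = {"a", "an", "the", "is", "are", "am", "be"}

def normalize_to_asl(text: str) -> list[str]:
    """Convert English text to uppercase ASL gloss tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w.upper() for w in words if w not in DROPPED_WORDS]

print(normalize_to_asl("Hello, how are you?"))  # ['HELLO', 'HOW', 'YOU']
```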
- OpenAI-Compatible API: Drop-in replacement for OpenAI chat endpoints
- Multiple LLM Backends: Support for OpenAI, Claude, Ollama, LM Studio, and more
- Real-time Video Lookup: Retrieves pre-recorded ASL videos from SignASL.org
- Smart Caching: Local video URL caching for improved performance
- Interactive Demo: Streamlit-based web interface for testing
- Docker-First: Pre-built images available on GitHub Container Registry
        ┌─────────────────┐
        │   User/Client   │
        └────────┬────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│       GestureGPT API (FastAPI)      │
│  ┌───────────────────────────────┐  │
│  │  OpenAI-Compatible Endpoints  │  │
│  └───────────────────────────────┘  │
│  ┌───────────────────────────────┐  │
│  │  Direct Conversion Endpoint   │  │
│  └───────────────────────────────┘  │
└───────┬───────────────────┬─────────┘
        │                   │
        ▼                   ▼
┌──────────────┐   ┌──────────────────┐
│ LLM Service  │   │ Video Repository │
│              │   │  (SignASL API)   │
│ - OpenAI     │   │                  │
│ - Claude     │   │   Local Cache    │
│ - Ollama     │   │  (video_cache)   │
│ - vLLM       │   │                  │
└──────────────┘   └──────────────────┘
- User Input: Client sends text message via API
- LLM Processing: LLM generates ASL-friendly response
- Text Normalization: Convert to uppercase ASL tokens
- Video Lookup: Query SignASL API for video URLs (with caching)
- Response Assembly: Return video URLs + text transcript
- Client Playback: Client displays videos in sequence
See architecture.puml for the detailed workflow diagram.
| Component | Technology | Purpose |
|---|---|---|
| Backend Framework | FastAPI | High-performance async API server |
| LLM Integration | OpenAI SDK, Anthropic SDK | Multi-provider LLM support |
| Text Processing | NLTK | ASL grammar normalization |
| Video Source | SignASL.org API | Real ASL video repository |
| Caching | JSON file cache | Video URL caching |
| Demo Interface | Streamlit | Interactive web demo |
| Containerization | Docker, Docker Compose | Deployment and orchestration |
| CI/CD | GitHub Actions | Automated image builds |
| Documentation | Swagger UI, ReDoc | Auto-generated API docs |
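The JSON file cache in the table is what keeps repeat lookups off the network: once a word's video URL has been resolved, it is stored in data/video_cache.json and reused. A minimal sketch of the pattern (function names here are illustrative, not the actual video_repository.py API):

```python
import json
from pathlib import Path

CACHE_PATH = Path("data/video_cache.json")  # persisted via a Docker volume

def load_cache() -> dict:
    # First run: no cache file yet, so start with an empty map.
    return json.loads(CACHE_PATH.read_text()) if CACHE_PATH.exists() else {}

def get_video_url(word: str, cache: dict, fetch) -> str | None:
    """Return a cached URL, or fetch it once, cache it, and persist."""
    key = word.upper()
    if key not in cache:
        cache[key] = fetch(key)  # e.g. a SignASL API lookup
        CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
        CACHE_PATH.write_text(json.dumps(cache, indent=2))
    return cache[key]
```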
Run both GestureGPT backend and SignASL API together:
# Clone the repository
git clone https://github.com/NotYuSheng/GestureGPT.git
cd GestureGPT
# Start all services (backend, SignASL API, demo)
docker compose up -d
# View logs
docker compose logs -f
# Services available at:
# - GestureGPT API: http://localhost:8000
# - API Docs: http://localhost:8000/docs
# - SignASL API: http://localhost:8001
# - Streamlit Demo: http://localhost:8501 (if using demo compose file)

Test the API:
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gesturegpt-v1",
"messages": [{"role": "user", "content": "Hello, how are you?"}],
"format": "mp4"
}'

GestureGPT Backend:
docker pull ghcr.io/notyusheng/gesturegpt:latest
docker run -d -p 8000:8000 \
-e LLM_PROVIDER=openai \
-e OPENAI_API_KEY=your-key-here \
ghcr.io/notyusheng/gesturegpt:latest
# Access API docs at http://localhost:8000/docs

SignASL API:
docker pull ghcr.io/notyusheng/signasl-api:latest
docker run -d -p 8001:8000 \
-v ./cache:/app/cache \
ghcr.io/notyusheng/signasl-api:latest

See Docker Quick Start Guide for detailed instructions.
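Once the containers are up, a quick scripted smoke test confirms both services respond (these are the same /health endpoints listed elsewhere in this README):

```python
import requests

# Hit both health endpoints from the quick-start port mapping.
for name, url in [
    ("GestureGPT API", "http://localhost:8000/health"),
    ("SignASL API", "http://localhost:8001/health"),
]:
    r = requests.get(url, timeout=5)
    print(f"{name}: HTTP {r.status_code}")
```

Prefer a non-Docker setup? Install and run locally: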
# Clone repository
git clone https://github.com/NotYuSheng/GestureGPT.git
cd GestureGPT
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Copy and configure environment
cp .env.example .env
# Edit .env with your LLM provider settings
# Run the server
python -m app.main
# Or with hot reload:
# uvicorn app.main:app --reload
# Access API at http://localhost:8000

GestureGPT supports multiple LLM backends. Configure them in the .env file:

OpenAI:
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-3.5-turbo
OPENAI_BASE_URL=https://api.openai.com/v1

Anthropic (Claude):

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022

Ollama:

LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=dummy
OPENAI_MODEL=llama3.2

vLLM:

LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
OPENAI_MODEL=Qwen2.5-VL-7B-Instruct

LM Studio:

LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:1234/v1
OPENAI_API_KEY=lm-studio
OPENAI_MODEL=local-model

See LLM Configuration Guide for detailed setup instructions.
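All the OpenAI-compatible variants above (Ollama, vLLM, LM Studio) differ only in their base URL, which is why one factory keyed on LLM_PROVIDER covers every backend. A sketch of that pattern under the same environment variables (not the literal code in llm_service.py):

```python
import os
from openai import OpenAI

def make_llm_client():
    provider = os.getenv("LLM_PROVIDER", "placeholder")
    if provider == "openai":
        # Ollama, vLLM, and LM Studio speak the OpenAI wire protocol,
        # so only the base URL (and a dummy key) changes.
        return OpenAI(
            base_url=os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"),
            api_key=os.getenv("OPENAI_API_KEY", "dummy"),
        )
    if provider == "anthropic":
        import anthropic
        return anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    return None  # "placeholder": canned responses, no LLM call
```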
| Variable | Description | Default | Required |
|---|---|---|---|
| LLM_PROVIDER | LLM backend (openai, anthropic, placeholder) | placeholder | No |
| OPENAI_API_KEY | OpenAI API key | - | If using OpenAI |
| OPENAI_BASE_URL | OpenAI-compatible endpoint | https://api.openai.com/v1 | No |
| OPENAI_MODEL | Model name | gpt-3.5-turbo | No |
| ANTHROPIC_API_KEY | Anthropic API key | - | If using Claude |
| ANTHROPIC_MODEL | Claude model name | claude-3-5-sonnet-20241022 | No |
| SIGNASL_API_URL | SignASL API endpoint | http://localhost:8001 | No |
| HOST | Server host | 0.0.0.0 | No |
| PORT | Server port | 8000 | No |
Copy .env.example to .env and customize as needed.
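Scripts that need the same settings can reproduce the server's .env-first behavior with python-dotenv. A small sketch (assumes python-dotenv is installed; the defaults mirror the table above):

```python
import os
from dotenv import load_dotenv  # assumption: python-dotenv is available

load_dotenv()  # pull variables from .env into the process environment

HOST = os.getenv("HOST", "0.0.0.0")
PORT = int(os.getenv("PORT", "8000"))
SIGNASL_API_URL = os.getenv("SIGNASL_API_URL", "http://localhost:8001")
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "placeholder")
```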
POST /v1/chat/completions
Drop-in replacement for OpenAI's chat completions API, with an extended response containing video URLs.

Example request:
{
"model": "gesturegpt-v1",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
],
"format": "mp4"
}

Example response:

{
"id": "chatcmpl-1234567890",
"object": "chat.completion",
"created": 1704067200,
"model": "gesturegpt-v1",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "HELLO! I HAPPY MEET YOU. HOW YOU TODAY?"
},
"finish_reason": "stop",
"video_urls": [
"https://www.signasl.org/sign/hello",
"https://www.signasl.org/sign/i",
"https://www.signasl.org/sign/happy",
"https://www.signasl.org/sign/meet",
"https://www.signasl.org/sign/you",
"https://www.signasl.org/sign/how",
"https://www.signasl.org/sign/you",
"https://www.signasl.org/sign/today"
],
"missing_videos": [],
"user_input_asl": "HELLO HOW YOU"
}],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 18,
"total_tokens": 43
}
}

| Field | Type | Description |
|---|---|---|
| message.content | string | ASL-friendly text response |
| video_urls | array | List of video URLs for each sign |
| missing_videos | array | Words without available videos |
| user_input_asl | string | User's message normalized to ASL |
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gesturegpt-v1",
"messages": [{"role": "user", "content": "What is your name?"}],
"format": "mp4"
}'

Python (OpenAI SDK):

from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="not-needed"
)
response = client.chat.completions.create(
model="gesturegpt-v1",
messages=[{"role": "user", "content": "Hello!"}],
extra_body={"format": "mp4"}
)
print("Text:", response.choices[0].message.content)
print("Videos:", response.choices[0].video_urls)
print("Missing:", response.choices[0].missing_videos)POST /api/sign-language/generate
Direct text-to-ASL conversion without LLM processing.

Example request:
{
"text": "Hello, how are you?",
"format": "mp4"
}

Example response:

{
"success": true,
"video_urls": [
"https://www.signasl.org/sign/hello",
"https://www.signasl.org/sign/how",
"https://www.signasl.org/sign/are",
"https://www.signasl.org/sign/you"
],
"missing_videos": [],
"text": "Hello, how are you?",
"normalized_text": "HELLO HOW ARE YOU",
"format": "mp4",
"timestamp": "2025-01-15T10:30:00Z"
}

Error response (missing required field):

{
"detail": "Missing required field: text"
}

Python (requests):

import requests
# Chat endpoint
response = requests.post(
"http://localhost:8000/v1/chat/completions",
json={
"model": "gesturegpt-v1",
"messages": [{"role": "user", "content": "What is sign language?"}],
"format": "mp4"
}
)
data = response.json()
video_urls = data["choices"][0]["video_urls"]
text_response = data["choices"][0]["message"]["content"]
missing = data["choices"][0].get("missing_videos", [])
print(f"Text: {text_response}")
print(f"Videos: {len(video_urls)} found")
if missing:
print(f"Missing: {missing}")from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="not-needed"
)
# Multi-turn conversation
response = client.chat.completions.create(
model="gesturegpt-v1",
messages=[
{"role": "user", "content": "Hi, my name is Alex"},
{"role": "assistant", "content": "HELLO! NICE MEET YOU ALEX."},
{"role": "user", "content": "Can you teach me some signs?"}
],
extra_body={"format": "mp4"}
)
print(response.choices[0].message.content)
print(response.choices[0].video_urls)

JavaScript (fetch):

const response = await fetch('http://localhost:8000/v1/chat/completions', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: 'gesturegpt-v1',
messages: [{ role: 'user', content: 'Hello!' }],
format: 'mp4'
})
});
const data = await response.json();
console.log('Text:', data.choices[0].message.content);
console.log('Videos:', data.choices[0].video_urls);
console.log('Missing:', data.choices[0].missing_videos);

Direct conversion endpoint (Python):

import requests
response = requests.post(
"http://localhost:8000/api/sign-language/generate",
json={
"text": "Good morning! How are you today?",
"format": "mp4"
}
)
data = response.json()
print(f"Original: {data['text']}")
print(f"ASL: {data['normalized_text']}")
print(f"Videos: {data['video_urls']}")- Swagger UI: http://localhost:8000/docs (Interactive API explorer)
- ReDoc: http://localhost:8000/redoc (Alternative documentation)
- Health Check: http://localhost:8000/health
- Models List: http://localhost:8000/v1/models
- Docker Quick Start - Get running in 2 minutes
- LLM Configuration - Configure OpenAI/Claude/Local LLMs
- Deployment Guide - Production deployment best practices
- Demo Usage - Using the Streamlit demo interface
When running with the demo docker-compose setup, access the interactive web interface at http://localhost:8501.
Features:
- Chat interface with conversation history
- Direct text-to-ASL conversion
- API documentation reference
- Video playback controls (a scripted equivalent is sketched after this list)
- Multiple video format support
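Outside the demo, playback starts from the ordered video_urls list. Whether each URL points at a raw .mp4 or a sign page depends on the source, so this sketch only builds the ordered playlist and reports gaps rather than assuming a codec:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "gesturegpt-v1",
        "messages": [{"role": "user", "content": "Hello!"}],
        "format": "mp4",
    },
    timeout=60,
)
choice = resp.json()["choices"][0]

# Order matters: the clips spell out the ASL sentence sign by sign.
for i, url in enumerate(choice["video_urls"], start=1):
    print(f"{i:02d}. {url}")
if choice.get("missing_videos"):
    print("No sign video for:", ", ".join(choice["missing_videos"]))
```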
GestureGPT/
├── app/
│   ├── api/
│   │   ├── chat.py                  # OpenAI-compatible /v1/chat/completions
│   │   └── sign_language.py         # Direct /api/sign-language/generate
│   ├── models/
│   │   └── schemas.py               # Pydantic request/response models
│   ├── services/
│   │   ├── llm_service.py           # Multi-provider LLM integration
│   │   ├── text_normalizer.py       # ASL grammar normalization
│   │   ├── video_repository.py      # Video lookup with caching
│   │   ├── sign_language_service.py # Core sign language logic
│   │   └── signasl_client.py        # SignASL.org API client
│   └── main.py                      # FastAPI application entry
│
├── demo/
│   ├── streamlit_app.py             # Interactive Streamlit demo
│   ├── Dockerfile                   # Demo container image
│   ├── docker-compose.yml           # Full stack orchestration
│   ├── .env.example                 # Demo environment template
│   └── README.md                    # Demo documentation
│
├── docs/
│   ├── README.md                    # Documentation index
│   ├── DOCKER_QUICKSTART.md         # Quick start guide
│   ├── LLM_CONFIGURATION.md         # LLM setup instructions
│   ├── DEPLOYMENT.md                # Production deployment
│   ├── architecture.png             # Architecture diagram
│   └── architecture.puml            # PlantUML source
│
├── .github/
│   └── workflows/
│       └── docker-publish.yml       # Auto-build and publish to GHCR
│
├── data/
│   └── video_cache.json             # Local video URL cache
│
├── Dockerfile                       # Backend container image
├── docker-compose.yml               # Backend + SignASL API
├── requirements.txt                 # Python dependencies
├── .env.example                     # Backend environment template
├── .gitignore                       # Git ignore rules
├── LICENSE                          # MIT License
└── README.md                        # This file
# Clone repository
git clone https://github.com/NotYuSheng/GestureGPT.git
cd GestureGPT
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your settings
# Run with hot reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
# Or run directly
python -m app.main

# Use development compose file with hot reload
docker compose -f docker-compose.dev.yml up -d
# Code changes will automatically reload the server
# Logs will show reload events
docker compose -f docker-compose.dev.yml logs -f

# Build backend image
docker build -t gesturegpt:dev .
# Build demo image
cd demo
docker build -t gesturegpt-demo:dev .
# Run locally built images
docker run -p 8000:8000 gesturegpt:dev
docker run -p 8501:8501 gesturegpt-demo:dev

# Format code with black
black app/ demo/
# Lint with flake8
flake8 app/ demo/
# Type check with mypy
mypy app/

# TODO: Add comprehensive test suite
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=app --cov-report=html

- Configure production LLM provider (OpenAI/Claude)
- Set up proper API key management (environment variables/secrets)
- Configure SignASL API or self-host the scraper
- Set up video URL caching (persistent volume)
- Enable HTTPS/TLS (reverse proxy like Nginx/Caddy)
- Configure CORS for your frontend domain
- Set up monitoring and logging
- Implement rate limiting (API gateway or FastAPI middleware; see the sketch after this list)
- Configure health checks for container orchestration
- Set resource limits (CPU/memory)
- Enable automatic restarts
- Set up backup for video cache
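For the rate-limiting item above, an API gateway is the robust choice, but a FastAPI middleware is enough to start. A minimal per-IP sliding-window sketch (in-memory and single-process only; use Redis or a gateway for real deployments):

```python
import time
from collections import defaultdict, deque

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
WINDOW_SECONDS, MAX_REQUESTS = 60, 30
hits = defaultdict(deque)  # client IP -> recent request timestamps

@app.middleware("http")
async def rate_limit(request: Request, call_next):
    ip = request.client.host if request.client else "unknown"
    window = hits[ip]
    now = time.monotonic()
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # discard timestamps outside the window
    if len(window) >= MAX_REQUESTS:
        return JSONResponse({"detail": "Rate limit exceeded"}, status_code=429)
    window.append(now)
    return await call_next(request)
```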
# Clone repository
git clone https://github.com/NotYuSheng/GestureGPT.git
cd GestureGPT
# Copy and configure production environment
cp .env.example .env
# Edit .env with production settings
# Pull latest images
docker compose pull
# Start in production mode
docker compose up -d
# Monitor logs
docker compose logs -f
# Check health
curl http://localhost:8000/health

# Production LLM (OpenAI)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-prod-xxx
OPENAI_MODEL=gpt-4
# Or Claude
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-prod-xxx
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
# SignASL API
SIGNASL_API_URL=http://signasl-api:8000
# Server
HOST=0.0.0.0
PORT=8000

server {
listen 80;
server_name gesturegpt.yourdomain.com;
location / {
proxy_pass http://localhost:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}

See Deployment Guide for detailed production setup.
Problem: GestureGPT can't connect to SignASL API
Solution:
# Check if SignASL API is running
docker compose ps
# Check SignASL API logs
docker compose logs signasl-api
# Restart SignASL API
docker compose restart signasl-api
# Verify SignASL API health
curl http://localhost:8001/health

Problem: Using LLM_PROVIDER=placeholder instead of a real LLM
Solution:
# Edit .env file
nano .env
# Set a real LLM provider
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-your-key
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-3.5-turbo
# Restart backend
docker compose restart gesturegpt-backend

Problem: SignASL.org doesn't have videos for all words
Solution: This is expected behavior. The API returns:
- video_urls: Videos that were found
- missing_videos: Words without available videos
{
"video_urls": ["http://...", "http://..."],
"missing_videos": ["cryptocurrency", "blockchain"]
}

Consider implementing fallback strategies (see the sketch after this list):
- Fingerspelling (separate letters)
- Synonym replacement
- Custom video repository
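A sketch of the first option: when a word has no sign, split it into letters and look each one up instead. The per-letter URL shape below is an assumption; check what the SignASL API actually exposes:

```python
BASE = "https://www.signasl.org/sign"  # letter-level entries are an assumption

def fingerspell(word: str) -> list[str]:
    # One clip per letter, e.g. "abc" -> .../sign/a, .../sign/b, .../sign/c
    return [f"{BASE}/{ch}" for ch in word.lower() if ch.isalpha()]

def with_fingerspelling(video_urls: list[str], missing: list[str]) -> list[str]:
    # Appends letter clips at the end; splicing them into sentence position
    # would need word offsets that the response does not include.
    urls = list(video_urls)
    for word in missing:
        urls.extend(fingerspell(word))
    return urls

print(with_fingerspelling([], ["blockchain"]))
```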
Problem: Port conflict or missing environment variables
Solution:
# Check if ports are already in use
lsof -i :8000
lsof -i :8001
# Check container logs
docker compose logs gesturegpt-backend
# Verify .env file exists
cat .env
# Rebuild containers
docker compose down
docker compose up --build -d

Problem: Video cache resets after container restart
Solution: Ensure volume is mounted correctly in docker-compose.yml:
volumes:
- ./data:/app/data  # Persistent cache directory

For further help:

- Check GitHub Issues
- Join Discussions
- Review Documentation
Contributions are welcome! This project is open to suggestions, improvements, and bug fixes.
1. Fork the repository

   git clone https://github.com/YOUR_USERNAME/GestureGPT.git
   cd GestureGPT

2. Create a feature branch

   git checkout -b feature/amazing-feature

3. Make your changes
   - Follow existing code style
   - Add tests for new features
   - Update documentation as needed

4. Commit your changes

   git add .
   git commit -m "Add amazing feature"

5. Push to your fork

   git push origin feature/amazing-feature

6. Open a Pull Request
   - Describe your changes
   - Reference any related issues
   - Wait for review
- Follow PEP 8 style guide for Python code
- Use type hints for function signatures
- Add docstrings for public functions
- Write unit tests for new features
- Update README.md if adding new features
- Keep commits atomic and well-described
- Adding support for new sign languages
- Improving ASL grammar normalization
- Performance optimizations
- Test coverage improvements
- Documentation enhancements
- Bug fixes and issue resolution
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2025 GestureGPT Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
- FastAPI - Excellent modern web framework for building APIs
- SignASL.org - ASL video resource and community
- OpenAI - API design inspiration and SDK compatibility
- Anthropic - Claude LLM integration
- NLTK - Natural language processing toolkit
- Streamlit - Rapid demo interface development
- The Sign Language Community - Inspiration and guidance
- American Sign Language (ASL)
- SignASL.org - ASL video dictionary
- WLASL Dataset - Word-Level ASL dataset
- OpenAI API Documentation
- FastAPI Documentation
- SignASL API - ASL video scraper service
- OpenAI Python SDK
- vLLM - Fast LLM inference
For questions, issues, or feature requests, open a GitHub Issue or start a Discussion.
Built with ❤️ for the sign language community
GitHub • Docker Hub • API Docs
⭐ Star this repo if you find it helpful!