Transform text into American Sign Language (ASL) videos through an OpenAI-compatible API. GestureGPT combines LLM intelligence with ASL video generation to make sign language accessible through a simple REST interface.
Quick Start • Documentation • API Reference • Contributing
- Overview
- Features
- Architecture
- Technology Stack
- Quick Start
- Configuration
- API Reference
- Usage Examples
- Documentation
- Project Structure
- Development
- Deployment
- Troubleshooting
- Contributing
- License
GestureGPT is a FastAPI-based service that bridges the gap between text communication and American Sign Language. It provides an OpenAI-compatible API that responds with ASL video sequences instead of plain text, making sign language accessible through standard API calls.
User Text Input
      ↓
LLM generates ASL-friendly text (OpenAI/Claude/Ollama/Local)
      ↓
Text Normalizer converts the text to ASL grammar tokens
      ↓
Video Repository looks up the corresponding sign videos
      ↓
Returns: { video_urls: [...], text_transcript, missing_videos: [...] }
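The Text Normalizer step is where the ASL-specific work happens: ASL gloss is written as uppercase tokens and typically drops articles and most "to be" verbs. A minimal sketch of the idea, assuming a simplified drop list (the real app/services/text_normalizer.py uses NLTK and its own rules):

```python
import re

# Simplified assumption: the real normalizer uses NLTK and fuller grammar rules.
DROPPED_WORDS = {"a", "an", "the", "is", "are", "am", "be"}

def normalize_to_asl(text: str) -> list[str]:
    """Convert English text to uppercase ASL gloss tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w.upper() for w in words if w not in DROPPED_WORDS]

print(normalize_to_asl("Hello, how are you?"))  # ['HELLO', 'HOW', 'YOU']
```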
- OpenAI-Compatible API: Drop-in replacement for OpenAI chat endpoints
- Multiple LLM Backends: Support for OpenAI, Claude, Ollama, LM Studio, and more
- Real-time Video Lookup: Retrieves pre-recorded ASL videos from SignASL.org
- Smart Caching: Local video URL caching for improved performance
- Interactive Demo: Streamlit-based web interface for testing
- Docker-First: Pre-built images available on GitHub Container Registry
        ┌─────────────────┐
        │   User/Client   │
        └────────┬────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│       GestureGPT API (FastAPI)      │
│  ┌───────────────────────────────┐  │
│  │  OpenAI-Compatible Endpoints  │  │
│  └───────────────────────────────┘  │
│  ┌───────────────────────────────┐  │
│  │  Direct Conversion Endpoint   │  │
│  └───────────────────────────────┘  │
└───────┬───────────────────┬─────────┘
        │                   │
        ▼                   ▼
┌──────────────┐   ┌──────────────────┐
│ LLM Service  │   │ Video Repository │
│              │   │  (SignASL API)   │
│ - OpenAI     │   │                  │
│ - Claude     │   │   Local Cache    │
│ - Ollama     │   │  (video_cache)   │
│ - vLLM       │   │                  │
└──────────────┘   └──────────────────┘
- User Input: Client sends text message via API
- LLM Processing: LLM generates ASL-friendly response
- Text Normalization: Convert to uppercase ASL tokens
- Video Lookup: Query SignASL API for video URLs (with caching)
- Response Assembly: Return video URLs + text transcript
- Client Playback: Client displays videos in sequence
See architecture.puml for the detailed workflow diagram.
| Component | Technology | Purpose |
|---|---|---|
| Backend Framework | FastAPI | High-performance async API server |
| LLM Integration | OpenAI SDK, Anthropic SDK | Multi-provider LLM support |
| Text Processing | NLTK | ASL grammar normalization |
| Video Source | SignASL.org API | Real ASL video repository |
| Caching | JSON file cache | Video URL caching |
| Demo Interface | Streamlit | Interactive web demo |
| Containerization | Docker, Docker Compose | Deployment and orchestration |
| CI/CD | GitHub Actions | Automated image builds |
| Documentation | Swagger UI, ReDoc | Auto-generated API docs |
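The JSON file cache in the table is what keeps repeat lookups off the network: once a word's video URL has been resolved, it is stored in data/video_cache.json and reused. A minimal sketch of the pattern (function names here are illustrative, not the actual video_repository.py API):

```python
import json
from pathlib import Path

CACHE_PATH = Path("data/video_cache.json")  # persisted via a Docker volume

def load_cache() -> dict:
    # First run: no cache file yet, so start with an empty map.
    return json.loads(CACHE_PATH.read_text()) if CACHE_PATH.exists() else {}

def get_video_url(word: str, cache: dict, fetch) -> str | None:
    """Return a cached URL, or fetch it once, cache it, and persist."""
    key = word.upper()
    if key not in cache:
        cache[key] = fetch(key)  # e.g. a SignASL API lookup
        CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
        CACHE_PATH.write_text(json.dumps(cache, indent=2))
    return cache[key]
```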
Run both GestureGPT backend and SignASL API together:
# Clone the repository
git clone https://github.com/NotYuSheng/GestureGPT.git
cd GestureGPT
# Start all services (backend, SignASL API, demo)
docker compose up -d
# View logs
docker compose logs -f
# Services available at:
# - GestureGPT API: http://localhost:8000
# - API Docs: http://localhost:8000/docs
# - SignASL API: http://localhost:8001
# - Streamlit Demo: http://localhost:8501 (if using demo compose file)

Test the API:
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gesturegpt-v1",
"messages": [{"role": "user", "content": "Hello, how are you?"}],
"format": "mp4"
}'

GestureGPT Backend:
docker pull ghcr.io/notyusheng/gesturegpt:latest
docker run -d -p 8000:8000 \
-e LLM_PROVIDER=openai \
-e OPENAI_API_KEY=your-key-here \
ghcr.io/notyusheng/gesturegpt:latest
# Access API docs at http://localhost:8000/docs

SignASL API:
docker pull ghcr.io/notyusheng/signasl-api:latest
docker run -d -p 8001:8000 \
-v ./cache:/app/cache \
ghcr.io/notyusheng/signasl-api:latest

See Docker Quick Start Guide for detailed instructions.
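Once the containers are up, a quick scripted smoke test confirms both services respond (these are the same /health endpoints listed elsewhere in this README):

```python
import requests

# Hit both health endpoints from the quick-start port mapping.
for name, url in [
    ("GestureGPT API", "http://localhost:8000/health"),
    ("SignASL API", "http://localhost:8001/health"),
]:
    r = requests.get(url, timeout=5)
    print(f"{name}: HTTP {r.status_code}")
```

Prefer a non-Docker setup? Install and run locally: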
# Clone repository
git clone https://github.com/NotYuSheng/GestureGPT.git
cd GestureGPT
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Copy and configure environment
cp .env.example .env
# Edit .env with your LLM provider settings
# Run the server
python -m app.main
# Or with hot reload:
# uvicorn app.main:app --reload
# Access API at http://localhost:8000

GestureGPT supports multiple LLM backends. Configure them in the .env file:

OpenAI:
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-3.5-turbo
OPENAI_BASE_URL=https://api.openai.com/v1

Anthropic (Claude):

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022

Ollama:

LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=dummy
OPENAI_MODEL=llama3.2

vLLM:

LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
OPENAI_MODEL=Qwen2.5-VL-7B-Instruct

LM Studio:

LLM_PROVIDER=openai
OPENAI_BASE_URL=http://localhost:1234/v1
OPENAI_API_KEY=lm-studio
OPENAI_MODEL=local-model

See LLM Configuration Guide for detailed setup instructions.
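All the OpenAI-compatible variants above (Ollama, vLLM, LM Studio) differ only in their base URL, which is why one factory keyed on LLM_PROVIDER covers every backend. A sketch of that pattern under the same environment variables (not the literal code in llm_service.py):

```python
import os
from openai import OpenAI

def make_llm_client():
    provider = os.getenv("LLM_PROVIDER", "placeholder")
    if provider == "openai":
        # Ollama, vLLM, and LM Studio speak the OpenAI wire protocol,
        # so only the base URL (and a dummy key) changes.
        return OpenAI(
            base_url=os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1"),
            api_key=os.getenv("OPENAI_API_KEY", "dummy"),
        )
    if provider == "anthropic":
        import anthropic
        return anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))
    return None  # "placeholder": canned responses, no LLM call
```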
| Variable | Description | Default | Required |
|---|---|---|---|
| LLM_PROVIDER | LLM backend (openai, anthropic, placeholder) | placeholder | No |
| OPENAI_API_KEY | OpenAI API key | - | If using OpenAI |
| OPENAI_BASE_URL | OpenAI-compatible endpoint | https://api.openai.com/v1 | No |
| OPENAI_MODEL | Model name | gpt-3.5-turbo | No |
| ANTHROPIC_API_KEY | Anthropic API key | - | If using Claude |
| ANTHROPIC_MODEL | Claude model name | claude-3-5-sonnet-20241022 | No |
| SIGNASL_API_URL | SignASL API endpoint | http://localhost:8001 | No |
| HOST | Server host | 0.0.0.0 | No |
| PORT | Server port | 8000 | No |
Copy .env.example to .env and customize as needed.
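Scripts that need the same settings can reproduce the server's .env-first behavior with python-dotenv. A small sketch (assumes python-dotenv is installed; the defaults mirror the table above):

```python
import os
from dotenv import load_dotenv  # assumption: python-dotenv is available

load_dotenv()  # pull variables from .env into the process environment

HOST = os.getenv("HOST", "0.0.0.0")
PORT = int(os.getenv("PORT", "8000"))
SIGNASL_API_URL = os.getenv("SIGNASL_API_URL", "http://localhost:8001")
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "placeholder")
```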
POST /v1/chat/completions
Drop-in replacement for OpenAI's chat completions API, with an extended response containing video URLs.

Example request:
{
"model": "gesturegpt-v1",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
],
"format": "mp4"
}

Example response:

{
"id": "chatcmpl-1234567890",
"object": "chat.completion",
"created": 1704067200,
"model": "gesturegpt-v1",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "HELLO! I HAPPY MEET YOU. HOW YOU TODAY?"
},
"finish_reason": "stop",
"video_urls": [
"https://www.signasl.org/sign/hello",
"https://www.signasl.org/sign/i",
"https://www.signasl.org/sign/happy",
"https://www.signasl.org/sign/meet",
"https://www.signasl.org/sign/you",
"https://www.signasl.org/sign/how",
"https://www.signasl.org/sign/you",
"https://www.signasl.org/sign/today"
],
"missing_videos": [],
"user_input_asl": "HELLO HOW YOU"
}],
"usage": {
"prompt_tokens": 25,
"completion_tokens": 18,
"total_tokens": 43
}
}

| Field | Type | Description |
|---|---|---|
| message.content | string | ASL-friendly text response |
| video_urls | array | List of video URLs for each sign |
| missing_videos | array | Words without available videos |
| user_input_asl | string | User's message normalized to ASL |
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "gesturegpt-v1",
"messages": [{"role": "user", "content": "What is your name?"}],
"format": "mp4"
}'

Python (OpenAI SDK):

from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="not-needed"
)
response = client.chat.completions.create(
model="gesturegpt-v1",
messages=[{"role": "user", "content": "Hello!"}],
extra_body={"format": "mp4"}
)
print("Text:", response.choices[0].message.content)
print("Videos:", response.choices[0].video_urls)
print("Missing:", response.choices[0].missing_videos)POST /api/sign-language/generate
Direct text-to-ASL conversion without LLM processing.

Example request:
{
"text": "Hello, how are you?",
"format": "mp4"
}

Example response:

{
"success": true,
"video_urls": [
"https://www.signasl.org/sign/hello",
"https://www.signasl.org/sign/how",
"https://www.signasl.org/sign/are",
"https://www.signasl.org/sign/you"
],
"missing_videos": [],
"text": "Hello, how are you?",
"normalized_text": "HELLO HOW ARE YOU",
"format": "mp4",
"timestamp": "2025-01-15T10:30:00Z"
}

Error response (missing required field):

{
"detail": "Missing required field: text"
}

Python (requests):

import requests
# Chat endpoint
response = requests.post(
"http://localhost:8000/v1/chat/completions",
json={
"model": "gesturegpt-v1",
"messages": [{"role": "user", "content": "What is sign language?"}],
"format": "mp4"
}
)
data = response.json()
video_urls = data["choices"][0]["video_urls"]
text_response = data["choices"][0]["message"]["content"]
missing = data["choices"][0].get("missing_videos", [])
print(f"Text: {text_response}")
print(f"Videos: {len(video_urls)} found")
if missing:
print(f"Missing: {missing}")from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8000/v1",
api_key="not-needed"
)
# Multi-turn conversation
response = client.chat.completions.create(
model="gesturegpt-v1",
messages=[
{"role": "user", "content": "Hi, my name is Alex"},
{"role": "assistant", "content": "HELLO! NICE MEET YOU ALEX."},
{"role": "user", "content": "Can you teach me some signs?"}
],
extra_body={"format": "mp4"}
)
print(response.choices[0].message.content)
print(response.choices[0].video_urls)

JavaScript (fetch):

const response = await fetch('http://localhost:8000/v1/chat/completions', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: 'gesturegpt-v1',
messages: [{ role: 'user', content: 'Hello!' }],
format: 'mp4'
})
});
const data = await response.json();
console.log('Text:', data.choices[0].message.content);
console.log('Videos:', data.choices[0].video_urls);
console.log('Missing:', data.choices[0].missing_videos);

Direct conversion endpoint (Python):

import requests
response = requests.post(
"http://localhost:8000/api/sign-language/generate",
json={
"text": "Good morning! How are you today?",
"format": "mp4"
}
)
data = response.json()
print(f"Original: {data['text']}")
print(f"ASL: {data['normalized_text']}")
print(f"Videos: {data['video_urls']}")- Swagger UI: http://localhost:8000/docs (Interactive API explorer)
- ReDoc: http://localhost:8000/redoc (Alternative documentation)
- Health Check: http://localhost:8000/health
- Models List: http://localhost:8000/v1/models
- Docker Quick Start - Get running in 2 minutes
- LLM Configuration - Configure OpenAI/Claude/Local LLMs
- Deployment Guide - Production deployment best practices
- Demo Usage - Using the Streamlit demo interface
When running with the demo docker-compose setup, access the interactive web interface at http://localhost:8501.
Features:
- Chat interface with conversation history
- Direct text-to-ASL conversion
- API documentation reference
- Video playback controls (a scripted equivalent is sketched after this list)
- Multiple video format support
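Outside the demo, playback starts from the ordered video_urls list. Whether each URL points at a raw .mp4 or a sign page depends on the source, so this sketch only builds the ordered playlist and reports gaps rather than assuming a codec:

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "gesturegpt-v1",
        "messages": [{"role": "user", "content": "Hello!"}],
        "format": "mp4",
    },
    timeout=60,
)
choice = resp.json()["choices"][0]

# Order matters: the clips spell out the ASL sentence sign by sign.
for i, url in enumerate(choice["video_urls"], start=1):
    print(f"{i:02d}. {url}")
if choice.get("missing_videos"):
    print("No sign video for:", ", ".join(choice["missing_videos"]))
```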
GestureGPT/
├── app/
│   ├── api/
│   │   ├── chat.py                  # OpenAI-compatible /v1/chat/completions
│   │   └── sign_language.py         # Direct /api/sign-language/generate
│   ├── models/
│   │   └── schemas.py               # Pydantic request/response models
│   ├── services/
│   │   ├── llm_service.py           # Multi-provider LLM integration
│   │   ├── text_normalizer.py       # ASL grammar normalization
│   │   ├── video_repository.py      # Video lookup with caching
│   │   ├── sign_language_service.py # Core sign language logic
│   │   └── signasl_client.py        # SignASL.org API client
│   └── main.py                      # FastAPI application entry
│
├── demo/
│   ├── streamlit_app.py             # Interactive Streamlit demo
│   ├── Dockerfile                   # Demo container image
│   ├── docker-compose.yml           # Full stack orchestration
│   ├── .env.example                 # Demo environment template
│   └── README.md                    # Demo documentation
│
├── docs/
│   ├── README.md                    # Documentation index
│   ├── DOCKER_QUICKSTART.md         # Quick start guide
│   ├── LLM_CONFIGURATION.md         # LLM setup instructions
│   ├── DEPLOYMENT.md                # Production deployment
│   ├── architecture.png             # Architecture diagram
│   └── architecture.puml            # PlantUML source
│
├── .github/
│   └── workflows/
│       └── docker-publish.yml       # Auto-build and publish to GHCR
│
├── data/
│   └── video_cache.json             # Local video URL cache
│
├── Dockerfile                       # Backend container image
├── docker-compose.yml               # Backend + SignASL API
├── requirements.txt                 # Python dependencies
├── .env.example                     # Backend environment template
├── .gitignore                       # Git ignore rules
├── LICENSE                          # MIT License
└── README.md                        # This file
# Clone repository
git clone https://github.com/NotYuSheng/GestureGPT.git
cd GestureGPT
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env with your settings
# Run with hot reload
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
# Or run directly
python -m app.main

# Use development compose file with hot reload
docker compose -f docker-compose.dev.yml up -d
# Code changes will automatically reload the server
# Logs will show reload events
docker compose -f docker-compose.dev.yml logs -f

# Build backend image
docker build -t gesturegpt:dev .
# Build demo image
cd demo
docker build -t gesturegpt-demo:dev .
# Run locally built images
docker run -p 8000:8000 gesturegpt:dev
docker run -p 8501:8501 gesturegpt-demo:dev

# Format code with black
black app/ demo/
# Lint with flake8
flake8 app/ demo/
# Type check with mypy
mypy app/

# TODO: Add comprehensive test suite
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=app --cov-report=html

- Configure production LLM provider (OpenAI/Claude)
- Set up proper API key management (environment variables/secrets)
- Configure SignASL API or self-host the scraper
- Set up video URL caching (persistent volume)
- Enable HTTPS/TLS (reverse proxy like Nginx/Caddy)
- Configure CORS for your frontend domain
- Set up monitoring and logging
- Implement rate limiting (API gateway or FastAPI middleware; see the sketch after this list)
- Configure health checks for container orchestration
- Set resource limits (CPU/memory)
- Enable automatic restarts
- Set up backup for video cache
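For the rate-limiting item above, an API gateway is the robust choice, but a FastAPI middleware is enough to start. A minimal per-IP sliding-window sketch (in-memory and single-process only; use Redis or a gateway for real deployments):

```python
import time
from collections import defaultdict, deque

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
WINDOW_SECONDS, MAX_REQUESTS = 60, 30
hits = defaultdict(deque)  # client IP -> recent request timestamps

@app.middleware("http")
async def rate_limit(request: Request, call_next):
    ip = request.client.host if request.client else "unknown"
    window = hits[ip]
    now = time.monotonic()
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # discard timestamps outside the window
    if len(window) >= MAX_REQUESTS:
        return JSONResponse({"detail": "Rate limit exceeded"}, status_code=429)
    window.append(now)
    return await call_next(request)
```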
# Clone repository
git clone https://github.com/NotYuSheng/GestureGPT.git
cd GestureGPT
# Copy and configure production environment
cp .env.example .env
# Edit .env with production settings
# Pull latest images
docker compose pull
# Start in production mode
docker compose up -d
# Monitor logs
docker compose logs -f
# Check health
curl http://localhost:8000/health

# Production LLM (OpenAI)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-prod-xxx
OPENAI_MODEL=gpt-4
# Or Claude
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-prod-xxx
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
# SignASL API
SIGNASL_API_URL=http://signasl-api:8000
# Server
HOST=0.0.0.0
PORT=8000

server {
listen 80;
server_name gesturegpt.yourdomain.com;
location / {
proxy_pass http://localhost:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}

See Deployment Guide for detailed production setup.
Problem: GestureGPT can't connect to SignASL API
Solution:
# Check if SignASL API is running
docker compose ps
# Check SignASL API logs
docker compose logs signasl-api
# Restart SignASL API
docker compose restart signasl-api
# Verify SignASL API health
curl http://localhost:8001/health

Problem: Using LLM_PROVIDER=placeholder instead of a real LLM
Solution:
# Edit .env file
nano .env
# Set a real LLM provider
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-your-key
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_MODEL=gpt-3.5-turbo
# Restart backend
docker compose restart gesturegpt-backend

Problem: SignASL.org doesn't have videos for all words
Solution: This is expected behavior. The API returns:
- video_urls: Videos that were found
- missing_videos: Words without available videos
{
"video_urls": ["http://...", "http://..."],
"missing_videos": ["cryptocurrency", "blockchain"]
}

Consider implementing fallback strategies (see the sketch after this list):
- Fingerspelling (separate letters)
- Synonym replacement
- Custom video repository
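A sketch of the first option: when a word has no sign, split it into letters and look each one up instead. The per-letter URL shape below is an assumption; check what the SignASL API actually exposes:

```python
BASE = "https://www.signasl.org/sign"  # letter-level entries are an assumption

def fingerspell(word: str) -> list[str]:
    # One clip per letter, e.g. "abc" -> .../sign/a, .../sign/b, .../sign/c
    return [f"{BASE}/{ch}" for ch in word.lower() if ch.isalpha()]

def with_fingerspelling(video_urls: list[str], missing: list[str]) -> list[str]:
    # Appends letter clips at the end; splicing them into sentence position
    # would need word offsets that the response does not include.
    urls = list(video_urls)
    for word in missing:
        urls.extend(fingerspell(word))
    return urls

print(with_fingerspelling([], ["blockchain"]))
```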
Problem: Port conflict or missing environment variables
Solution:
# Check if ports are already in use
lsof -i :8000
lsof -i :8001
# Check container logs
docker compose logs gesturegpt-backend
# Verify .env file exists
cat .env
# Rebuild containers
docker compose down
docker compose up --build -d

Problem: Video cache resets after container restart
Solution: Ensure volume is mounted correctly in docker-compose.yml:
volumes:
- ./data:/app/data  # Persistent cache directory

For further help:

- Check GitHub Issues
- Join Discussions
- Review Documentation
Contributions are welcome! This project is open to suggestions, improvements, and bug fixes.
1. Fork the repository

   git clone https://github.com/YOUR_USERNAME/GestureGPT.git
   cd GestureGPT

2. Create a feature branch

   git checkout -b feature/amazing-feature

3. Make your changes
   - Follow existing code style
   - Add tests for new features
   - Update documentation as needed

4. Commit your changes

   git add .
   git commit -m "Add amazing feature"

5. Push to your fork

   git push origin feature/amazing-feature

6. Open a Pull Request
   - Describe your changes
   - Reference any related issues
   - Wait for review
- Follow PEP 8 style guide for Python code
- Use type hints for function signatures
- Add docstrings for public functions
- Write unit tests for new features
- Update README.md if adding new features
- Keep commits atomic and well-described
- Adding support for new sign languages
- Improving ASL grammar normalization
- Performance optimizations
- Test coverage improvements
- Documentation enhancements
- Bug fixes and issue resolution
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2025 GestureGPT Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
- FastAPI - Excellent modern web framework for building APIs
- SignASL.org - ASL video resource and community
- OpenAI - API design inspiration and SDK compatibility
- Anthropic - Claude LLM integration
- NLTK - Natural language processing toolkit
- Streamlit - Rapid demo interface development
- The Sign Language Community - Inspiration and guidance
- American Sign Language (ASL)
- SignASL.org - ASL video dictionary
- WLASL Dataset - Word-Level ASL dataset
- OpenAI API Documentation
- FastAPI Documentation
- SignASL API - ASL video scraper service
- OpenAI Python SDK
- vLLM - Fast LLM inference
For questions, issues, or feature requests, open a GitHub Issue or start a Discussion.
Built with ❤️ for the sign language community
GitHub • Docker Hub • API Docs
⭐ Star this repo if you find it helpful!