Self-correcting AI agent for natural language to SQL, built with HuggingFace smolagents and the ReAct framework. It transforms natural language questions into accurate SQL queries with multi-step reasoning, self-correction, and transparent decision-making.
Traditional Text2SQL pipelines generate SQL in one shot with no validation. If the query is wrong, you get incorrect results silently.
Our agent approach uses the ReAct framework to:
- Reason about the query before generating SQL
- Validate results to ensure correctness
- Self-correct when queries are wrong
- Achieve 50-70% higher accuracy on complex queries
User: "Which waiter got the most tips?"
Agent Step 1: Inspect schema, find tips table
Agent Step 2: Generate SQL with GROUP BY and SUM
Agent Step 3: Validate results make sense
Agent Step 4: Return correct answer
Prerequisites:

- Python 3.10+
- HuggingFace account and API token

```bash
git clone https://github.com/Sakeeb91/text2sql-agent.git
cd text2sql-agent
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your settings
```

Minimal `.env`:

```
HUGGINGFACE_TOKEN=hf_xxxxx
DATABASE_URL=sqlite:///./data/app.db
TEXT2SQL_MODEL=Qwen/Qwen2.5-Coder-7B-Instruct
AGENT_MODEL_BACKEND=hf_inference
AGENT_INFERENCE_PROVIDER=
```

Run the API:

```bash
uvicorn app.main:app --reload
```

Visit http://localhost:8000/docs for interactive API documentation.
```python
import requests

response = requests.post(
    "http://localhost:8000/api/v1/query",
    json={
        "query": "Show me all active users with their order totals",
        "database_id": "default",
        "execute": True,
        "show_reasoning": True,
    },
)
result = response.json()
print(f"SQL: {result['sql']}")
print(f"Confidence: {result['confidence']}")
```

Or with curl:

```bash
curl -X POST http://localhost:8000/api/v1/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Show me top 10 customers by revenue",
    "database_id": "default",
    "execute": true
  }'
```

Example response:

```json
{
  "sql": "SELECT name, SUM(total) as revenue FROM customers JOIN orders ON customers.id = orders.customer_id GROUP BY customers.id ORDER BY revenue DESC LIMIT 10",
  "confidence": 0.92,
  "results": [...],
  "reasoning_trace": [
    {"step": 1, "thought": "Need to join customers and orders"},
    {"step": 2, "thought": "Aggregate by customer, sort by total"}
  ]
}
```

Enable multi-db routing and register additional databases per tenant:
- Set environment flags:

  ```
  MULTIDB_ENABLED=true
  MULTIDB_REQUIRE_CONNECTION_TEST=true
  ```

- Register a database (schema registration primes the cache):

  ```bash
  curl -X POST http://localhost:8000/api/v1/schema/register \
    -H "Content-Type: application/json" \
    -d '{
      "database_id": "analytics",
      "connection_string": "postgresql://user:pass@host:5432/analytics",
      "dialect": "postgresql"
    }'
  ```

- Use the `database_id` in queries:

  ```bash
  curl -X POST http://localhost:8000/api/v1/query \
    -H "Content-Type: application/json" \
    -d '{
      "query": "Show daily signups",
      "database_id": "analytics",
      "execute": false
    }'
  ```

Single-tenant fallback: set `MULTIDB_ENABLED=false` to route all requests to the configured `DATABASE_URL`.
```
+----------------------------------------------------------+
|                     FastAPI REST API                      |
+----------------------------+-----------------------------+
                             |
             +---------------v----------------+
             |       Agent Orchestrator       |
             |    (CodeAgent + ReAct Loop)    |
             +---------------+----------------+
                             |
             +---------------+----------------+
             |                                |
     +-------v--------+               +-------v--------+
     |   SQL Engine   |               |   Validator    |
     | - Execute SQL  |               | - Check results|
     | - Get schema   |               | - Suggest fix  |
     +-------+--------+               +-------+--------+
             |                                |
             +---------------+----------------+
                             |
             +---------------v----------------+
             |          Text2SQL LLM          |
             |  (Qwen, Llama, Mistral, etc.)  |
             +--------------------------------+
```
- Multi-step reasoning with transparent thought process
- Self-correction when queries are incorrect
- Schema-aware SQL generation
- Output validation before returning
- Semantic validation feedback for joins, aggregations, and rankings
- SQL injection prevention
- Rate limiting and authentication
- Prometheus metrics, OpenTelemetry tracing, and structured logging
- Multi-database support (PostgreSQL, MySQL, SQLite)
- Query caching for instant repeated responses
- Schema, prompt, and inference caching with TTL controls (Redis or in-memory)
- Model quantization for memory efficiency
- Async processing for high throughput
- Streaming endpoint for long-running queries with batched results
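As a sketch of the TTL-based caching idea behind the caching bullets above, a minimal in-memory variant could look like this. It is illustrative only: the project's cache layer also supports Redis, and the class and method names here are not its actual API.

```python
# Toy in-memory cache with per-entry time-to-live.
import time


class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        item = self._store.get(key)
        if item is None:
            return None
        expires, value = item
        if time.monotonic() > expires:
            # Entry outlived its TTL: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)


cache = TTLCache(ttl_seconds=300)
cache.set("Show daily signups", "SELECT date(created_at), COUNT(*) ...")
print(cache.get("Show daily signups"))
```

Keying the cache on the natural-language query (as above) makes repeated questions instant; keying on the generated SQL instead would also dedupe differently-phrased questions that compile to the same query.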
| Endpoint | Description |
|---|---|
| `POST /api/v1/query` | Generate SQL from natural language |
| `POST /api/v1/query/batch` | Batch SQL generation |
| `GET /api/v1/schema/{db_id}` | Get database schema |
| `POST /api/v1/databases` | Register a database |
| `GET /api/v1/health` | Health check |
See docs/API.md for complete API reference.
- All `/api/v1` endpoints (except `/api/v1/health`) require authentication.
- Use `Authorization: Bearer <jwt>` or `X-API-Key: <key>` headers.
- Mutation/management endpoints require scopes in `MUTATION_SCOPES` (default: `write,admin`).
- Configure auth, scopes, and rate limiting in docs/CONFIGURATION.md.
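For example, an API-key request can be assembled with `requests` as follows; the key value is a placeholder, and a JWT would go in an `Authorization: Bearer <jwt>` header instead. The request is only prepared here, not sent.

```python
import requests

# Build (but do not send) an authenticated request;
# "YOUR_KEY" is a placeholder for a real API key.
req = requests.Request(
    "GET",
    "http://localhost:8000/api/v1/schema/default",
    headers={"X-API-Key": "YOUR_KEY"},
).prepare()

print(req.method, req.url)       # GET http://localhost:8000/api/v1/schema/default
print(req.headers["X-API-Key"])  # YOUR_KEY
```

Once the server is running, `requests.Session().send(req)` dispatches the prepared request.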
```bash
# Build
docker build -t text2sql-agent .

# Run
docker run -p 8000:8000 --env-file .env text2sql-agent

# Or use Docker Compose
docker-compose up -d
```

| Document | Description |
|---|---|
| User Guide | How to ask questions and get accurate results |
| Workflow Guide | Developer integration patterns and workflows |
| API Reference | Complete endpoint documentation |
| Configuration | Environment variables and options |
| Deployment | Docker, Kubernetes, production |
| Development | Testing, contributing, code quality |
| Troubleshooting | Common issues and solutions |
| Architecture | System design and patterns |
| Implementation Plan | Detailed technical design |
| Metric | Traditional Pipeline | Agent Approach |
|---|---|---|
| Simple Queries | 85% accuracy | 92% accuracy |
| Complex Queries | 60% accuracy | 90% accuracy |
| Silent Failures | 15% | <1% |
- Fork the repository
- Create a feature branch
- Run tests: `pytest`
- Run linting: `black . && ruff check .`
- Submit a Pull Request
See docs/DEVELOPMENT.md for details.
MIT License - see LICENSE for details.