A production-ready Retrieval-Augmented Generation (RAG) system demonstrating cutting-edge techniques for semantic search, document retrieval, and AI-powered question answering.
- Enhanced UI: Proper markdown rendering with syntax highlighting and formatted citations
- Flexible Document Ingestion: Support for PDF, Markdown, JSON, CSV, and HTML files
- Improved Relevance: Hybrid search combining vector and keyword matching
- Real Document Support: Ingest your own documents or use high-quality samples
- Production Ready: Docker Compose setup, health checks, and monitoring tools
- Hybrid Search: Combines vector similarity and keyword matching (configurable alpha)
- Query Expansion: Automatically expands queries for comprehensive coverage
- Cross-Encoder Reranking: Cohere's reranking for 40-50% better precision
- Adaptive Chunking: Smart document splitting with configurable size and overlap
- Streaming Responses: Real-time streaming with live markdown rendering
- Multiple Data Sources: Ingest from local files, URLs, or use sample data
- Performance Metrics: Detailed timing for retrieval, reranking, and generation
- Configuration Comparison: Side-by-side testing of different RAG configurations
- Citation Tracking: Automatic source attribution with relevance scores
- Qdrant Dashboard: Visual exploration of vector embeddings at http://localhost:6333/dashboard
| Component | Technology | Purpose |
|---|---|---|
| LLM | OpenAI GPT-4 Turbo | Answer generation |
| Embeddings | OpenAI text-embedding-3 | Semantic search |
| Vector Database | Qdrant | Vector storage & retrieval |
| Reranking | Cohere Rerank v3 | Result refinement |
| Framework | LangChain | RAG pipeline orchestration |
| Backend | Node.js + TypeScript | Server implementation |
| Frontend | Vanilla JS + Marked.js | Interactive UI |
# Clone repository
git clone https://github.com/hew/advanced-rag-demo.git
cd advanced-rag-demo
# Install dependencies
npm install
# Start Qdrant with Docker
npm run qdrant:setup
# Or manually: docker-compose up -d qdrant
# Configure API keys
cp .env.example .env
# Edit .env with your OpenAI and Cohere API keys
# Ingest documents
npm run ingest:files # Your documents from ./documents/
# Or use samples: npm run ingest:sample
# Start the server
npm run dev
# Open browser
open http://localhost:3000# Quick test without setup
MOCK_MODE=true npm run dev- Text:
.txt,.md,.markdown - Documents:
.pdf(via pdf-parse) - Data:
.json,.csv - Web:
.html,.htm
# Add documents to the folder
cp your-files/* documents/
# Ingest them
npm run ingest:files
# Check status
npm run qdrant:check- Frontend Performance Optimization Guide
- Modern JavaScript Development Guide
- React & Next.js Performance Guide
- Node.js Best Practices
- Microservices Architecture
- Machine Learning in Production
# Required
OPENAI_API_KEY=sk-...
# Optional but recommended
COHERE_API_KEY=...
# Vector Database (choose one)
QDRANT_URL=http://localhost:6333 # Local Docker
# QDRANT_URL=https://xxx.qdrant.io # Cloud
# QDRANT_API_KEY=... # For cloud
# RAG Parameters
CHUNK_SIZE=512
CHUNK_OVERLAP=128
TOP_K=10
RERANK_TOP_K=3
HYBRID_SEARCH_ALPHA=0.5 # 0=keyword, 1=vector# Standard query
POST /api/query
{
"question": "What are Core Web Vitals?",
"useReranking": true,
"useHybridSearch": true
}
# Streaming query (Server-Sent Events)
POST /api/query/stream
# Compare configurations
POST /api/compare
{
"question": "How to optimize React performance?"
}# Start all services
docker-compose up -d
# View logs
npm run docker:logs
# Stop services
npm run docker:down- Use Qdrant Cloud for managed vector database
- Set
NODE_ENV=production - Configure proper API key management
- Implement rate limiting and authentication
- Use HTTPS for all endpoints
See SETUP.md for detailed deployment instructions.
# Run tests
npm test
# Check Qdrant health
npm run qdrant:check
# Reset vector database
npm run qdrant:reset
# View Qdrant dashboard
open http://localhost:6333/dashboard
# Bundle analysis
ANALYZE=true npm run build| Optimization | Impact | Implementation |
|---|---|---|
| Hybrid Search | +30-40% recall | Combines vector + keyword search |
| Query Expansion | +20-25% coverage | Automatic synonym expansion |
| Reranking | +40-50% precision@3 | Cohere cross-encoder |
| Adaptive Chunking | Better context | Content-aware splitting |
| Response Streaming | -60% perceived latency | SSE implementation |
Qdrant Connection Failed
# Check if running
docker ps | grep qdrant
# Restart
npm run docker:down && npm run docker:up
# Check health
curl http://localhost:6333/readyzLow Relevance Scores
- Ensure documents are properly ingested:
npm run qdrant:check - Try adjusting
HYBRID_SEARCH_ALPHA(0.7 for more semantic) - Check if reranking is enabled
Slow Response Times
- Consider using GPT-3.5-turbo for faster responses
- Reduce
TOP_Kfor fewer documents - Enable response caching
.
βββ src/
β βββ server.ts # Express server
β βββ lib/
β β βββ ragPipeline.ts # Core RAG logic
β β βββ vectorStore.ts # Qdrant integration
β β βββ chunking.ts # Document processing
β β βββ qdrant-init.ts # Database setup
β βββ ingestion/
β βββ ingest-files.ts # File ingestion
β βββ ingest.ts # Sample data
βββ documents/ # Your documents here
βββ public/
β βββ index.html # Web UI
βββ scripts/
β βββ setup-qdrant.sh # Setup wizard
β βββ check-qdrant.js # Health check
βββ docker-compose.yml # Container setup
Contributions are welcome! See CONTRIBUTING.md for guidelines.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT - See LICENSE for details.
- LangChain for RAG framework
- Qdrant for vector search
- Cohere for reranking
- OpenAI for LLM and embeddings
- Multi-tenant support
- Authentication & authorization
- Conversation memory
- Document update/delete APIs
- Evaluation metrics dashboard
- Fine-tuning support
- Multi-language support
β If you find this project useful, please star it on GitHub!
For detailed setup instructions, see SETUP.md