Skip to content

Rishavsingh03/NLP_basedSearchPlatform_v2

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

31 Commits
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Search Relevance Optimizer

Advanced NLP-powered search platform leveraging WinkNLP, Natural.js, Compromise.js, and machine learning algorithms including cosine similarity vectors, Levenshtein distance calculations, and intelligent semantic analysis. Features multi-modal search architecture with Elasticsearch and MongoDB integration, fuzzy matching with Fuse.js, spelling correction, and sophisticated re-ranking systems for context-aware product discovery and superior search relevance optimization.

πŸš€ Technical Architecture

Core Technologies

  • Backend: Express.js with multi-library NLP processing
  • Frontend: React with Vite and Tailwind CSS
  • Databases: Elasticsearch + MongoDB hybrid architecture
  • NLP Libraries: WinkNLP, Natural.js, Compromise.js for advanced text processing
  • Search Algorithms: Cosine similarity vectors, fuzzy matching, intelligent re-ranking

Advanced Features

  • Multi-Modal Search: Combines Elasticsearch, MongoDB, and fuzzy search
  • Semantic Analysis: Intent detection, category inference, brand recognition
  • Machine Learning: Vector similarity, cosine distance calculations
  • Smart Filtering: Gender detection, color analysis, price extraction
  • Context-Aware Ranking: Dynamic scoring based on query intent

πŸ› οΈ Key Components

NLP Processing Pipeline

  • Multi-Library Processing: WinkNLP for advanced linguistics, Natural.js for stemming/tokenization, Compromise.js for parsing
  • Tokenization & Stemming: Porter Stemmer via Natural.js with advanced text preprocessing
  • Phonetic Matching: Sound-based similarity detection with multiple algorithms
  • Intent Recognition: Query purpose understanding through semantic analysis
  • Entity Extraction: Brand, category, and feature identification across multiple NLP engines
  • String Similarity: Multiple algorithms including Levenshtein distance and cosine similarity

Search Intelligence

  • Vector Search Service: Cosine similarity-based product matching with custom vector generation
  • Semantic Service: Query expansion and intent analysis using multiple NLP libraries
  • Scoring Service: Multi-factor relevance calculation with ML-Matrix operations
  • Fallback Service: Progressive search degradation with spelling correction
  • NLP Service: Multi-engine processing combining WinkNLP, Natural.js, and Compromise.js
  • Improved NLP Service: Enhanced linguistic analysis with advanced tokenization and stemming

Database Integration

  • Elasticsearch: Full-text search with fuzzy matching
  • MongoDB: Structured data storage and regex queries
  • Vector Storage: Custom similarity indexing

πŸ“¦ Installation & Setup

Prerequisites

  • Node.js 16+
  • MongoDB
  • Elasticsearch
  • npm or yarn

Backend Setup

cd express_backend
npm install
npm run fresh-setup    # Initialize databases and indexes
npm run dev            # Start development server on port 8001

Frontend Setup

cd react-frontend
npm install
npm run dev           # Start development server on port 5173

Environment Configuration

Create .env file in backend directory:

MONGODB_URI=***************
ELASTICSEARCH_URI=http://localhost:9200
JWT_SECRET=your_jwt_secret
EMAIL_SERVICE_CONFIG=your_email_config

πŸ” Search Capabilities

Natural Language Understanding

  • Query Parsing: "red nike shoes for men under $100"
  • Intent Detection: Shopping, comparison, feature-based searches
  • Smart Inference: Context-aware category mapping
  • Multi-language Support: Extensible language processing

Advanced Matching

  • Exact Match: Direct product attribute matching
  • Fuzzy Search: Typo-tolerant search with Fuse.js
  • Semantic Similarity: Meaning-based product discovery
  • Hybrid Scoring: Combined relevance algorithms

Dynamic Filtering

  • Smart Categories: Auto-detected product categorization
  • Brand Recognition: Advanced brand entity extraction
  • Color Analysis: Visual attribute understanding
  • Price Intelligence: Range detection and filtering

πŸ—οΈ Architecture Highlights

Microservices Design

  • Search Service: Orchestrates multi-modal search
  • NLP Service: Handles natural language processing
  • Vector Service: Manages similarity calculations
  • Semantic Service: Provides intent analysis

Performance Optimization

  • Caching Strategy: Multi-level result caching
  • Index Management: Optimized Elasticsearch indexes
  • Query Optimization: Intelligent search routing
  • Batch Processing: Bulk data operations

Scalability Features

  • Horizontal Scaling: Microservice architecture
  • Database Sharding: Distributed data storage
  • Load Balancing: Request distribution
  • Async Processing: Non-blocking operations

πŸ“Š Technical Specifications

Machine Learning Components

  • Cosine Similarity: Vector-based product matching with cosine-similarity library
  • Levenshtein Distance: Edit distance calculations using leven and levenshtein-distance libraries
  • String Similarity: Text comparison algorithms with string-similarity library
  • Natural Language Processing: Multi-engine approach with WinkNLP, Natural.js, and Compromise.js
  • Fuzzy Matching: Fuse.js for approximate string matching and typo tolerance
  • Spelling Correction: Integrated spelling-corrector for query enhancement
  • Matrix Operations: ML-Matrix library for advanced mathematical computations

Search Algorithms

  • Multi-Match Queries: Elasticsearch full-text search
  • Regex Patterns: MongoDB flexible matching
  • Fuzzy Logic: Approximate string matching
  • Relevance Scoring: Dynamic ranking algorithms

Data Processing

  • Text Preprocessing: Cleaning and normalization
  • Feature Extraction: Product attribute analysis
  • Semantic Indexing: Meaning-based organization
  • Real-time Analytics: Search performance monitoring

πŸ”§ Development Features

Comprehensive Scripts

  • npm run full-reindex: Rebuild all search indexes
  • npm run index-products: Product data indexing

Testing & Monitoring

  • Advanced error handling and logging
  • Performance metrics collection
  • Search result analytics
  • Real-time system monitoring

🎯 Use Cases

Enterprise Search

  • Large-scale product catalogs
  • Multi-tenant search systems
  • Advanced filtering requirements
  • Real-time search analytics

AI-Powered Discovery

  • Intelligent product recommendations
  • Context-aware search results
  • Personalized search experiences
  • Semantic product matching

πŸ“ˆ Performance Metrics

  • Sub-second Response: Optimized query processing
  • High Relevance: Advanced scoring algorithms
  • Fault Tolerance: Graceful degradation strategies
  • Scalable Architecture: Handles high-volume requests

Releases

No releases published

Packages

No packages published

Languages

  • JavaScript 98.1%
  • CSS 1.7%
  • HTML 0.2%