Commit c0f4c98

Feat: Implement dual-mode search architecture (Phase 1)
Implements automatic mode detection and routing for CodeGraph search: **Local Mode** (FAISS + local/ollama embeddings): - Existing FAISS implementation extracted into faiss_search_impl() - Used when CODEGRAPH_EMBEDDING_PROVIDER=local/ollama/lmstudio **Cloud Mode** (SurrealDB HNSW + Jina embeddings): - New cloud_search_impl() with basic SurrealDB integration - Used when CODEGRAPH_EMBEDDING_PROVIDER=jina - MVP: Returns filtered results (HNSW search + reranking TODO Phase 2) **Mode Detection**: - SearchMode enum (Local/Cloud) - detect_search_mode() reads CODEGRAPH_EMBEDDING_PROVIDER - bin_search_with_scores_shared() routes to correct implementation **Documentation**: - Updated .env.example with comprehensive dual-mode configuration - Documents Local vs Cloud trade-offs - SurrealDB connection settings for cloud mode **Status**: Phase 1 complete, compiles successfully **Next**: Phase 2 - SurrealDB HNSW search + Jina reranking integration
1 parent b24fab2 commit c0f4c98
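The mode-detection step described above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the `SearchMode` enum and `detect_search_mode()` names come from the commit message, while `mode_for_provider` is a hypothetical helper added here to keep the routing logic testable without touching the environment.

```rust
use std::env;

/// Search backend selection (type and function names follow the commit
/// message; the real implementation in the repo may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchMode {
    Local, // FAISS + onnx/ollama/lmstudio embeddings
    Cloud, // SurrealDB HNSW + Jina embeddings
}

/// Route based on the provider name from CODEGRAPH_EMBEDDING_PROVIDER.
/// (`mode_for_provider` is a hypothetical helper for illustration.)
pub fn mode_for_provider(provider: &str) -> SearchMode {
    match provider {
        "jina" => SearchMode::Cloud,
        _ => SearchMode::Local, // local, ollama, lmstudio, onnx, auto, unset
    }
}

/// Read the environment variable and pick a mode, defaulting to Local.
pub fn detect_search_mode() -> SearchMode {
    let provider = env::var("CODEGRAPH_EMBEDDING_PROVIDER").unwrap_or_default();
    mode_for_provider(&provider)
}

fn main() {
    assert_eq!(mode_for_provider("jina"), SearchMode::Cloud);
    assert_eq!(mode_for_provider("ollama"), SearchMode::Local);
    println!("detected (from env): {:?}", detect_search_mode());
}
```

Keeping the string-to-mode mapping in a pure function like this makes the fallback behavior (unknown or unset provider → Local) explicit and easy to unit-test.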

File tree

8 files changed: +993 additions, −1512 deletions


.env.example

Lines changed: 43 additions & 1 deletion
```diff
@@ -13,7 +13,7 @@ CODEGRAPH_EMBEDDING_PROVIDER=auto
 
 # Embedding Provider Configuration
 # ----------------------------------
-# Provider options: "auto", "onnx", "ollama", "openai", or "lmstudio"
+# Provider options: "auto", "onnx", "ollama", "openai", "jina", or "lmstudio"
 # CODEGRAPH_EMBEDDING_PROVIDER=auto
 
 # ONNX: Specify model path (or leave empty for auto-detection from HuggingFace cache)
@@ -33,12 +33,54 @@ CODEGRAPH_EMBEDDING_PROVIDER=auto
 # OpenAI: Model name (API key configured below in Security section)
 # CODEGRAPH_EMBEDDING_MODEL=text-embedding-3-small
 
+# Jina AI: Cloud embeddings with reranking (requires JINA_API_KEY)
+# CODEGRAPH_EMBEDDING_PROVIDER=jina
+# JINA_API_KEY=your-jina-api-key-here
+
+# ============================================================================
+# Dual-Mode Search Configuration
+# ============================================================================
+# CodeGraph supports two search modes based on CODEGRAPH_EMBEDDING_PROVIDER:
+#
+# Local Mode (FAISS + local/ollama embeddings)
+# ---------------------------------------------
+# - Uses FAISS for in-memory vector search
+# - Embeddings: ONNX, Ollama, or LM Studio
+# - Best for: Desktop development, privacy-focused setups
+# - Requires: Build with --features faiss
+# Example:
+# CODEGRAPH_EMBEDDING_PROVIDER=local  # or ollama or lmstudio
+#
+# Cloud Mode (SurrealDB HNSW + Jina embeddings + reranking)
+# ----------------------------------------------------------
+# - Uses SurrealDB HNSW indexes for scalable vector search
+# - Embeddings: Jina AI (2048 dimensions)
+# - Reranking: Jina reranker-v3 for improved relevance
+# - Best for: Cloud deployments, multi-user systems, scalability
+# - Requires: SurrealDB instance, Jina API key
+# Example:
+# CODEGRAPH_EMBEDDING_PROVIDER=jina
+# JINA_API_KEY=your-jina-api-key-here
+#
+# SurrealDB Connection (required for cloud mode)
+# SURREALDB_URL=ws://localhost:3004
+# SURREALDB_NAMESPACE=codegraph
+# SURREALDB_DATABASE=main
+# SURREALDB_USERNAME=root
+# SURREALDB_PASSWORD=root
+#
+# Important: HNSW index dimension must match embedding provider
+# - Jina v4: 2048 dimensions
+# - Local ONNX: typically 384 or 768 dimensions
+# - Update schema/codegraph.surql if changing providers
+
 # LLM Configuration (for local insights generation)
 # --------------------------------------------------
 # Leave empty to use context-only mode (fastest, recommended for agents like Claude/GPT-4)
 # Set to enable local LLM insights generation
 
 # LM Studio with DeepSeek Coder v2 Lite Instruct (recommended)
+# Supported LLM provider options: "auto", "onnx", "lmstudio", "openai", "claude" or "ollama"
 # Superior MLX support and Flash Attention 2 on macOS
 # CODEGRAPH_LLM_PROVIDER=lmstudio
 # CODEGRAPH_MODEL=lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF/DeepSeek-Coder-V2-Lite-Instruct-Q4_K_M.gguf
```
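The dimension-matching rule in the diff above ("HNSW index dimension must match embedding provider") can be made concrete with a fail-fast check. This is a sketch only: `expected_dimension` and `check_index_dimension` are hypothetical helpers, not code from the repo; the dimension values come from the comments in .env.example.

```rust
/// Expected embedding dimension per provider, per the .env.example comments
/// (Jina v4 = 2048; local ONNX models are typically 384 or 768).
/// Hypothetical helper, not code from the repo.
fn expected_dimension(provider: &str) -> usize {
    match provider {
        "jina" => 2048,
        _ => 768, // common local ONNX default; some models use 384
    }
}

/// Fail fast at startup if the HNSW index was built for a different provider,
/// rather than returning silently wrong nearest-neighbor results at query time.
fn check_index_dimension(provider: &str, index_dim: usize) -> Result<(), String> {
    let want = expected_dimension(provider);
    if index_dim == want {
        Ok(())
    } else {
        Err(format!(
            "HNSW index dimension {index_dim} does not match provider '{provider}' \
             (expected {want}); update schema/codegraph.surql and re-index"
        ))
    }
}

fn main() {
    assert!(check_index_dimension("jina", 2048).is_ok());
    assert!(check_index_dimension("jina", 768).is_err());
    println!("dimension checks passed");
}
```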

.gitignore

Lines changed: 2 additions & 1 deletion
```diff
@@ -169,4 +169,5 @@ AGENTS.md
 CRUSH.md
 OUROBOROS.md
 .codegraph/
-SESSION-MEMORY.md
+SESSION-MEMORY.md
+.serena/
```

crates/codegraph-mcp/Cargo.toml

Lines changed: 1 addition & 0 deletions
```diff
@@ -79,6 +79,7 @@ embeddings = ["dep:codegraph-vector"]
 embeddings-local = ["embeddings", "codegraph-vector/local-embeddings"]
 embeddings-openai = ["embeddings", "codegraph-vector/openai"]
 embeddings-ollama = ["embeddings", "codegraph-vector/ollama"]
+embeddings-jina = ["embeddings", "codegraph-vector/jina"]
 server-http = ["dep:axum", "dep:hyper"]
 qwen-integration = []
 ai-enhanced = ["dep:codegraph-ai", "faiss", "embeddings"]
```
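If the new `embeddings-jina` flag follows the pattern of the existing feature flags above, cloud mode would be enabled at build time along these lines (a sketch; the exact flag combination for a given deployment is an assumption):

```shell
# Local mode: FAISS-backed search with local embeddings
cargo build --features "faiss,embeddings-local"

# Cloud mode: SurrealDB HNSW with Jina embeddings
cargo build --features "embeddings-jina"
```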
