
Commit b38d9f3

rbrugaro, pre-commit-ci[bot], and ashahba authored
GraphRAG README/compose fixes post refactor (#1221)
* GraphRAG README/compose fixes post refactor

  - Move env settings from set_env.sh script to README body
  - Description improvements
  - Fix tgi settings in docker compose file

  Signed-off-by: rbrygaro <rita.brugarolas.brufau@intel.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* remove compose_neo4j_llamaindex.yaml from PR #1221 and fix filename in README

  Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com>

---------

Signed-off-by: rbrygaro <rita.brugarolas.brufau@intel.com>
Signed-off-by: rbrugaro <rita.brugarolas.brufau@intel.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <12436063+ashahba@users.noreply.github.com>
1 parent 2c7c611 commit b38d9f3

File tree

1 file changed: +16 -6 lines changed


comps/dataprep/src/README_neo4j_llamaindex.md

Lines changed: 16 additions & 6 deletions
@@ -2,14 +2,14 @@

 This Dataprep microservice performs:

-- Graph extraction (entities, relationships and descripttions) using LLM
+- Graph extraction (entities, relationships and descriptions) using LLM
 - Performs hierarchical_leiden clustering to identify communities in the knowledge graph
 - Generates a community symmary for each community
 - Stores all of the above in Neo4j Graph DB

-This microservice follows the graphRAG approached defined by Microsoft paper ["From Local to Global: A Graph RAG Approach to Query-Focused Summarization"](https://www.microsoft.com/en-us/research/publication/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization/) with some differences such as: 1) only level zero cluster summaries are leveraged, 2) The input context to the final answer generation is trimmed to fit maximum context length.
+This microservice follows the graphRAG approached defined by Microsoft paper ["From Local to Global: A Graph RAG Approach to Query-Focused Summarization"](https://www.microsoft.com/en-us/research/publication/from-local-to-global-a-graph-rag-approach-to-query-focused-summarization/) with some differences such as: 1) no node degree prioritization is used in populating the LLM context window for community summaries, 2) no ranking of sub-communities is applied in generating higher level communities summaries.

-This dataprep microservice ingests the input files and uses LLM (TGI or OpenAI model when OPENAI_API_KEY is set) to extract entities, relationships and descriptions of those to build a graph-based text index.
+This dataprep microservice ingests the input files and uses LLM (TGI, VLLM or OpenAI model when OPENAI_API_KEY is set) to extract entities, relationships and descriptions of those to build a graph-based text index. Compose yaml file deploys TGI but works also with vLLM inference endpoint.

 ## Setup Environment Variables
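The hunk above describes the pipeline this README documents: extract a graph with an LLM, cluster it with hierarchical_leiden, summarize communities, and store everything in Neo4j. One way to inspect the resulting graph after ingestion is to query Neo4j directly; the sketch below is illustrative only and not part of this commit — the `neo4j-apoc` container name and the availability of `cypher-shell` inside it are assumptions, and the credentials are the ones exported in the environment-variable section shown in the next hunk.

```bash
# Illustrative check of the graph built by the steps above. Assumes the
# neo4j-apoc container is running and ships cypher-shell; credentials are the
# NEO4J_USERNAME / NEO4J_PASSWORD values exported in the environment section.
docker exec -it neo4j-apoc cypher-shell -u "${NEO4J_USERNAME}" -p "${NEO4J_PASSWORD}" \
  "MATCH (n) RETURN labels(n) AS label, count(*) AS cnt ORDER BY cnt DESC LIMIT 10;"
```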

@@ -23,10 +23,20 @@ export NEO4J_URI=${your_neo4j_url}
 export NEO4J_USERNAME=${your_neo4j_username}
 export NEO4J_PASSWORD=${your_neo4j_password} # should match what was used in NEO4J_AUTH when running the neo4j-apoc
 export PYTHONPATH=${path_to_comps}
-export OPENAI_KEY=${your_openai_api_key} # optional, when not provided will use smaller models TGI/TEI
+export OPENAI_KEY=${your_openai_api_key} # optional, when not provided will use open models TGI/TEI
 export HUGGINGFACEHUB_API_TOKEN=${your_hf_token}
+
 # set additional environment settings
-source ./set_env.sh
+export EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5"
+export OPENAI_EMBEDDING_MODEL="text-embedding-3-small"
+export LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
+export OPENAI_LLM_MODEL="gpt-4o"
+export TEI_EMBEDDING_ENDPOINT="http://${host_ip}:6006"
+export TGI_LLM_ENDPOINT="http://${host_ip}:6005"
+export NEO4J_URL="bolt://${host_ip}:7687"
+export NEO4J_USERNAME=neo4j
+export DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6004/v1/dataprep"
+export LOGFLAG=True
 ```

 ## 🚀Start Microservice with Docker
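The exports added in this hunk point the dataprep service at local TEI and TGI endpoints. A minimal sanity check of those two endpoints once the containers are up might look like the sketch below; it assumes the standard TEI `/embed` and TGI `/generate` routes on the ports exported above and is not part of this commit.

```bash
# Minimal sanity checks for the endpoints exported above (assumes the TEI and
# TGI containers from the compose file are already running on these ports).
curl -s -X POST "${TEI_EMBEDDING_ENDPOINT}/embed" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "hello world"}'

curl -s -X POST "${TGI_LLM_ENDPOINT}/generate" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is GraphRAG?", "parameters": {"max_new_tokens": 32}}'
```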
@@ -62,7 +72,7 @@ Docker compose will start 4 microservices: dataprep-neo4j-llamaindex, neo4j-apoc

 ```bash
 cd comps/dataprep/deployment/docker_compose
-docker compose -f ompose_neo4j_llamaindex.yaml up -d
+docker compose -f compose.yaml up -d
 ```

 ## Invoke Microservice
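With the compose filename fixed above, a subsequent call to the dataprep service typically looks something like the sketch below. This is illustrative only: the multipart field name `files` and the sample file path are assumptions, and the authoritative request format is the one documented in the README's own "Invoke Microservice" section.

```bash
# Illustrative call to the dataprep endpoint exported earlier
# (DATAPREP_SERVICE_ENDPOINT="http://${host_ip}:6004/v1/dataprep").
# The multipart field name "files" and the sample document are assumptions;
# curl sets the multipart Content-Type header automatically when -F is used.
curl -s -X POST "${DATAPREP_SERVICE_ENDPOINT}" \
  -F "files=@./sample_document.txt"
```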
