From 3ecd3acbec1b39f1c0db0c49c834fe761373016c Mon Sep 17 00:00:00 2001
From: DrMelone <27028174+Classic298@users.noreply.github.com>
Date: Tue, 30 Dec 2025 22:04:47 +0100
Subject: [PATCH] md header splitting

---
 docs/getting-started/env-configuration.mdx | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/docs/getting-started/env-configuration.mdx b/docs/getting-started/env-configuration.mdx
index c30585068..b2c6e6803 100644
--- a/docs/getting-started/env-configuration.mdx
+++ b/docs/getting-started/env-configuration.mdx
@@ -2528,11 +2528,25 @@ Provide a clear and direct response to the user's query, including inline citati
 - Options:
   - `character`
   - `token`
-  - `markdown_header`
 - Default: `character`
-- Description: Sets the text splitter for RAG models.
+- Description: Sets the text splitter for RAG models. Use `character` for RecursiveCharacterTextSplitter or `token` for TokenTextSplitter (Tiktoken-based).
 - Persistence: This environment variable is a `PersistentConfig` variable.
 
+#### `ENABLE_MARKDOWN_HEADER_TEXT_SPLITTER`
+
+- Type: `bool`
+- Default: `True`
+- Description: Enables markdown header text splitting as a preprocessing step before character or token splitting. When enabled, documents are first split by markdown headers (h1-h6), then the resulting chunks are further processed by the configured text splitter (`RAG_TEXT_SPLITTER`). This helps preserve document structure and context across chunks.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
+:::info
+
+**Migration from `markdown_header` TEXT_SPLITTER**
+
+The `markdown_header` option has been removed from `RAG_TEXT_SPLITTER`. Markdown header splitting is now a preprocessing step controlled by `ENABLE_MARKDOWN_HEADER_TEXT_SPLITTER`. If you were using `RAG_TEXT_SPLITTER=markdown_header`, switch to `character` or `token` and ensure `ENABLE_MARKDOWN_HEADER_TEXT_SPLITTER` is enabled (it is enabled by default).
+
+:::
+
 #### `TIKTOKEN_CACHE_DIR`
 
 - Type: `str`
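
Reviewer note: the two-stage behaviour described for `ENABLE_MARKDOWN_HEADER_TEXT_SPLITTER` (split on headers first, then apply the configured splitter to each section) can be illustrated with a minimal standard-library sketch. This is not the project's implementation: the function names, the `chunk_size` parameter, and the fixed-width character chunker are hypothetical stand-ins for the real header splitter and for `RecursiveCharacterTextSplitter`, and the regex does not skip `#` lines inside fenced code blocks.

```python
import re

# Matches ATX-style markdown headers (h1-h6) at the start of a line.
HEADER_RE = re.compile(r"^#{1,6}[ \t]+\S", re.MULTILINE)


def split_by_markdown_headers(text: str) -> list[str]:
    """Stage 1 (hypothetical helper): one section per header, header included."""
    starts = [m.start() for m in HEADER_RE.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first header
    bounds = starts + [len(text)]
    sections = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s]


def split_by_characters(text: str, chunk_size: int) -> list[str]:
    """Stage 2 (simplified stand-in): fixed-width character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def split_document(
    text: str,
    chunk_size: int = 1000,
    enable_markdown_header_splitter: bool = True,
) -> list[str]:
    """Optionally pre-split on headers, then chunk every section."""
    if enable_markdown_header_splitter:
        sections = split_by_markdown_headers(text)
    else:
        sections = [text]
    chunks: list[str] = []
    for section in sections:
        chunks.extend(split_by_characters(section, chunk_size))
    return chunks
```

With the flag on, no chunk spans a header boundary, which is the structure-preserving effect the description refers to. With it off, the text goes straight to the character stage.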