Draft
1 change: 1 addition & 0 deletions _partials/_since_0_1_0.md
@@ -0,0 +1 @@
<Tag variant="hollow">Since [pg_textsearch v0.1.0](https://github.com/timescale/pg_textsearch/releases/tag/v0.1.0)</Tag>
56 changes: 28 additions & 28 deletions use-timescale/extensions/pg-textsearch.md
@@ -7,6 +7,7 @@ products: [cloud, self_hosted]
---

import EA1125 from "versionContent/_partials/_early_access_11_25.mdx";
import SINCE010 from "versionContent/_partials/_since_0_1_0.mdx";
import IntegrationPrereqs from "versionContent/_partials/_integration-prereqs.mdx";

# Optimize full text search with BM25
@@ -27,13 +28,12 @@ matches. `pg_textsearch` implements the following:
This page shows you how to install `pg_textsearch`, configure BM25 indexes, and optimize your search capabilities using
the following best practices:

* **Memory planning**: size your `index_memory_limit` based on corpus vocabulary and document count
* **Language configuration**: choose appropriate text search configurations for your data language
* **Hybrid search**: combine with pgvector or pgvectorscale for applications requiring both semantic and keyword search
* **Query optimization**: use score thresholds to filter low-relevance results
* **Index monitoring**: regularly check index usage and memory consumption

<EA1125 /> this preview release is designed for development and staging environments. It is not recommended for use with hypertables.
<EA1125 /> this preview release is designed for development and staging environments.

## Prerequisites

@@ -124,39 +124,36 @@ Use efficient query patterns to leverage BM25 ranking and optimize search perfor
1. **Perform ranked searches using the distance operator**

```sql
SELECT name, description,
description <@> to_bm25query('ergonomic work', 'products_search_idx') as score
SELECT name, description, description <@> 'ergonomic work' as score
FROM products
ORDER BY description <@> to_bm25query('ergonomic work', 'products_search_idx')
LIMIT 3;
ORDER BY score
LIMIT 3;
```

1. **Filter results by score threshold**

```sql
SELECT name,
description <@> to_bm25query('wireless', 'products_search_idx') as score
SELECT name, description <@> 'wireless' as score
FROM products
WHERE description <@> to_bm25query('wireless', 'products_search_idx') < -2.0;
WHERE description <@> 'wireless' < -2.0;
```

1. **Combine with standard SQL operations**

```sql
SELECT category, name,
description <@> to_bm25query('ergonomic', 'products_search_idx') as score
SELECT category, name, description <@> 'ergonomic' as score
FROM products
WHERE price < 500
AND description <@> to_bm25query('ergonomic', 'products_search_idx') < -1.0
ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx')
AND description <@> 'ergonomic' < -1.0
ORDER BY description <@> 'ergonomic'
LIMIT 5;
```

1. **Verify index usage with EXPLAIN**

```sql
EXPLAIN SELECT * FROM products
ORDER BY description <@> to_bm25query('wireless keyboard', 'products_search_idx')
ORDER BY description <@> 'ergonomic'
LIMIT 5;
```
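
The threshold queries above repeat the `<@>` expression in both the `SELECT` list and the `WHERE` clause. As a minimal sketch, assuming the `products` table and the implicit `description <@> 'text'` syntax shown in the steps above, you can compute the score once in a subquery, then filter and sort on the alias. The operator returns a distance, so more negative values indicate stronger matches and ascending order puts the best results first. The `ranked` alias and the `-2.0` cutoff are illustrative only, and it is worth confirming with `EXPLAIN` that the planner still uses the BM25 index for this query shape.

```sql
-- Sketch only: compute the BM25 score once, then filter and sort on it.
-- "ranked" and the -2.0 cutoff are illustrative, not recommended values.
SELECT name, score
FROM (
    SELECT name, description <@> 'wireless' AS score
    FROM products
) AS ranked
WHERE score < -2.0   -- keep only stronger matches (more negative = more relevant)
ORDER BY score       -- ascending order returns the best match first
LIMIT 10;
```

Tune the cutoff against your own data, because absolute BM25 scores depend on corpus size and document length.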

@@ -267,26 +264,34 @@ Customize `pg_textsearch` behavior for your specific use case and data character

<Procedure>

1. **Configure the memory limit**
1. **Configure memory and performance settings**

To manage memory usage, you control when the in-memory index spills to disk segments. When the memtable reaches the
threshold, it automatically flushes to a segment at transaction commit.

The size of the memtable depends primarily on the number of distinct terms in your corpus. A corpus with longer
documents or more varied vocabulary requires more memory per document.
```sql
-- Set memory limit per index (default 64MB)
SET pg_textsearch.index_memory_limit = '128MB';
-- Set memtable spill threshold (default 800000 posting entries, ~8MB segments)
SET pg_textsearch.memtable_spill_threshold = 1000000;

-- Set bulk load spill threshold (default 100000 terms per transaction)
SET pg_textsearch.bulk_load_threshold = 150000;

-- Set default query limit when no LIMIT clause is present (default 1000)
SET pg_textsearch.default_limit = 5000;
```
<SINCE010 />

1. **Configure language-specific text processing**

```sql
-- French language configuration
CREATE INDEX products_fr_idx ON products_fr
USING pg_textsearch(description)
USING bm25(description)
WITH (text_config='french');

-- Simple tokenization without stemming
CREATE INDEX products_simple_idx ON products
USING pg_textsearch(description)
USING bm25(description)
WITH (text_config='simple');
```

@@ -310,7 +315,7 @@ Customize `pg_textsearch` behavior for your specific use case and data character

- View detailed index information
```sql
SELECT bm25_debug_dump_index('products_search_idx');
SELECT bm25_dump_index('products_search_idx');
```

</Procedure>
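
To check which values are in effect before and after tuning, you can use standard Postgres commands with these settings. The following is a minimal sketch, assuming the `pg_textsearch.*` parameter names from the configuration step above and a hypothetical bulk-load transaction. `SHOW` reads the current value, `SET LOCAL` limits a change to a single transaction, and `RESET` restores the default for the session.

```sql
-- Inspect the values currently in effect for this session
SHOW pg_textsearch.memtable_spill_threshold;
SHOW pg_textsearch.default_limit;

-- Change the bulk load threshold for one transaction only;
-- the setting reverts automatically at COMMIT or ROLLBACK
BEGIN;
SET LOCAL pg_textsearch.bulk_load_threshold = 150000;
-- ... run your bulk INSERT or COPY here ...
COMMIT;

-- Return a session-level setting to its default
RESET pg_textsearch.default_limit;
```

Scoping a change with `SET LOCAL` keeps a one-off bulk load from altering the behavior of later queries in the same session.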
@@ -320,12 +325,7 @@ caching and pagination to improve user experience with large result sets.

## Current limitations

This preview release focuses on core BM25 functionality. It has the following limitations:

* **Memory-only storage**: indexes are limited by `pg_textsearch.index_memory_limit` (default 64MB)
* **No phrase queries**: cannot search for exact multi-word phrases yet

These limitations will be addressed in upcoming releases with disk-based segments and expanded query capabilities.
This preview release focuses on core BM25 functionality. In this release, you cannot search for exact multi-word phrases.


[bm25-wiki]: https://en.wikipedia.org/wiki/Okapi_BM25