@raphaelgurtner

The current implementation builds full-text search queries with plainto_tsquery(). This produces the following tsquery:

select plainto_tsquery('pg_catalog.german', 'Freundliche Grüsse');
-- -> 'freundlich' & 'gruss'

Because the terms are combined with AND, a query only returns results if ALL keywords match. This leads to counterintuitive cases where a query returns perfect results, but returns nothing once an unrelated filler word is added.
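To make the difference concrete, here is a toy Python model of the two matching modes. It is only a sketch: it compares lowercased words against a set and ignores the stemming and stop-word removal that PostgreSQL's text search configurations perform. The function name `matches` and the sample document are made up for illustration.

```python
def matches(doc_terms: set[str], query: str, mode: str = "and") -> bool:
    """Toy model of tsquery matching (no stemming, no stop words)."""
    terms = query.lower().split()
    if not terms:
        return False
    if mode == "and":
        # plainto_tsquery-style: every term must be present
        return all(term in doc_terms for term in terms)
    # OR-style: a single matching term is enough
    return any(term in doc_terms for term in terms)


# Hypothetical document, already reduced to lowercase terms
doc = {"freundliche", "grüsse", "aus", "bern"}

print(matches(doc, "Freundliche Grüsse"))                   # True
print(matches(doc, "Freundliche Grüsse danke"))             # False
print(matches(doc, "Freundliche Grüsse danke", mode="or"))  # True
```

The second call shows the problem described above: one extra word that is not in the document makes the AND query fail completely.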

This tiny PR changes the query creation to the following approach:

select websearch_to_tsquery('pg_catalog.german', regexp_replace('Freundliche Grüsse', '\s', ' OR ', 'g'));
-- -> 'freundlich' | 'gruss'

This fixes the counterintuitive results. Unfortunately, PostgreSQL has no built-in function that turns plain text into an OR query, which is why the input is rewritten with regexp_replace. websearch_to_tsquery is used because it supports a few operators (including OR, which we need here) and accepts raw user-supplied input without raising syntax errors: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
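For readers who want to see what the rewritten string looks like before it reaches websearch_to_tsquery, the following Python sketch mirrors the regexp_replace call on the client side. It is an illustration only, not code from this PR: `or_join` and `or_join_collapsed` are hypothetical helper names, and Python's `\s` may not cover exactly the same characters as PostgreSQL's.

```python
import re


def or_join(user_query: str) -> str:
    # Same idea as regexp_replace(q, '\s', ' OR ', 'g'):
    # every single whitespace character becomes ' OR '.
    return re.sub(r"\s", " OR ", user_query)


def or_join_collapsed(user_query: str) -> str:
    # Variant (an assumption, not part of the PR): split on runs of
    # whitespace so repeated spaces do not produce consecutive ORs.
    return " OR ".join(user_query.split())


print(or_join("Freundliche Grüsse"))  # Freundliche OR Grüsse
print(or_join("a  b"))                # a OR  OR b
print(or_join_collapsed("a  b"))      # a OR b
```

The second call shows that a per-character replacement emits one OR per whitespace character, which may be worth keeping in mind for input with double spaces or line breaks.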

This PR could be extended to make the search behavior configurable via HybridSearchConfig.

produces ts queries with logical OR instead of logical AND
  else ""
  )
- query_tsv = f"plainto_tsquery({lang} :fts_query)"
+ query_tsv = fr"websearch_to_tsquery({lang} regexp_replace(:fts_query, '\s', ' OR ', 'g'))"
Collaborator

I don't agree we should be using the regex to enforce OR. This will not match the user's expectation of using FTS with the user's query.

Author

@raphaelgurtner, Nov 15, 2025

Thanks for the quick answer! Interesting. I've been discussing the expected behavior with a few of my colleagues, and all of them seemed to expect the OR behavior.
The current behavior seems unexpected for queries like

  • “very specific terms that result in perfect chunk retrieval with FTS”

vs.

  • “very specific terms that result in perfect chunk retrieval with FTS, thank you”

As long as these queries/user prompts are passed to the retriever as-is, it's highly likely that the second query returns no results at all.

We found that behavior odd: one loses the benefit of the FTS part of the hybrid search in most cases, at least in our testing.

What would you think about making the behavior configurable in HybridSearchConfig?
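A rough sketch of what such a switch could look like is below. Everything in it is an assumption for discussion: `FtsMatchMode`, `FtsQueryOptions`, and `build_query_tsv` are hypothetical names and do not reflect the actual fields of HybridSearchConfig. The default stays on AND so existing users would see no change.

```python
from dataclasses import dataclass
from enum import Enum


class FtsMatchMode(Enum):
    AND = "and"  # current behavior: plainto_tsquery
    OR = "or"    # proposed behavior: websearch_to_tsquery with OR-joined terms


@dataclass
class FtsQueryOptions:
    # Hypothetical stand-in, NOT the real HybridSearchConfig.
    tsv_lang: str = ""
    match_mode: FtsMatchMode = FtsMatchMode.AND


def build_query_tsv(opts: FtsQueryOptions) -> str:
    # tsv_lang is assumed to be a trusted config value, not user input;
    # the user's text is still passed as the bound parameter :fts_query.
    lang = f"'{opts.tsv_lang}', " if opts.tsv_lang else ""
    if opts.match_mode is FtsMatchMode.OR:
        return (
            rf"websearch_to_tsquery({lang}"
            r"regexp_replace(:fts_query, '\s', ' OR ', 'g'))"
        )
    return f"plainto_tsquery({lang}:fts_query)"


print(build_query_tsv(FtsQueryOptions()))
print(build_query_tsv(FtsQueryOptions("pg_catalog.german", FtsMatchMode.OR)))
```

Keeping the mode as an explicit enum rather than a boolean would leave room for other strategies later without another breaking change.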
