Fuzzy searching #1270

@reillysiemens

I recently searched crates.io for "git" using the default sorting of relevance. I expected to find the git2 crate, but instead found the git crate.

Below is a screenshot of the exact match currently found when searching with https://crates.io/search?q=git.

[Screenshot: crates-io-git-search-relevance]

Searching for "git2" directly with https://crates.io/search?q=git2 produces the desired result with an exact match.

[Screenshot: crates-io-git2-search-relevance]

I took a brief look at the source for crates.io last night, and it looks like the search controller uses the PostgreSQL ts_rank_cd text search function for the default search. I'm not familiar enough with the Cover Density Ranking algorithm to explain whether, or why, it produces the results above, but that might be a starting point for digging deeper into this.

Relevance seems like a tricky term here. The default search probably does produce the most relevant package from a text similarity standpoint, but not necessarily to me as a programmer looking for a git library to use. Maybe a hybrid approach that considers text relevance, all-time downloads, and recent downloads would produce something closer to what I expected.
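
To make the hybrid idea a bit more concrete, here is a rough sketch in Python. It is not how crates.io ranks results; the `Crate` type, the `hybrid_score` function, the weights, and the download numbers are all hypothetical and made up for illustration. The only assumption carried over from above is that a text-relevance score (such as one produced by ts_rank_cd) is available for each crate.

```python
import math
from dataclasses import dataclass


@dataclass
class Crate:
    name: str
    text_rank: float       # hypothetical text-relevance score for the query
    total_downloads: int   # all-time downloads (made-up numbers below)
    recent_downloads: int  # recent downloads (made-up numbers below)


def hybrid_score(crate, w_text=1.0, w_total=0.1, w_recent=0.2):
    """Combine text relevance with log-scaled download counts.

    The logarithm keeps very popular crates from completely drowning
    out text relevance. The weights are arbitrary placeholders.
    """
    return (
        w_text * crate.text_rank
        + w_total * math.log10(1 + crate.total_downloads)
        + w_recent * math.log10(1 + crate.recent_downloads)
    )


def rank(crates):
    """Return crates ordered from highest to lowest hybrid score."""
    return sorted(crates, key=hybrid_score, reverse=True)


# Illustrative data only: "git" is the exact text match, "git2" is far
# more widely downloaded.
crates = [
    Crate("git", text_rank=1.0, total_downloads=5_000, recent_downloads=100),
    Crate("git2", text_rank=0.6, total_downloads=2_000_000, recent_downloads=300_000),
]

for crate in rank(crates):
    print(crate.name, round(hybrid_score(crate), 3))
```

With these made-up inputs, git2 ends up ahead of git even though git has the higher text score. Setting `w_total` and `w_recent` to zero falls back to pure text relevance, so the weights would need tuning against real search data.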
