6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,12 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v2.2.0

Add:

- Added `BasicDiskVectorDatabase`, a basic vector database that automatically persists to disk, along with the associated `BasicDiskVectorStore` and `BasicDiskVocabularyStore`.

## v2.1.3

Add:
31 changes: 31 additions & 0 deletions docs/docs/persistence/index.md
@@ -5,6 +5,8 @@ title: Data Persistence

The `Build5Nines.SharpVector` library provides easy-to-use methods for saving a memory-based vector database to a file or stream and loading it again later. This is particularly useful for caching indexed content between runs, deploying pre-built vector stores, or shipping databases with your application.

---

## :material-file: File Persistence

`Build5Nines.SharpVector` supports persisting the vector database to a file.
@@ -51,6 +53,8 @@ vdb.LoadFromFile(filePath);
await vdb.LoadFromFileAsync(filePath);
```

---

## :material-file-move: Persist to Stream

The underlying methods that `SaveToFile` and `LoadFromFile` use to serialize the vector database to a `Stream` are also available directly. This supports reading from and writing to a `MemoryStream` (or other streams) when the vector database needs to be persisted somewhere other than the local file system.
@@ -92,3 +96,30 @@ vdb.DeserializeFromBinaryStream(stream);
// deserialize asynchronously from binary stream
await vdb.DeserializeFromBinaryStreamAsync(stream);
```
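
As a rough sketch of the stream round trip described above, the helper below copies a database to a byte array and back, which is the shape needed when the target is blob storage or a cache rather than a file. `StreamPersistenceSketch` is a hypothetical name, and `SerializeToBinaryStream` is assumed to be the write-side counterpart of `DeserializeFromBinaryStream`; it does not appear in this excerpt, so check it against the library before relying on it.

```csharp
using System.IO;
using Build5Nines.SharpVector;

// Hypothetical helper for illustration; not part of Build5Nines.SharpVector.
public static class StreamPersistenceSketch
{
    // Serialize the database into an in-memory buffer.
    // Assumption: SerializeToBinaryStream(Stream) exists on the database.
    public static byte[] ToBytes(BasicMemoryVectorDatabase vdb)
    {
        using var stream = new MemoryStream();
        vdb.SerializeToBinaryStream(stream);
        // ToArray copies the whole buffer regardless of the stream position.
        return stream.ToArray();
    }

    // Rebuild a database from bytes produced by ToBytes.
    public static BasicMemoryVectorDatabase FromBytes(byte[] bytes)
    {
        using var stream = new MemoryStream(bytes);
        var vdb = new BasicMemoryVectorDatabase();
        vdb.DeserializeFromBinaryStream(stream);
        return vdb;
    }
}
```

Going through a byte array keeps the example independent of where the serializer leaves the stream position.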

---

## :material-file-database: BasicDiskVectorDatabase

The `BasicDiskVectorDatabase` provides a basic vector database implementation that automatically stores the vector store and vocabulary store on disk. Its implementation of vectorization is the same as that of `BasicMemoryVectorDatabase`, except that it automatically persists the database to the specified folder path in the background.

Here's a basic example of using `BasicDiskVectorDatabase`:

```csharp
// specify the folder where to persist the database data on disk
var vdb = new BasicDiskVectorDatabase("C:/data/content-db");
foreach (var doc in documents)
{
    vdb.AddText(doc.Id, doc.Text);
}

var results = vdb.Search("some text");
```
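
The class summary in this change describes the database as using Bag-of-Words vectorization with cosine similarity. The following is a minimal, self-contained sketch of that general technique, not the library's implementation: `BagOfWordsSketch`, its tokenizer, and its method names are all hypothetical, and the real `BagOfWordsVectorizer` and `CosineSimilarityVectorComparer` may tokenize and normalize differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration; not the Build5Nines.SharpVector implementation.
public static class BagOfWordsSketch
{
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

    // Lowercase the text and split it on whitespace and basic punctuation.
    public static string[] Tokenize(string text) =>
        text.ToLowerInvariant().Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    // Assign each distinct token an index in order of first appearance.
    public static Dictionary<string, int> BuildVocabulary(IEnumerable<string> texts)
    {
        var vocabulary = new Dictionary<string, int>();
        foreach (var text in texts)
        {
            foreach (var token in Tokenize(text))
            {
                vocabulary.TryAdd(token, vocabulary.Count);
            }
        }
        return vocabulary;
    }

    // Count how often each vocabulary token occurs; unknown tokens are ignored.
    public static int[] Vectorize(string text, IReadOnlyDictionary<string, int> vocabulary)
    {
        var vector = new int[vocabulary.Count];
        foreach (var token in Tokenize(text))
        {
            if (vocabulary.TryGetValue(token, out var index))
            {
                vector[index]++;
            }
        }
        return vector;
    }

    // Cosine of the angle between two count vectors; 0 when either is all zeros.
    public static double CosineSimilarity(int[] a, int[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Vectors must have the same length.");
        }

        double dot = 0, sumSquaresA = 0, sumSquaresB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            sumSquaresA += (double)a[i] * a[i];
            sumSquaresB += (double)b[i] * b[i];
        }

        if (sumSquaresA == 0 || sumSquaresB == 0)
        {
            return 0;
        }
        return dot / (Math.Sqrt(sumSquaresA) * Math.Sqrt(sumSquaresB));
    }
}
```

In this sketch a search amounts to vectorizing the query with the same vocabulary and ranking stored vectors by `CosineSimilarity`, which is why the vocabulary has to be persisted alongside the vectors.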

### Tips

- Prefer absolute paths for the storage folder in production services.
- Place the folder on fast storage (SSD) for best indexing/query performance.
- Avoid sharing the same folder across multiple processes concurrently.
- Back up the folder regularly to preserve your vector store and vocabulary.
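
One way to follow the tip about not sharing a folder between processes is an exclusive lock file that each process must hold before opening the database. The sketch below is only an illustration under stated assumptions: `FolderLock` is a hypothetical helper that is not part of `Build5Nines.SharpVector`, and it places the lock file next to the storage folder rather than inside it so that it does not mix with the files the library writes. On Unix-like systems the lock taken by `FileShare.None` is advisory, so it only protects against processes that use the same convention.

```csharp
using System;
using System.IO;

// Hypothetical helper; not part of Build5Nines.SharpVector.
public sealed class FolderLock : IDisposable
{
    private readonly FileStream _handle;

    private FolderLock(FileStream handle) => _handle = handle;

    // Returns a lock handle, or null when the lock file is already held.
    public static FolderLock? TryAcquire(string folder)
    {
        Directory.CreateDirectory(folder);

        // The lock file sits beside the folder, for example "content-db.lock".
        var lockPath = Path.TrimEndingDirectorySeparator(Path.GetFullPath(folder)) + ".lock";
        try
        {
            // FileShare.None makes a second open of the same file fail
            // for as long as this handle stays open.
            var handle = new FileStream(
                lockPath, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
            return new FolderLock(handle);
        }
        catch (IOException)
        {
            // Treated here as "someone else holds the lock".
            return null;
        }
    }

    // Closing the handle releases the lock.
    public void Dispose() => _handle.Dispose();
}
```

A service would call `FolderLock.TryAcquire` with the same path it passes to `BasicDiskVectorDatabase`, refuse to start when the result is `null`, and dispose the lock on shutdown.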
29 changes: 29 additions & 0 deletions src/Build5Nines.SharpVector/BasicDiskMemoryVectorDatabaseBase.cs
@@ -0,0 +1,29 @@
using Build5Nines.SharpVector.Id;
using Build5Nines.SharpVector.Preprocessing;
using Build5Nines.SharpVector.Vocabulary;
using Build5Nines.SharpVector.Vectorization;
using Build5Nines.SharpVector.VectorCompare;
using Build5Nines.SharpVector.VectorStore;

namespace Build5Nines.SharpVector;

/// <summary>
/// Base class for an on-disk vector database. Mirrors MemoryVectorDatabaseBase generic composition
/// while using disk-backed stores for persistence.
/// </summary>
public abstract class BasicDiskMemoryVectorDatabaseBase<TId, TMetadata, TVectorStore, TVocabularyStore, TVocabularyKey, TVocabularyValue, TIdGenerator, TTextPreprocessor, TVectorizer, TVectorComparer>
    : VectorDatabaseBase<TId, TMetadata, TVectorStore, TVocabularyStore, TVocabularyKey, TVocabularyValue, TIdGenerator, TTextPreprocessor, TVectorizer, TVectorComparer>
    where TId : notnull
    where TVocabularyKey : notnull
    where TVocabularyValue : notnull
    where TVectorStore : IVectorStoreWithVocabulary<TId, TMetadata, TVocabularyStore, TVocabularyKey, TVocabularyValue>
    where TVocabularyStore : IVocabularyStore<TVocabularyKey, TVocabularyValue>
    where TIdGenerator : IIdGenerator<TId>, new()
    where TTextPreprocessor : ITextPreprocessor<TVocabularyKey>, new()
    where TVectorizer : IVectorizer<TVocabularyKey, TVocabularyValue>, new()
    where TVectorComparer : IVectorComparer, new()
{
    protected BasicDiskMemoryVectorDatabaseBase(TVectorStore vectorStore)
        : base(vectorStore)
    { }
}
47 changes: 47 additions & 0 deletions src/Build5Nines.SharpVector/BasicDiskVectorDatabase.cs
@@ -0,0 +1,47 @@
using Build5Nines.SharpVector.Vocabulary;
using Build5Nines.SharpVector.Id;
using Build5Nines.SharpVector.Preprocessing;
using Build5Nines.SharpVector.Vectorization;
using Build5Nines.SharpVector.VectorCompare;
using Build5Nines.SharpVector.VectorStore;

namespace Build5Nines.SharpVector;

/// <summary>
/// A basic disk-backed vector database that combines Bag-of-Words vectorization and
/// cosine similarity with a disk-backed vector store and vocabulary store.
/// Uses int IDs and metadata of type TMetadata.
/// </summary>
public class BasicDiskVectorDatabase<TMetadata>
    : BasicDiskMemoryVectorDatabaseBase<
        int,
        TMetadata,
        BasicDiskVectorStore<int, TMetadata, BasicDiskVocabularyStore<string>, string, int>,
        BasicDiskVocabularyStore<string>,
        string, int,
        IntIdGenerator,
        BasicTextPreprocessor,
        BagOfWordsVectorizer<string, int>,
        CosineSimilarityVectorComparer
        >, IMemoryVectorDatabase<int, TMetadata>, IVectorDatabase<int, TMetadata>
{
    public BasicDiskVectorDatabase(string rootPath)
        : base(
            new BasicDiskVectorStore<int, TMetadata, BasicDiskVocabularyStore<string>, string, int>(
                rootPath,
                new BasicDiskVocabularyStore<string>(rootPath)
            )
        )
    { }

    [Obsolete("Use DeserializeFromBinaryStreamAsync instead.")]
    public override async Task DeserializeFromJsonStreamAsync(Stream stream)
    {
        await DeserializeFromBinaryStreamAsync(stream);
    }

    [Obsolete("Use DeserializeFromBinaryStream instead.")]
    public override void DeserializeFromJsonStream(Stream stream)
    {
        DeserializeFromBinaryStream(stream);
    }
}
2 changes: 1 addition & 1 deletion src/Build5Nines.SharpVector/Build5Nines.SharpVector.csproj
@@ -9,7 +9,7 @@
<PackageId>Build5Nines.SharpVector</PackageId>
<PackageProjectUrl>https://sharpvector.build5nines.com</PackageProjectUrl>
<RepositoryUrl>https://github.com/Build5Nines/SharpVector</RepositoryUrl>
- <Version>2.1.3</Version>
+ <Version>2.2.0</Version>
<Description>Lightweight In-memory Vector Database to embed in any .NET Applications</Description>
<Copyright>Copyright (c) 2025 Build5Nines LLC</Copyright>
<PackageReadmeFile>README.md</PackageReadmeFile>