Commit 6822fe1

merge with main
2 parents: cf23602 + 627b198

72 files changed: +2432 additions, −1565 deletions


.github/workflows/benchmark-labeled.yml

Lines changed: 1 addition & 2 deletions
```diff
@@ -9,7 +9,6 @@ on: # TODO: delete this file when specifying a label is supported
       - '*noci'
 
 jobs:
-  Invoker:
-    if: github.event.label.name == 'benchmark'
+  bm_invoker:
     uses: ./.github/workflows/benchmark.yml
     secrets: inherit
```

.github/workflows/benchmark.yml

Lines changed: 6 additions & 6 deletions
```diff
@@ -1,6 +1,7 @@
 name: benchmark
 
-on: workflow_call # TODO: add pull_request-labeled with "benchmark" label when specifying a label is supported
+# TODO: add pull_request-labeled with "benchmark" label when specifying a label is supported
+on: workflow_call
 
 jobs:
   start-runner:
@@ -24,7 +25,7 @@ jobs:
           github-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
           # Ubuntu 22.04 128GB Storage AMI
           ec2-image-id: ami-0ba430d4b7b64de57
-          ec2-instance-type: r6i.large
+          ec2-instance-type: r6i.xlarge
           subnet-id: ${{ secrets.AWS_EC2_SUBNET_ID }}
           security-group-id: ${{ secrets.AWS_EC2_SG_ID }}
@@ -64,15 +65,14 @@ jobs:
       # TODO: remove "--no-check-certificate" when possible
       - name: Download pre-generated indices
         timeout-minutes: 20
-        run: wget --no-check-certificate -q -i tests/benchmark/data/hnsw_indices.txt -P tests/benchmark/data
-
+        run: ./tests/benchmark/bm_files.sh ${{ github.event.label.name }}
       - name: Benchmark
         timeout-minutes: 120
-        run: make benchmark
+        run: make benchmark BM_FILTER=${{ github.event.label.name }}
 
       - name: Collect results
         run: |
-          ./tests/benchmark/benchmarks.sh | xargs -I {} redisbench-admin export \
+          ./tests/benchmark/benchmarks.sh ${{ github.event.label.name }} | xargs -I {} redisbench-admin export \
            --redistimeseries_host ${{ secrets.PERFORMANCE_RTS_HOST }} \
            --redistimeseries_port ${{ secrets.PERFORMANCE_RTS_PORT }} \
            --redistimeseries_pass ${{ secrets.PERFORMANCE_RTS_AUTH }} \
```

.gitignore

Lines changed: 1 addition & 1 deletion
```diff
@@ -12,7 +12,7 @@
 
 # Ignore benchmark fetched data but not the source file
 /tests/benchmark/data/*
-!/tests/benchmark/data/hnsw_indices.txt
+!/tests/benchmark/data/hnsw_indices
 
 # Prerequisites
 *.d
```

Makefile

Lines changed: 1 addition & 2 deletions
```diff
@@ -215,9 +215,8 @@ benchmark:
 	$(SHOW)mkdir -p $(BINDIR)
 	$(SHOW)cd $(BINDIR) && cmake $(CMAKE_FLAGS) $(CMAKE_DIR)
 	@make --no-print-directory -C $(BINDIR) $(MAKE_J)
-	$(ROOT)/tests/benchmark/benchmarks.sh | xargs -I {} bash -lc \
+	$(ROOT)/tests/benchmark/benchmarks.sh $(BM_FILTER) | xargs -I {} bash -lc \
 		"$(BENCHMARKDIR)/bm_{} --benchmark_out_format=json --benchmark_out={}_results.json || exit 255"
-	$(SHOW)python3 -m tox -e benchmark
 
 toxenv:
 ifeq ($(wildcard .tox),)
```

README.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -92,4 +92,5 @@ tox -e flowenv
 
 # Benchmark
 
-To benchmark the capabilities of this library, follow the instructions in the [benchmark docs section](docs/benchmarks.md).
+To benchmark the capabilities of this library, follow the instructions in the [benchmarks user guide](docs/benchmarks.md).
+If you'd like to create your own benchmarks, you can find more information in the [developer guide](docs/benchmarks_developer.md).
```

docs/Benchmarks_classes.png

53.7 KB

docs/benchmarks.md

Lines changed: 143 additions & 51 deletions
````diff
@@ -1,15 +1,36 @@
-# Benchmark
-
-To get an early sense of what VectorSimilarity library by RedisAI can do, you can test it with the following benchmark tools:
-
-## Google benchmark
-
-Google benchmark is a popular tool for benchmark code snippets, similar to unit tests. It allows a quick way of estimating the runtime of each test based on several (identical) runs, and print the results as output. For some tests, the output includes additional "Recall" metric, which indicates the accuracy in case of approximate search.
-
-There are 2 tests suits available: `BM_VecSimBasics`, and `BM_BatchIterator`.
-To run both test suits, call the following commands from the project root dir:
-```asm
-make
+# Vector Similarity benchmarks user guide
+
+## Table of contents
+* [Overview](#overview)
+* [Run benchmarks](#run-benchmarks)
+* [Available sets](#available-sets)
+    - [BM_VecSimBasics](#bm_vecsimbasics)
+    - [BM_BatchIterator](#bm_batchiterator)
+    - [BM_VecSimUpdatedIndex](#bm_vecsimupdatedindex)
+* [ann-benchmarks](#ann-benchmark)
+
+# Overview
+We use the [Google benchmark](https://github.com/google/benchmark) tool to run micro-benchmarks for the vector indexes functionality.
+Google benchmark is a popular tool for benchmarking code snippets, similar to unit tests. It allows a quick way to estimate the runtime of each use case based on several (identical) runs, and prints the results as output. For some tests, the output includes an additional "Recall" metric, which indicates the accuracy in the case of approximate search.
+**The recall** is calculated as the size of the intersection between the ground truth results (calculated by the flat algorithm) and the results returned by the approximate algorithm (HNSW in this case), divided by the number of ground truth results:
+$$
+recall = \frac{|\text{approximate algorithm results} \cap \text{ground truth results}|}{|\text{ground truth results}|}
+$$
+# Run benchmarks
+## Required files
+The serialized indices files that are used for micro-benchmarking and running ann-benchmarks can be found in
+`tests/benchmark/data/hnsw_indices.txt`.
+To download all the required files, run from the repository root directory:
+```sh
+wget -q -i tests/benchmark/data/hnsw_indices_all/hnsw_indices_all.txt -P tests/benchmark/data
+```
+To run all test sets, call the following command from the project root dir:
+```sh
 make benchmark
 ```
+### Running a Subset of Benchmarks
+To run only a subset of benchmarks that match a specified `<regex>`, set the `BENCHMARK_FILTER=<regex>` environment variable. For example:
+```sh
+make benchmark BENCHMARK_FILTER=fp32*
+```
 
````
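For readers who want to reproduce the recall figure offline, the definition above reduces to a few lines of code. The following is a minimal sketch in plain Python, not part of the library's API; `approx_results` and `ground_truth` are hypothetical placeholders for the two result sets of a single query.

```python
# Minimal sketch of the recall metric defined in the new guide above.
# `approx_results` stands in for the ids returned by the HNSW index and
# `ground_truth` for the ids returned by the flat (brute-force) index.
def recall(approx_results, ground_truth):
    ground_truth = set(ground_truth)
    return len(set(approx_results) & ground_truth) / len(ground_truth)

# Example: 8 of the 10 true nearest neighbors were found -> recall = 0.8
print(recall([1, 2, 3, 4, 5, 6, 7, 8, 11, 12], range(10)))
```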
````diff
@@ -16,36 +37,38 @@
-#### Basic benchmark
-
-In this test suit, we create two indices that contains (the same) 1M random vectors of 100 floats - first is a "Flat" index, and the other is based on HNSW algorithm, with `L2` as the distance metric. We use `M = 50` and `ef_construction = 350` as build parameters for HNSW index.
-In every test case we first generate a random vector, and then perform a simple use case. The test cases are the following:
-
-1. Add a random new vector to the HNSW index
-2. Delete a vector from the HNSW index
-3. Run `Top_K` query over the flat index (using brute-force search), for `k=10`
-4. Run `Top_K` query over the flat index (using brute-force search), for `k=100`
-5. Run `Top_K` query over the flat index (using brute-force search), for `k=500`
-6. Run `Top_K` query over the HNSW index, for `k=10`, using `ef_runtime=500`
-7. Run `Top_K` query over the HNSW index, for `k=100`, using `ef_runtime=500`
-8. Run `Top_K` query over the HNSW index, for `k=500`, using `ef_runtime=500`
-
-#### Batch iterator benchmark
-
-The purpose of this test suit is to benchmark the batched search feature. The batch iterator is a handle which enables running `Top_K` query in batches, by asking for the next best `n` results repeatedly, until there are no more results to return. We use for this test suit the same indices that were built for the "basic benchmark". The test cases are:
-
-1. Run `Top_K` query over the flat index in batches of 10 results, until 1000 results are obtained.
-2. Run `Top_K` query over the flat index in batches of 100 results, until 1000 results are obtained.
-3. Run `Top_K` query over the flat index in batches of 100 results, until 10000 results are obtained.
-4. Run `Top_K` query over the flat index in batches of 1000 results, until 10000 results are obtained.
-5. Run `Top_K` query over the HNSW index in batches of 10 results, until 1000 results are obtained.
-6. Run `Top_K` query over the HNSW index in batches of 100 results, until 1000 results are obtained.
-7. Run `Top_K` query over the HNSW index in batches of 100 results, until 10000 results are obtained, using `ef_runtime=500` in every batch.
-8. Run `Top_K` query over the HNSW index in batches of 1000 results, until 10000 results are obtained, using `ef_runtime=500` in every batch.
-9. Run regular `Top_K` query over the flat index for `k=1000`.
-10. Run regular `Top_K` query over the HNSW index for `k=1000`, using `ef_runtime=500`.
-11. Run regular `Top_K` query over the flat index for `k=10000`.
-12. Run regular `Top_K` query over the HNSW index for `k=10000`, using `ef_runtime=500`.
-
-
-## ANN-Benchmark
-
-[ANN-Benchmarks](http://ann-benchmarks.com/) is a benchmarking environment for approximate nearest neighbor algorithms search (for additional information, refer to the project's [github repository](https://github.com/erikbern/ann-benchmarks)). Each algorithm is benchmarked on pre-generated commonly use datasets (in HDF5 formats).
-The `bm_dataset.py` script uses some of ANN-Benchmark datasets to measure this library performance in the same manner. The following datasets are downloaded and benchmarked:
+# Available sets
+There are currently 3 sets of benchmarks available: `BM_VecSimBasics`, `BM_BatchIterator`, and `BM_VecSimUpdatedIndex`. Each is templated according to the index data type. We run 10 iterations of each test case, unless otherwise specified.
+## BM_VecSimBasics
+For each combination of data type (fp32/fp64) and index type (single/multi) the following test cases are included:
+1. Measure index total `memory` (runtime and iteration count are irrelevant for this use case, just the memory metric)
+2. `AddLabel` - runs for `DEFAULT_BLOCK_SIZE (= 1024)` iterations, in each we add one new label to the index from the `queries` list. Note that for a single-value index each label contains one vector, meaning that the number of new labels equals the number of new vectors.
+   **results:** average time per label, average memory addition per vector, vectors per label.
+   *At the end of the benchmark, we delete the added labels*
+3. `DeleteLabel` - runs for `DEFAULT_BLOCK_SIZE (= 1024)` iterations, in each we delete one label from the index. Note that for a single-value index each label contains one vector, meaning that the number of deleted labels equals the number of deleted vectors.
+   **results:** average time per label, average memory addition per vector (a negative value means that the memory has decreased).
+   *At the end of the benchmark, we restore the deleted vectors under the same labels*
+#### **TopK benchmarks**
+Search for the `k` nearest neighbors of the query.
+4. Run `Top_K` query over the flat index (using brute-force search), for each `k=10`, `k=100` and `k=500`
+   **results:** average time per iteration
+5. Run `Top_K` query over the HNSW index, for each pair of `{ef_runtime, k}` from the following:
+   `{ef_runtime=10, k=10}`
+   `{ef_runtime=200, k=10}`
+   `{ef_runtime=100, k=100}`
+   `{ef_runtime=200, k=100}`
+   `{ef_runtime=500, k=500}`
+   **results:** average time per iteration, recall
+#### **Range query benchmarks**
+In a range query, we search for all the vectors in the index whose distance to the query vector is lower than the range.
+6. Run `Range` query over the flat index (using brute-force search), for each `radius=0.2`, `radius=0.35` and `radius=0.5`
+   **results:** average time per iteration, average results number per iteration
+7. Run `Range` query over the HNSW index, for each pair of `{radius, epsilon}` from the following:
+   `{radius=0.2, epsilon=0.001}`
+   `{radius=0.2, epsilon=0.01}`
+   `{radius=0.2, epsilon=0.1}`
+   `{radius=0.35, epsilon=0.001}`
+   `{radius=0.35, epsilon=0.01}`
+   `{radius=0.35, epsilon=0.1}`
+   `{radius=0.5, epsilon=0.001}`
+   `{radius=0.5, epsilon=0.01}`
+   `{radius=0.5, epsilon=0.1}`
+   **results:** average time per iteration, average results number per iteration, recall
+
````
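To make the two query types in the excerpt above concrete, here is a small brute-force sketch in plain NumPy. It is illustrative only: it does not use the library's API, and the data, `k`, and radius are arbitrary toy values.

```python
# Illustrative brute-force versions of the two query types benchmarked above.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.random((1000, 100), dtype=np.float32)  # stand-in for the index contents
query = rng.random(100, dtype=np.float32)

dists = np.sum((vectors - query) ** 2, axis=1)        # squared L2 distance to the query

# Top_K: return the k vectors closest to the query.
k = 10
top_k_ids = np.argsort(dists)[:k]

# Range: return every vector whose distance to the query is below the radius.
radius = float(np.quantile(dists, 0.05))              # toy radius chosen for this random data
range_ids = np.nonzero(dists < radius)[0]

print(len(top_k_ids), len(range_ids))                 # 10 and roughly 50
```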
````diff
@@ -52,3 +75,68 @@
+## BM_BatchIterator
+The purpose of these tests is to benchmark the batch iterator functionality. The batch iterator is a handle that enables running a `Top_K` query in batches, by asking for the next best `n` results repeatedly, until there are no more results to return. For these test cases we use the same indices that were built for the basic benchmarks.
+The test cases are:
+1. Fixed batch size - Run `Top_K` query for each pair of `{batch size, number of batches}` from the following:
+   `{batch size=10, number of batches=1}`
+   `{batch size=10, number of batches=3}`
+   `{batch size=10, number of batches=5}`
+   `{batch size=100, number of batches=1}`
+   `{batch size=100, number of batches=3}`
+   `{batch size=100, number of batches=5}`
+   `{batch size=1000, number of batches=1}`
+   `{batch size=1000, number of batches=3}`
+   `{batch size=1000, number of batches=5}`
+   **Flat index results:** Time per iteration, memory delta per iteration
+   **HNSW index results:** Time per iteration, Recall, memory delta per iteration
+2. Variable batch size - Run `Top_K` query where in each iteration the batch size is increased by a factor of 2, for each pair of `{batch initial size, number of batches}` from the following:
+   `{batch initial size=10, number of batches=2}`
+   `{batch initial size=10, number of batches=4}`
+   `{batch initial size=100, number of batches=2}`
+   `{batch initial size=100, number of batches=4}`
+   `{batch initial size=1000, number of batches=2}`
+   `{batch initial size=1000, number of batches=4}`
+   **Flat index results:** Time per iteration
+   **HNSW index results:** Time per iteration, Recall, memory delta per iteration
+3. Batches to Adhoc BF - In each iteration we run `Top_K` with an increasing `batch size` (initial size=10, increase factor=2) for `number of batches` and then switch to ad-hoc BF. We define `step` as the ratio between the index size and the number of vectors to go over in ad-hoc BF. The tests run for each pair of `{step, number of batches}` from the following:
+   `{step=5, number of batches=0}`
+   `{step=5, number of batches=2}`
+   `{step=5, number of batches=5}`
+   `{step=10, number of batches=0}`
+   `{step=10, number of batches=2}`
+   `{step=10, number of batches=5}`
+   `{step=20, number of batches=0}`
+   `{step=20, number of batches=2}`
+   `{step=20, number of batches=5}`
+   **Flat index results:** Time per iteration
+   **HNSW index results:** Time per iteration, memory delta per iteration
+
+## BM_VecSimUpdatedIndex
+For this use case, we create two indices for each algorithm (flat and HNSW). The first index contains 500K vectors added to an empty index. The other index also contains 500K vectors, but this time they were added by overriding the 500K vectors of the aforementioned indices. Currently, we only test this for the FP32 single-value index.
+The test cases are:
+1. Index `total memory` **before** updating
+2. Index `total memory` **after** updating
+3. Run `Top_K` query over the flat index **before** updating (using brute-force search), for each `k=10`, `k=100` and `k=500`
+   **results:** average time per iteration
+4. Run `Top_K` query over the flat index **after** updating (using brute-force search), for each `k=10`, `k=100` and `k=500`
+   **results:** average time per iteration
+5. Run **100** iterations of `Top_K` query over the HNSW index **before** updating, for each pair of `{ef_runtime, k}` from the following:
+   `{ef_runtime=10, k=10}`
+   `{ef_runtime=200, k=10}`
+   `{ef_runtime=100, k=100}`
+   `{ef_runtime=200, k=100}`
+   `{ef_runtime=500, k=500}`
+   **results:** average time per iteration, recall
+6. Run **100** iterations of `Top_K` query over the HNSW index **after** updating, for each pair of `{ef_runtime, k}` from the following:
+   `{ef_runtime=10, k=10}`
+   `{ef_runtime=200, k=10}`
+   `{ef_runtime=100, k=100}`
+   `{ef_runtime=200, k=100}`
+   `{ef_runtime=500, k=500}`
+   **results:** average time per iteration, recall
+
+# ANN-Benchmark
+
+[ANN-Benchmarks](http://ann-benchmarks.com/) is a benchmarking environment for approximate nearest neighbor search algorithms (for additional information, refer to the project's [GitHub repository](https://github.com/erikbern/ann-benchmarks)). Each algorithm is benchmarked on pre-generated, commonly used datasets (in HDF5 format).
+The `bm_dataset.py` script uses some of the ANN-Benchmarks datasets to measure this library's performance in the same manner. The following datasets are downloaded and benchmarked (all use FP32 single value per label data):
 
 1. glove-25-angular
 2. glove-50-angular
````
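Returning to the `BM_BatchIterator` cases shown earlier in this excerpt: the batching semantics they exercise can be summarized in a short sketch. This is plain Python for illustration only, not the library's API; the doubling factor mirrors the variable-batch-size cases, and `ranked_ids` is a hypothetical stand-in for a fully ranked result set.

```python
# Rough sketch of the batch-iterator semantics exercised by BM_BatchIterator
# (illustration only, not the library's API). `ranked_ids` stands in for the
# full result set, already ordered from best to worst.
def iterate_in_batches(ranked_ids, initial_batch_size, num_batches, growth_factor=1):
    """Yield successive batches of the next-best results.

    growth_factor=1 mimics the fixed-batch-size cases; growth_factor=2 mimics
    the variable-batch-size cases, where each batch doubles in size.
    """
    offset, batch_size = 0, initial_batch_size
    for _ in range(num_batches):
        batch = ranked_ids[offset:offset + batch_size]
        if not batch:  # no more results to return
            return
        yield batch
        offset += len(batch)
        batch_size *= growth_factor

# Variable batch size: initial size=10, 4 batches, doubling each time.
print([len(b) for b in iterate_in_batches(list(range(1000)), 10, 4, growth_factor=2)])
# -> [10, 20, 40, 80]
```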
````diff
@@ -57,13 +145,17 @@ The `bm_dataset.py` script uses some of ANN-Benchmark datasets to measure this l
 5. mnist-784-euclidean
 6. sift-128-euclidean
 
-For each dataset, the script will build an HNSW index with pre-defined build parameters, and persist it to a local file in `./data` directory that will be generated (index file name for example: `glove-25-angular-M=16-ef=100.hnsw`). Note that if the file already exists in this path, the entire index will be loaded instead of rebuilding it. Then, for 3 different pre-defined `ef_runtime` values, 1000 `Top_K` queries will be executed for `k=10` (these parameters can be modified easily in the script). For every configuration, the script outputs the following statistics:
+For each dataset, the script will build an HNSW index with pre-defined build parameters and persist it to a local file in the `./data` directory that will be generated (index file name for example: `glove-25-angular-M=16-ef=100.hnsw`). Note that if the file already exists in this path, the entire index will be loaded instead of rebuilding it.
+To download the serialized indices, run from the project's root directory:
+```sh
+wget -q -i tests/benchmark/data/hnsw_indices_all/hnsw_indices_ann.txt -P tests/benchmark/data
+```
+Then, for 3 different pre-defined `ef_runtime` values, 1000 `Top_K` queries will be executed for `k=10` (these parameters can be modified easily in the script). For every configuration, the script outputs the following statistics:
 
 - Average recall
 - Query-per-second when running in brute-force mode
 - Query-per-second when running with HNSW index
-
 To reproduce this benchmark, first install the project's python bindings, and then invoke the script. From the project's root directory, you should run:
 ```py
 python3 tests/benchmark/bm_datasets.py
-```
+```
````
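As a closing note on the ann-benchmarks flow: the per-configuration statistics the script reports (average recall and queries-per-second over 1000 `Top_K` queries with `k=10`) can be collected with a harness along these lines. This is a sketch only; `run_top_k` is a hypothetical stand-in for the library's Python binding, and `queries`/`ground_truth` stand in for the HDF5 dataset contents.

```python
# Sketch of collecting average recall and QPS for one configuration
# (illustrative; `run_top_k`, `queries` and `ground_truth` are hypothetical stand-ins).
import time

def benchmark_configuration(run_top_k, queries, ground_truth, k=10):
    total_recall = 0.0
    start = time.monotonic()
    for query, expected in zip(queries, ground_truth):
        results = run_top_k(query, k)
        total_recall += len(set(results) & set(expected[:k])) / k
    elapsed = time.monotonic() - start
    return {
        "avg_recall": total_recall / len(queries),
        "queries_per_second": len(queries) / elapsed,
    }
```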
