32 commits
13f8290
Added optimzation for Opensearch
navneet1v Oct 16, 2024
0e04878
record the test time; add version / note info for milvus and zillizcl…
alwayslove2013 Oct 18, 2024
ef5b4fc
fix bug: date to datetime
alwayslove2013 Oct 23, 2024
82ffe00
update leaderboard data
alwayslove2013 Oct 24, 2024
422303a
fix leaderboard data: zillizcloud version
alwayslove2013 Oct 25, 2024
2efc6b5
Fixed custom_case key error in parameters dict in CLI command.
Sheharyar570 Oct 25, 2024
c7d4a7e
Refactored command options for consistency.
Sheharyar570 Oct 25, 2024
aa48c48
Updated readme, added custom case related command options information.
Sheharyar570 Oct 25, 2024
51390b7
update the instruction for adding custom_case support in new CLI impl…
Sheharyar570 Oct 25, 2024
f18eaec
add key for plotly_chart
alwayslove2013 Oct 28, 2024
aa6d4dc
add key for plotly_chart
alwayslove2013 Oct 28, 2024
c66dfb5
fix pinecone client
alwayslove2013 Oct 28, 2024
cfaa1c6
Support for pgdiskann client (#388)
wahajali Oct 29, 2024
369f3c6
increase timeout
alwayslove2013 Oct 29, 2024
d11330d
Binary Quantization Support for pgvector HNSW Algorithm (#389)
Sheharyar570 Oct 29, 2024
f4a669b
Diskann custom dataset run script added
Sheharyar570 Nov 5, 2024
fd27212
Updated config files to include diskann related params
Sheharyar570 Nov 5, 2024
8ee569f
set enable_seqscan off in all configs
Sheharyar570 Nov 5, 2024
d3b0e7d
log level set to DEBUG
Sheharyar570 Nov 5, 2024
10c8cfa
fixed provider and instance_type info in one of the config files.
Sheharyar570 Nov 5, 2024
fe7561b
Merge branch 'main' of https://github.com/EmumbaOrg/VectorDBBench int…
Sheharyar570 Nov 5, 2024
b738358
diskann cli custom_case support added
Sheharyar570 Nov 5, 2024
2e27bf0
increase optimize timeout for custom case to 2 days.
Sheharyar570 Nov 6, 2024
1b06ecc
increase optimize timeout for custom case to 7 days.
Sheharyar570 Nov 6, 2024
531dde2
Fixed diskann custom dataset run script
Sheharyar570 Nov 6, 2024
c1090df
resolved get_size_info not defined issue.
Sheharyar570 Nov 7, 2024
6c38308
removed parallel index build related queries from init
Sheharyar570 Nov 7, 2024
2bd7bd0
serial search enabled for all run configs.
Sheharyar570 Nov 7, 2024
1750233
fix serial and conc search always true even when false in config
Sheharyar570 Nov 8, 2024
a10336b
fxed db-label in config
Sheharyar570 Nov 8, 2024
db2b499
delete extra config directory
Sheharyar570 Nov 8, 2024
b9554a4
Update test.parquet file
wahajali Nov 10, 2024
25 changes: 24 additions & 1 deletion README.md
@@ -118,8 +118,29 @@ Options:
  --m INTEGER hnsw m
  --ef-construction INTEGER hnsw ef-construction
  --ef-search INTEGER hnsw ef-search
- --quantization-type [none|halfvec]
+ --quantization-type [none|bit|halfvec]
  quantization type for vectors
+ --custom-case-name TEXT Custom case name i.e. PerformanceCase1536D50K
+ --custom-case-description TEXT Custom name description
+ --custom-case-load-timeout INTEGER
+ Custom case load timeout [default: 36000]
+ --custom-case-optimize-timeout INTEGER
+ Custom case optimize timeout [default: 36000]
+ --custom-dataset-name TEXT
+ Dataset name i.e OpenAI
+ --custom-dataset-dir TEXT Dataset directory i.e. openai_medium_500k
+ --custom-dataset-size INTEGER Dataset size i.e. 500000
+ --custom-dataset-dim INTEGER Dataset dimension
+ --custom-dataset-metric-type TEXT
+ Dataset distance metric [default: COSINE]
+ --custom-dataset-file-count INTEGER
+ Dataset file count
+ --custom-dataset-use-shuffled / --skip-custom-dataset-use-shuffled
+ Use shuffled custom dataset or skip [default: custom-dataset-
+ use-shuffled]
+ --custom-dataset-with-gt / --skip-custom-dataset-with-gt
+ Custom dataset with ground truth or skip [default: custom-dataset-
+ with-gt]
  --help Show this message and exit.
  ```
  #### Using a configuration file.
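Reviewer note on the new `bit` choice for `--quantization-type`: the sketch below illustrates what binary quantization generally means, assuming `bit` follows the pgvector-style convention where each strictly positive component becomes 1 and everything else becomes 0, and where bit vectors are compared by Hamming distance. The function names are illustrative only and are not taken from this PR.

```python
from typing import Sequence


def binary_quantize(vec: Sequence[float]) -> list[int]:
    """Map each component to one bit: 1 if strictly positive, else 0 (assumed convention)."""
    return [1 if x > 0 else 0 for x in vec]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length bit vectors differ."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Two made-up 4-dimensional embeddings, used only for illustration.
query = binary_quantize([0.12, -0.40, 0.03, -0.90])
stored = binary_quantize([0.20, 0.10, -0.05, -0.70])
distance = hamming_distance(query, stored)
```

Because each dimension shrinks to a single bit, the index is much smaller than with `none` or `halfvec`, at the cost of coarser distances.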
@@ -464,6 +485,8 @@ def ZillizAutoIndex(**parameters: Unpack[ZillizTypedDict]):
  3. Update db_config and db_case_config to match client requirements
  4. Continue to add new functions for each index config.
  5. Import the client cli module and command to vectordb_bench/cli/vectordbbench.py (for databases with multiple commands (index configs), this only needs to be done for one command)
+ 6. Import the `get_custom_case_config` function from `vectordb_bench/cli/cli.py` and use it to add a new key `custom_case` to the `parameters` variable within the command.
+

  > cli modules with multiple index configs:
  > - pgvector: vectordb_bench/backend/clients/pgvector/cli.py
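Reviewer note on step 6: a minimal sketch of the idea, not the PR's implementation. The real `get_custom_case_config` lives in `vectordb_bench/cli/cli.py` and its signature and return shape are not shown in this diff, so `build_custom_case` below is a hypothetical stand-in. The parameter keys assume Click's usual mapping of `--custom-case-name` to `custom_case_name`; the nested layout and the sample values (including the 1536 dimension and file count of 1) are assumptions.

```python
from typing import Any


def build_custom_case(parameters: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for get_custom_case_config: group the custom-case options."""
    return {
        "name": parameters["custom_case_name"],
        "description": parameters["custom_case_description"],
        "load_timeout": parameters["custom_case_load_timeout"],
        "optimize_timeout": parameters["custom_case_optimize_timeout"],
        "dataset": {
            "name": parameters["custom_dataset_name"],
            "dir": parameters["custom_dataset_dir"],
            "size": parameters["custom_dataset_size"],
            "dim": parameters["custom_dataset_dim"],
            "metric_type": parameters["custom_dataset_metric_type"],
            "file_count": parameters["custom_dataset_file_count"],
            "use_shuffled": parameters["custom_dataset_use_shuffled"],
            "with_gt": parameters["custom_dataset_with_gt"],
        },
    }


# Example values follow the README help text where it gives them; the rest are assumed.
parameters: dict[str, Any] = {
    "custom_case_name": "PerformanceCase1536D50K",
    "custom_case_description": "custom dataset run",
    "custom_case_load_timeout": 36000,
    "custom_case_optimize_timeout": 36000,
    "custom_dataset_name": "OpenAI",
    "custom_dataset_dir": "openai_medium_500k",
    "custom_dataset_size": 500000,
    "custom_dataset_dim": 1536,
    "custom_dataset_metric_type": "COSINE",
    "custom_dataset_file_count": 1,
    "custom_dataset_use_shuffled": True,
    "custom_dataset_with_gt": True,
}

# Inside the command, the new key is added alongside the existing options.
parameters["custom_case"] = build_custom_case(parameters)
```

Adding the key inside each command keeps the flat CLI options intact while giving downstream code one place to read the custom case from.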
@@ -6,21 +6,21 @@
  "db_name": "ann-1000k",
  "instance_type": "Standard_D8ds_v5",
  "provider": "azure",
- "enable_seqscan": "on"
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-1000k",
+ "db-label": "diskann-memory-comparison-1000k",
  "drop_old": true,
  "load": true,
  "search-serial": false,
  "search-concurrent": false,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,
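Reviewer note on the config changes: every run config in this PR swaps the HNSW keys (`ef-search`, `ef-construction`, `m`) for DiskANN keys (`max-neighbors`, `l-value-ib`, `l-value-is`). A small check like the one below could catch a config where the swap was only half applied. It is a sketch under assumptions: the helper names are hypothetical, the key sets come only from the hunks in this diff, and treating `num-concurrency` as a comma-separated list of integers is inferred from its value.

```python
import json

# Key sets taken from the hunks in this diff; the full schema may contain more.
HNSW_KEYS = {"ef-search", "ef-construction", "m"}
DISKANN_KEYS = {"max-neighbors", "l-value-ib", "l-value-is"}


def check_diskann_case(case: dict) -> list[str]:
    """Return a list of problems found in one case entry (empty if none)."""
    problems = []
    leftover = sorted(HNSW_KEYS.intersection(case))
    if leftover:
        problems.append(f"leftover HNSW keys: {leftover}")
    missing = sorted(DISKANN_KEYS.difference(case))
    if missing:
        problems.append(f"missing DiskANN keys: {missing}")
    return problems


def parse_concurrency(value: str) -> list[int]:
    """Split a comma-separated concurrency string into integers."""
    return [int(part) for part in value.split(",")]


# A trimmed copy of the updated case entry from the hunk above.
case = json.loads("""
{
  "db-label": "diskann-memory-comparison-1000k",
  "max-neighbors": 64,
  "l-value-ib": 128,
  "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100"
}
""")

problems = check_diskann_case(case)
levels = parse_concurrency(case["num-concurrency"])
```

Running the same check over each file in the config directory would flag any case that still carries `ef-search` or `m` after the switch.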
@@ -6,21 +6,21 @@
  "db_name": "ann-1500k",
  "instance_type": "Standard_D8ds_v5",
  "provider": "azure",
- "enable_seqscan": "on"
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-1500k",
+ "db-label": "diskann-memory-comparison-1500k",
  "drop_old": true,
  "load": true,
  "search-serial": false,
  "search-concurrent": false,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,
@@ -6,21 +6,21 @@
  "db_name": "ann-2500k",
  "instance_type": "Standard_D8ds_v5",
  "provider": "azure",
- "enable_seqscan": "on"
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-2500k",
+ "db-label": "diskann-memory-comparison-2500k",
  "drop_old": true,
  "load": true,
  "search-serial": false,
  "search-concurrent": false,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,
@@ -6,21 +6,21 @@
  "db_name": "ann-3500k",
  "instance_type": "Standard_D8ds_v5",
  "provider": "azure",
- "enable_seqscan": "on"
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-3500k",
+ "db-label": "diskann-memory-comparison-3500k",
  "drop_old": true,
  "load": true,
  "search-serial": false,
  "search-concurrent": false,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,
@@ -6,21 +6,21 @@
  "db_name": "ann-4000k",
  "instance_type": "Standard_D8ds_v5",
  "provider": "azure",
- "enable_seqscan": "on"
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-4000k",
+ "db-label": "diskann-memory-comparison-4000k",
  "drop_old": true,
  "load": true,
  "search-serial": false,
  "search-concurrent": false,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,
@@ -6,21 +6,21 @@
  "db_name": "ann-2000k",
  "instance_type": "Standard_D8ds_v5",
  "provider": "azure",
- "enable_seqscan": "on"
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-2000k",
+ "db-label": "diskann-memory-comparison-2000k",
  "drop_old": true,
  "load": true,
  "search-serial": false,
  "search-concurrent": false,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,
@@ -6,21 +6,21 @@
  "db_name": "ann-3000k",
  "instance_type": "Standard_D8ds_v5",
  "provider": "azure",
- "enable_seqscan": "on"
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-3000k",
+ "db-label": "diskann-memory-comparison-3000k",
  "drop_old": true,
  "load": true,
  "search-serial": false,
  "search-concurrent": false,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,
@@ -6,21 +6,21 @@
  "db_name": "ann-4500k",
  "instance_type": "Standard_D8ds_v5",
  "provider": "azure",
- "enable_seqscan": "on"
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-4500k",
+ "db-label": "diskann-memory-comparison-4500k",
  "drop_old": true,
  "load": true,
  "search-serial": false,
  "search-concurrent": false,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,
@@ -6,21 +6,21 @@
  "db_name": "ann-5000k",
  "instance_type": "Standard_D8ds_v5",
  "provider": "azure",
- "enable_seqscan": "on"
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-5000k",
+ "db-label": "diskann-memory-comparison-5000k",
  "drop_old": true,
  "load": true,
  "search-serial": false,
  "search-concurrent": false,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,
@@ -6,21 +6,21 @@
  "db_name": "ann-500k",
  "instance_type": "Standard_D8ds_v5",
  "provider": "azure",
- "enable_seqscan": "on"
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-500k",
+ "db-label": "diskann-memory-comparison-500k",
  "drop_old": true,
  "load": true,
  "search-serial": false,
  "search-concurrent": false,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,
12 changes: 6 additions & 6 deletions custom-run-configs-1/config-custom-dataset-small-hnsw-1000k.json
@@ -6,21 +6,21 @@
  "db_name": "ann-1000k",
  "instance_type": "Standard_D8ds_v5",
  "provider": "azure",
- "enable_seqscan": "on"
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-1000k",
+ "db-label": "diskann-memory-comparison-1000k",
  "drop_old": false,
  "load": false,
- "search-serial": false,
+ "search-serial": true,
  "search-concurrent": true,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,
12 changes: 6 additions & 6 deletions custom-run-configs-1/config-custom-dataset-small-hnsw-1500k.json
@@ -6,21 +6,21 @@
  "db_name": "ann-1500k",
  "instance_type": "Standard_D8ds_v5",
  "provider": "azure",
- "enable_seqscan": "on"
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-1500k",
+ "db-label": "diskann-memory-comparison-1500k",
  "drop_old": false,
  "load": false,
- "search-serial": false,
+ "search-serial": true,
  "search-concurrent": true,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,
16 changes: 8 additions & 8 deletions custom-run-configs-1/config-custom-dataset-small-hnsw-2500k.json
@@ -4,23 +4,23 @@
  "username": "postgres",
  "password": "postgres",
  "db_name": "ann-2500k",
- "instance_type": "db.m6i.large",
- "provider": "aws",
- "enable_seqscan": "on"
+ "instance_type": "Standard_D8ds_v5",
+ "provider": "azure",
+ "enable_seqscan": "off"
  },
  "cases": [
  {
- "db-label": "memory-comparison-2500k",
+ "db-label": "diskann-memory-comparison-2500k",
  "drop_old": false,
  "load": false,
- "search-serial": false,
+ "search-serial": true,
  "search-concurrent": true,
  "case-type": "PerformanceCustomDataset",
  "maintenance-work-mem": "16GB",
  "max-parallel-workers": 7,
- "ef-search": [40],
- "ef-construction": 128,
- "m": 32,
+ "max-neighbors": 64,
+ "l-value-ib": 128,
+ "l-value-is": [32],
  "num-concurrency": "1,10,20,30,40,50,60,70,80,90,100",
  "concurrency-duration": 30,
  "k": 10,