ci: add memory and io benchmarks for building indices #5483

Merged
wjones127 merged 5 commits into lance-format:main from wjones127:io_bench_index
Dec 17, 2025
@wjones127
Contributor

@wjones127 wjones127 commented Dec 16, 2025

Adds basic memory and IO benchmarks for building indices. These are used for regression tests.
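
As a rough illustration of how a peak-memory measurement can serve as a regression check, here is a minimal sketch. It is not the harness added in this PR: `build_toy_index`, `measure_peak_bytes`, and `check_peak_memory` are hypothetical names, and the sketch assumes Python's standard `tracemalloc` module is an acceptable stand-in for whatever instrumentation the real benchmarks use.

```python
import tracemalloc


def measure_peak_bytes(fn, *args, **kwargs):
    """Run fn and return (result, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_toy_index(values):
    # Hypothetical stand-in for an index build: map each value to its row ids.
    index = {}
    for row_id, value in enumerate(values):
        index.setdefault(value, []).append(row_id)
    return index


def check_peak_memory(fn, budget_bytes, *args):
    """Return (within_budget, peak_bytes) for one run of fn(*args)."""
    _, peak = measure_peak_bytes(fn, *args)
    return peak <= budget_bytes, peak
```

A regression test would then assert that `check_peak_memory(...)` stays within a recorded budget. Note that `tracemalloc` only sees allocations made through Python's allocator, so measuring a native index build would need different instrumentation.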

@github-actions github-actions Bot added python ci Github Action or Test issues labels Dec 16, 2025
@wjones127
Contributor Author

Some interesting results so far:

============================================== IO/Memory Benchmark Statistics ===================================================
Test                                                  Peak Mem         Allocs   Read IOPS    Read Bytes  Write IOPS   Write Bytes
---------------------------------------------------------------------------------------------------------------------------------
test_io_mem_build_ivf_pq                                1.9 GB    118,412,090      73,845        1.3 GB           4        4.1 MB
test_io_mem_build_scalar_index[btree-string]          204.6 MB        357,255          34      125.0 MB          39      175.1 MB
test_io_mem_build_scalar_index[bitmap-string]           3.6 GB     52,613,566          34      125.0 MB          75      362.5 MB
test_io_mem_build_scalar_index[zonemap-string]        177.0 MB        195,567          34      125.0 MB           3       58.6 KB
test_io_mem_build_scalar_index[bloomfilter-string]    178.6 MB        191,647          34      125.0 MB           8       25.0 MB
test_io_mem_build_fts[True]                             2.3 GB  1,171,997,306          34      125.0 MB         158      921.3 MB
test_io_mem_build_fts[False]                            2.3 GB  1,175,501,156          34      125.0 MB         158      921.3 MB
test_io_mem_build_scalar_index[btree-int64]           155.9 MB        588,437          27      100.1 MB          44      200.1 MB
test_io_mem_build_scalar_index[bitmap-int64]            7.1 GB     78,866,827          27      100.1 MB         111      575.0 MB
test_io_mem_build_scalar_index[zonemap-int64]         101.5 MB        234,341          27      100.1 MB           3       77.8 KB
test_io_mem_build_scalar_index[bloomfilter-int64]     164.2 MB        240,810          27      100.1 MB          13       50.1 MB

This probably needs some work to get helpful input data. The BITMAP index data needs lower cardinality to be representative and the FTS index data needs to have actual common keywords.
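
To make the two data requirements concrete, here is a minimal sketch of what such input data might look like. It is an assumption for illustration only, not the generator used in this PR: the function names, the `1/rank` (Zipf-like) word weighting, and the default sizes are all hypothetical.

```python
import random


def low_cardinality_column(num_rows, num_distinct, seed=0):
    """Integer column with at most num_distinct unique values (bitmap-friendly)."""
    rng = random.Random(seed)
    return [rng.randrange(num_distinct) for _ in range(num_rows)]


def zipf_sentences(num_rows, vocab, words_per_sentence=8, seed=0):
    """Sentences whose words follow a 1/rank weighting, so a few words are common."""
    rng = random.Random(seed)
    # The first vocab entry is the most likely word, the last is the rarest.
    weights = [1.0 / rank for rank in range(1, len(vocab) + 1)]
    return [
        " ".join(rng.choices(vocab, weights=weights, k=words_per_sentence))
        for _ in range(num_rows)
    ]
```

With data like this, a bitmap index sees a small number of distinct keys, and an FTS index sees posting lists of very different lengths, which is closer to real text than uniformly random tokens.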

@wjones127
Contributor Author

New results:

=========================================== IO/Memory Benchmark Statistics ===================================================
Test                                                  Peak Mem      Allocs   Read IOPS    Read Bytes  Write IOPS   Write Bytes
------------------------------------------------------------------------------------------------------------------------------
test_io_mem_build_ivf_pq                                1.9 GB     118.4 M      73,844        1.3 GB           4        4.1 MB
test_io_mem_build_fts[True]                             1.1 GB     443.0 M          43      167.0 MB         101      478.3 MB
test_io_mem_build_fts[False]                         1015.0 MB     430.7 M          43      167.0 MB         101      478.3 MB
test_io_mem_build_scalar_index[btree-string]          196.6 MB     357.3 K          34      125.0 MB          39      175.1 MB
test_io_mem_build_scalar_index[bitmap-string]         178.8 MB      13.9 M          34      125.0 MB           5       13.4 MB
test_io_mem_build_scalar_index[zonemap-string]        177.0 MB     195.6 K          34      125.0 MB           3       58.6 KB
test_io_mem_build_scalar_index[bloomfilter-string]    179.8 MB     191.6 K          34      125.0 MB           8       25.0 MB
test_io_mem_build_scalar_index[btree-int64]           155.9 MB     588.5 K          27      100.1 MB          44      200.1 MB
test_io_mem_build_scalar_index[bitmap-int64]          104.3 MB       1.4 M          27      100.1 MB           8       26.7 MB
test_io_mem_build_scalar_index[zonemap-int64]         100.5 MB     234.3 K          27      100.1 MB           3       77.8 KB
test_io_mem_build_scalar_index[bloomfilter-int64]     164.3 MB     240.8 K          27      100.1 MB          13       50.1 MB

Comment on lines -2934 to +3095
vec![234, 107],
vec![220, 152],
vec![21, 16, 184, 220]
vec![174, 178],
vec![64, 122, 207, 248],
vec![124, 3, 58]
Contributor Author

These values changed because we reuse the same RNG across arrays. random_sentence now samples from a different distribution, so it advances the RNG state differently, which changes every value generated after it.
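
The effect can be reproduced with a small sketch. This is a Python illustration of the general behavior, not the Rust code in lance-datagen: `sentence_v1`, `sentence_v2`, and `generate` are hypothetical names, and the only assumption is that the two sentence generators consume a different number of draws from the shared RNG.

```python
import random

VOCAB = ["alpha", "beta", "gamma", "delta"]


def sentence_v1(rng, n_words):
    # Uniform sampling: exactly one rng.random() draw per word.
    return " ".join(VOCAB[int(rng.random() * len(VOCAB))] for _ in range(n_words))


def sentence_v2(rng, n_words):
    # Weighted sampling (one draw per word) plus one extra draw for punctuation,
    # so the RNG ends up one draw further along than in sentence_v1.
    weights = [1.0 / rank for rank in range(1, len(VOCAB) + 1)]
    words = rng.choices(VOCAB, weights=weights, k=n_words)
    ending = "." if rng.random() < 0.5 else "!"
    return " ".join(words) + ending


def generate(sentence_fn, seed=42):
    """Build a text value and then a byte array from one shared RNG."""
    rng = random.Random(seed)
    text = sentence_fn(rng, 4)
    # The second array is drawn from the same RNG, after the sentence.
    values = [int(rng.random() * 256) for _ in range(3)]
    return text, values
```

Both variants are deterministic for a fixed seed, but `generate(sentence_v1)` and `generate(sentence_v2)` produce different `values` even though the code that generates `values` is unchanged.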

@wjones127 wjones127 marked this pull request as ready for review December 16, 2025 20:46
@codecov

codecov Bot commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 28.88889% with 64 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-datagen/src/generator.rs | 28.88% | 62 Missing and 2 partials ⚠️ |

Contributor

@jackye1995 jackye1995 left a comment

looks good to me!

Collaborator

@Xuanwo Xuanwo left a comment

Nice one!

@wjones127 wjones127 merged commit bd18c03 into lance-format:main Dec 17, 2025
29 of 30 checks passed
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026