feat: improve the random access file benchmark by westonpace · Pull Request #5628 · lance-format/lance

westonpace · 2026-01-05T14:20:54Z

I've been using this benchmark to evaluate a number of potential reader improvements. This PR makes it more representative of real-world use cases by adding multiple threads, more depth per thread, and a no-cache mode that forces the data to be read from disk.
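To illustrate the multi-threaded access pattern described above, here is a minimal sketch using only the standard library. It is not the benchmark's code: the helper name `read_rows_threaded`, the fixed 8-byte row layout, and the one-handle-per-thread design are all assumptions made for illustration, and the sketch does not model the "depth per thread" dimension.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;
use std::thread;

/// Reads the little-endian u64 stored at each row index, splitting the
/// indices across up to `num_threads` worker threads.
/// Hypothetical helper for illustration; not taken from the PR.
fn read_rows_threaded(
    path: &Path,
    indices: &[u64],
    num_threads: usize,
) -> io::Result<Vec<u64>> {
    // Ceiling division so every index lands in some chunk; never zero,
    // because `chunks(0)` panics.
    let threads = num_threads.max(1);
    let chunk_len = ((indices.len() + threads - 1) / threads).max(1);

    thread::scope(|scope| {
        let workers: Vec<_> = indices
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || -> io::Result<Vec<u64>> {
                    // A private handle per thread avoids sharing a seek cursor.
                    let mut file = File::open(path)?;
                    let mut buf = [0u8; 8];
                    let mut out = Vec::with_capacity(chunk.len());
                    for &row in chunk {
                        // Assumed layout: row `i` occupies bytes `8 * i .. 8 * i + 8`.
                        file.seek(SeekFrom::Start(row * 8))?;
                        file.read_exact(&mut buf)?;
                        out.push(u64::from_le_bytes(buf));
                    }
                    Ok(out)
                })
            })
            .collect();

        // Join in spawn order so the results line up with `indices`.
        let mut values = Vec::with_capacity(indices.len());
        for worker in workers {
            values.extend(worker.join().expect("reader thread panicked")?);
        }
        Ok(values)
    })
}
```

Unless the file's pages are evicted first, repeated runs of a reader like this are mostly served from the OS page cache, which is the effect a no-cache mode is meant to rule out.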

github-actions · 2026-01-05T14:22:10Z

Code Review

P0: Potential Bug - Compilation Failure on Non-Linux Platforms

The drop_file_from_cache function is only defined for #[cfg(target_os = "linux")], but it's called unconditionally in the benchmark loop (line ~376 in diff). This will cause a compilation error on macOS and Windows.

Suggested fix: Either:

Add a no-op stub for non-linux platforms
Or conditionally skip the no-cache benchmarks on non-linux with a cfg attribute
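The first option can be sketched as a pair of `cfg`-gated functions with the same signature, so the call site compiles on every platform. This is only a sketch: the signature and the `bool` return value are assumptions, and the Linux body is a placeholder because the PR's actual eviction code is not shown in this conversation.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Linux variant. Returns `Ok(true)` to signal that this build has an
/// eviction path. Signature and return type are assumptions.
#[cfg(target_os = "linux")]
fn drop_file_from_cache(path: &Path) -> io::Result<bool> {
    // Flush dirty pages first, since only clean pages can be evicted.
    File::open(path)?.sync_all()?;
    // Placeholder: a real implementation would now ask the kernel to evict
    // the file's pages, for example with posix_fadvise(POSIX_FADV_DONTNEED).
    Ok(true)
}

/// No-op stub for every other platform, so unconditional callers still
/// compile. Returns `Ok(false)` because nothing was evicted.
#[cfg(not(target_os = "linux"))]
fn drop_file_from_cache(_path: &Path) -> io::Result<bool> {
    Ok(false)
}
```

A caller can use the returned flag to skip or relabel no-cache results on platforms where the stub is active, instead of silently reporting cached timings as uncached.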

P1: Data Size vs Index Calculation Mismatch

The benchmark generates 500M rows (500 * 1024 * 1024), but creates indices for only 100K rows with a gap of 5K between them. The maximum index would be 99,999 * 5,120 = 511,994,880, which is close to the data size of 524,288,000. This is fine, but the 500MB+ file generation per benchmark configuration seems excessive for CI environments. Consider documenting this or adding a way to run with smaller data for quick local testing.
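The arithmetic in this item can be checked with a short sketch. The constant and function names below are hypothetical, and the `i * gap` index scheme is inferred from the figures quoted in the review rather than taken from the benchmark source.

```rust
/// Figures quoted in the review above; the names are illustrative.
const NUM_ROWS: u64 = 500 * 1024 * 1024; // 524,288,000
const NUM_INDICES: u64 = 100_000;
const GAP: u64 = 5 * 1024; // 5,120

/// Row indices `0, gap, 2 * gap, ...` for `count` reads.
fn spaced_indices(count: u64, gap: u64) -> Vec<u64> {
    (0..count).map(|i| i * gap).collect()
}

/// Largest index produced by `spaced_indices`, or `None` when `count` is 0.
fn max_spaced_index(count: u64, gap: u64) -> Option<u64> {
    count.checked_sub(1).map(|last| last * gap)
}

/// True when every spaced index refers to an existing row.
fn indices_fit(count: u64, gap: u64, num_rows: u64) -> bool {
    max_spaced_index(count, gap).map_or(true, |max| max < num_rows)
}
```

With these values, `max_spaced_index(NUM_INDICES, GAP)` is `Some(511_994_880)`, which leaves 12,293,120 rows of headroom below `NUM_ROWS`, so the indices stay in bounds.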

Overall, the benchmark improvements look well-structured and useful for real-world performance testing. Just the platform compatibility issue needs to be addressed before merge.

codecov · 2026-01-05T15:01:12Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

westonpace · 2026-01-16T14:38:03Z

I've addressed P0 from Claude's review. I think P1 is fine: 500MB is needed to get 100,000 spaced-out rows, and we need that many to get a good sense of the timing.

github-actions Bot added the enhancement label Jan 5, 2026

westonpace added 2 commits January 16, 2026 06:33

Improve the random access file benchmark

962268b

Add a no-op path for non-linux systems on the drop cache function

493615e

westonpace force-pushed the perf/improve-random-access-file-benc branch from 5bcbab8 to 493615e Compare January 16, 2026 14:36

westonpace added 2 commits January 16, 2026 06:38

use correct syntax

e3b91f4

Address clippy suggestions

57eca50

wjones127 approved these changes Jan 16, 2026

westonpace merged commit 92f3808 into lance-format:main Jan 16, 2026
28 of 29 checks passed

andrea-reale mentioned this pull request Mar 30, 2026

emilk/fix write starvation rerun-io/lance#12

Closed

westonpace merged 4 commits into lance-format:main from westonpace:perf/improve-random-access-file-benc
