Skip to content

feat: improve the random access file benchmark#5628

Merged
westonpace merged 4 commits intolance-format:mainfrom
westonpace:perf/improve-random-access-file-benc
Jan 16, 2026
Merged

feat: improve the random access file benchmark#5628
westonpace merged 4 commits intolance-format:mainfrom
westonpace:perf/improve-random-access-file-benc

Conversation

@westonpace
Copy link
Copy Markdown
Member

I've been using this benchmark for a number of potential reader improvements and made it more representative of real world use cases by adding multiple threads, more depth per thread, and a no-cache mode to force the data to be read from disk.

@github-actions github-actions Bot added the enhancement New feature or request label Jan 5, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jan 5, 2026

Code Review

P0: Potential Bug - Compilation Failure on Non-Linux Platforms

The drop_file_from_cache function is only defined for #[cfg(target_os = "linux")], but it's called unconditionally in the benchmark loop (line ~376 in diff). This will cause a compilation error on macOS and Windows.

Suggested fix: Either:

  1. Add a no-op stub for non-linux platforms
  2. Or conditionally skip the no-cache benchmarks on non-linux with a cfg attribute

P1: Data Size vs Index Calculation Mismatch

The benchmark generates 500M rows (500 * 1024 * 1024), but creates indices for only 100K rows with a gap of 5K between them. The maximum index would be 99,999 * 5,120 = 511,994,880, which is close to the data size of 524,288,000. This is fine, but the 500MB+ file generation per benchmark configuration seems excessive for CI environments. Consider documenting this or adding a way to run with smaller data for quick local testing.


Overall, the benchmark improvements look well-structured and useful for real-world performance testing. Just the platform compatibility issue needs to be addressed before merge.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jan 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@westonpace westonpace force-pushed the perf/improve-random-access-file-benc branch from 5bcbab8 to 493615e Compare January 16, 2026 14:36
@westonpace
Copy link
Copy Markdown
Member Author

I've addressed P0 from claudes review. I think P1 is fine. 500MB is needed to get 100,000 spaced out rows. We need that many to get a good sense of the timing.

@westonpace westonpace merged commit 92f3808 into lance-format:main Jan 16, 2026
28 of 29 checks passed
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
I've been using this benchmark for a number of potential reader
improvements and made it more representative of real world use cases by
adding multiple threads, more depth per thread, and a no-cache mode to
force the data to be read from disk.
vivek-bharathan pushed a commit to vivek-bharathan/lance that referenced this pull request Feb 2, 2026
I've been using this benchmark for a number of potential reader
improvements and made it more representative of real world use cases by
adding multiple threads, more depth per thread, and a no-cache mode to
force the data to be read from disk.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants