feat: add the repetition index to the miniblock write path by westonpace · Pull Request #3208 · lance-format/lance

westonpace · 2024-12-05T14:44:27Z

The repetition index is what will give us random access support when we have list data. At a high level it stores the number of top-level rows in each mini-block chunk. We can use this later to figure out which chunks we need to read.
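As a rough illustration of the high-level idea, here is a minimal sketch of how per-chunk row counts could be used to find the chunk holding a given row. The function name `chunk_for_row` and the plain `&[u64]` layout are assumptions made for this example and are not taken from the Lance code; the sketch also assumes every chunk starts on a row boundary.

```rust
/// Hypothetical helper (not Lance's API): `rows_per_chunk[i]` is the number
/// of top-level rows stored in chunk `i`. Returns the chunk containing `row`,
/// or `None` if `row` is past the end of the data.
/// Assumes every chunk begins with a brand new row.
fn chunk_for_row(rows_per_chunk: &[u64], row: u64) -> Option<usize> {
    // Index of the first row stored in the chunk currently being examined.
    let mut first_row_in_chunk = 0u64;
    for (chunk_idx, &num_rows) in rows_per_chunk.iter().enumerate() {
        if row < first_row_in_chunk + num_rows {
            return Some(chunk_idx);
        }
        first_row_in_chunk += num_rows;
    }
    None
}
```

With counts `[3, 2, 4]`, rows 0 to 2 fall in chunk 0, rows 3 and 4 in chunk 1, and rows 5 to 8 in chunk 2, so a reader only needs to fetch the one chunk that covers the requested row.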

In reality things are a little more complicated because we don't mandate that each chunk starts with a brand new row (i.e. a row can span multiple mini-block chunks). This is useful because we eventually want to support arbitrarily deep nested access. If we instead created not-so-mini blocks in the presence of large lists, we would introduce read amplification we'd like to avoid.
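To make the row-spanning case concrete, here is a sketch of a lookup that returns a range of chunks instead of a single chunk. The `ChunkEntry` struct, its two fields, and `chunks_for_row` are hypothetical and chosen only for illustration; the actual layout of the repetition index in this PR may differ.

```rust
use std::ops::RangeInclusive;

/// Hypothetical per-chunk entry; this layout is an assumption for
/// illustration and is not taken from the Lance source.
#[derive(Debug, Clone, Copy)]
struct ChunkEntry {
    /// Number of top-level rows whose first value lives in this chunk.
    rows_started: u64,
    /// True if the chunk opens with values of a row begun in an earlier chunk.
    continues_previous: bool,
}

/// Returns the inclusive range of chunks that must be read to decode `row`,
/// or `None` if `row` is past the end of the data.
fn chunks_for_row(index: &[ChunkEntry], row: u64) -> Option<RangeInclusive<usize>> {
    // Index of the first row that starts in the chunk being examined.
    let mut first_row = 0u64;
    for (start, entry) in index.iter().enumerate() {
        let end_row = first_row + entry.rows_started;
        if row < end_row {
            let mut end = start;
            // Only the last row started in a chunk can spill into later chunks.
            if row + 1 == end_row {
                while end + 1 < index.len() && index[end + 1].continues_previous {
                    end += 1;
                    if index[end].rows_started > 0 {
                        break; // a new row starts here, so the spill ends here
                    }
                }
            }
            return Some(start..=end);
        }
        first_row = end_row;
    }
    None
}
```

Under these assumptions a large list costs extra chunk reads only for the row that actually spans them, while neighbouring small rows still resolve to a single chunk.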

codecov-commenter · 2024-12-05T15:02:43Z

Codecov Report

Attention: Patch coverage is 42.99674% with 175 lines in your changes missing coverage. Please review.

Project coverage is 78.44%. Comparing base (5ff966d) to head (071984b).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-encoding/src/encodings/logical/list.rs | 10.23% | 112 Missing and 2 partials ⚠️ |
| .../lance-encoding/src/encodings/logical/primitive.rs | 68.69% | 30 Missing and 6 partials ⚠️ |
| rust/lance-encoding/src/encoder.rs | 18.18% | 8 Missing and 1 partial ⚠️ |
| rust/lance-encoding/src/decoder.rs | 0.00% | 8 Missing ⚠️ |
| ...ust/lance-encoding/src/encodings/logical/struct.rs | 72.72% | 6 Missing ⚠️ |
| rust/lance-encoding-datafusion/src/zone.rs | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3208      +/-   ##
==========================================
- Coverage   78.55%   78.44%   -0.11%     
==========================================
  Files         244      244              
  Lines       84298    84554     +256     
  Branches    84298    84554     +256     
==========================================
+ Hits        66218    66332     +114     
- Misses      15284    15417     +133     
- Partials     2796     2805       +9

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `78.44% <42.99%> (-0.11%)` | ⬇️ |


github-actions Bot added the enhancement New feature or request label Dec 5, 2024

broccoliSpicy approved these changes Dec 5, 2024

westonpace added 2 commits December 9, 2024 16:24

Adds the encode path for repetition index (and adds a scheduler/encod…

57d011f

…er for lists) when using the mini-block structural encoding.

Address clippy warnings

071984b

westonpace force-pushed the feat/2.1-repindex-miniblock-write branch from 2982346 to 071984b Compare December 10, 2024 00:24

westonpace merged commit 10c31b3 into lance-format:main Dec 10, 2024
