
perf: intern RowDatasetVersionMeta inline bytes to reduce manifest memory #6499

Merged
jackye1995 merged 4 commits into lance-format:main from beinan:beinan/intern-version-meta
Apr 14, 2026

Conversation

@beinan
Contributor

@beinan beinan commented Apr 13, 2026

Summary

  • Change RowDatasetVersionMeta::Inline from Vec<u8> to Arc<[u8]> so that fragments with identical version metadata share a single heap allocation
  • Extend DataFileFieldInterner to deduplicate these inline byte payloads during manifest deserialization
  • Introduce InternCache<T>: a hybrid cache that uses Vec linear scan for ≤16 entries and upgrades to HashMap for larger caches
  • Add custom Serialize/Deserialize impls for RowDatasetVersionMeta to handle Arc<[u8]> transparently
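The first bullet relies on `Arc<[u8]>` clones being reference-count bumps rather than byte copies. A minimal sketch of that sharing follows; `VersionMeta`, `stamp_fragments`, and `shares_allocation` are hypothetical stand-ins written for illustration and are not the types or functions in lance-table.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the PR's enum; only an inline variant is sketched.
#[derive(Clone, Debug, PartialEq)]
enum VersionMeta {
    Inline(Arc<[u8]>),
}

/// Build `n` per-fragment values that all point at one shared payload.
fn stamp_fragments(payload: &[u8], n: usize) -> Vec<VersionMeta> {
    // The only heap allocation for the payload bytes happens here.
    let shared: Arc<[u8]> = Arc::from(payload);
    (0..n)
        .map(|_| VersionMeta::Inline(Arc::clone(&shared)))
        .collect()
}

/// True when both values point at the same heap allocation.
fn shares_allocation(a: &VersionMeta, b: &VersionMeta) -> bool {
    match (a, b) {
        (VersionMeta::Inline(x), VersionMeta::Inline(y)) => Arc::ptr_eq(x, y),
    }
}

fn main() {
    let metas = stamp_fragments(&[1, 2, 3, 4], 3);
    // Equal contents and the same allocation.
    assert_eq!(metas[0], metas[2]);
    assert!(shares_allocation(&metas[0], &metas[2]));
    let VersionMeta::Inline(bytes) = &metas[0];
    println!("strong references: {}", Arc::strong_count(bytes));
}
```

With `Vec<u8>`, the same `clone()` calls would each allocate and copy the payload, which is the per-fragment duplication this PR targets.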

Motivation

Follow-up to #6477 (interning DataFile.fields/column_indices). After a compaction, all fragments are stamped with the same version metadata (both last_updated_at_version_meta and created_at_version_meta), but each fragment previously owned its own Vec<u8> copy.

Per-fragment memory breakdown (before)

| Field | Size per fragment |
|---|---|
| last_updated_at_version_meta: Inline(Vec<u8>) | ~24 bytes + payload |
| created_at_version_meta: Inline(Vec<u8>) | ~24 bytes + payload |
| Total redundant at 20M fragments | ~480 MB+ |

After this change

With interning, all 20M fragments share a single Arc<[u8]> allocation per unique payload.
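As a rough cross-check of these figures, the sketch below does the arithmetic. It assumes a 64-bit target, where a `Vec<u8>` handle is three machine words (24 bytes) and an `Arc<[u8]>` handle is two (16 bytes). The 8-byte payload length is a made-up value for illustration, and the helper names are hypothetical.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Bytes spent on `Vec<u8>` handles (pointer, capacity, length) for one
/// version-meta field across `fragments` fragments, payload excluded.
fn vec_handle_bytes(fragments: u64) -> u64 {
    fragments * size_of::<Vec<u8>>() as u64
}

/// Bytes spent on `Arc<[u8]>` handles (pointer, length) for the same field.
fn arc_handle_bytes(fragments: u64) -> u64 {
    fragments * size_of::<Arc<[u8]>>() as u64
}

/// Heap bytes for payload copies: one copy per fragment without interning,
/// one copy per unique payload with interning (Arc refcount header ignored).
fn payload_bytes(copies: u64, payload_len: u64) -> u64 {
    copies * payload_len
}

fn main() {
    let fragments: u64 = 20_000_000;
    let payload_len: u64 = 8; // hypothetical payload size, not from the PR

    println!("Vec handles:      {} MB", vec_handle_bytes(fragments) / 1_000_000);
    println!("Arc handles:      {} MB", arc_handle_bytes(fragments) / 1_000_000);
    println!(
        "payload, no intern: {} MB",
        payload_bytes(fragments, payload_len) / 1_000_000
    );
    println!("payload, interned:  {} bytes", payload_bytes(1, payload_len));
}
```

On a 64-bit target the first line works out to 20M × 24 bytes = 480 MB for a single field, which appears to be where the "~480 MB" figure comes from. The payload term is what collapses to a single copy per unique value once the bytes are interned.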

Benchmark results

Microbenchmark at 100K fragments (10 fields per fragment):

| Scenario | No interning | With interning | Delta |
|---|---|---|---|
| Uniform (1 unique version) | 24.5 ms | 17.9 ms | 27% faster |
| Diverse (10 unique) | 25.7 ms | 19.7 ms | 23% faster |
| Diverse (100 unique) | 26.0 ms | 23.4 ms | 10% faster |
| Diverse (500 unique) | 26.0 ms | 22.8 ms | 12% faster |

| Memory (100K fragments) | No interning | With interning | Savings |
|---|---|---|---|
| 10 fields | 39.47 MB | 29.74 MB | 24.6% |
| 50 fields | 69.99 MB | 29.74 MB | 57.5% |

Both memory and speed improve across all scenarios. The hybrid InternCache uses fast Vec scan for the common case (1-3 unique values) and upgrades to HashMap when diversity exceeds 16 entries.
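The hybrid strategy described above can be sketched as follows. This is a byte-specialized illustration, not the PR's `InternCache<T>`: the name `ByteInternCache`, its methods, and the use of `HashSet` instead of `HashMap` are assumptions made to keep the example short. Only the 16-entry threshold comes from the description.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Threshold from the PR description; the rest of this sketch is illustrative.
const LINEAR_SCAN_LIMIT: usize = 16;

/// Hypothetical byte-only interning cache: linear scan while small,
/// hashed lookup once it grows past the threshold.
enum ByteInternCache {
    Small(Vec<Arc<[u8]>>),
    Large(HashSet<Arc<[u8]>>),
}

impl ByteInternCache {
    fn new() -> Self {
        ByteInternCache::Small(Vec::new())
    }

    /// Return a shared handle for `bytes`, allocating only on first sight.
    fn intern(&mut self, bytes: &[u8]) -> Arc<[u8]> {
        match self {
            ByteInternCache::Small(entries) => {
                // Common case: a handful of entries, so compare directly.
                if let Some(hit) = entries.iter().find(|e| &e[..] == bytes) {
                    return Arc::clone(hit);
                }
                let fresh: Arc<[u8]> = Arc::from(bytes);
                entries.push(Arc::clone(&fresh));
                if entries.len() > LINEAR_SCAN_LIMIT {
                    // Too many distinct values: move everything into a hash set.
                    let set: HashSet<Arc<[u8]>> = entries.drain(..).collect();
                    *self = ByteInternCache::Large(set);
                }
                fresh
            }
            ByteInternCache::Large(set) => {
                // `Arc<[u8]>` borrows as `[u8]`, so lookup needs no allocation.
                if let Some(hit) = set.get(bytes) {
                    return Arc::clone(hit);
                }
                let fresh: Arc<[u8]> = Arc::from(bytes);
                set.insert(Arc::clone(&fresh));
                fresh
            }
        }
    }

    /// Number of distinct payloads seen so far.
    fn len(&self) -> usize {
        match self {
            ByteInternCache::Small(entries) => entries.len(),
            ByteInternCache::Large(set) => set.len(),
        }
    }

    /// Whether the cache has switched to hashed lookup.
    fn is_hashed(&self) -> bool {
        matches!(self, ByteInternCache::Large(_))
    }
}

fn main() {
    let mut cache = ByteInternCache::new();
    let a = cache.intern(b"version-7");
    let b = cache.intern(b"version-7");
    assert!(Arc::ptr_eq(&a, &b)); // second call reuses the first allocation
    println!("distinct payloads: {}, hashed: {}", cache.len(), cache.is_hashed());
}
```

Existing handles stay valid across the upgrade because the `Arc` values are moved into the set rather than re-created, so a payload interned before and after the switch still resolves to the same allocation.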

Run with: cargo bench -p lance-table --bench manifest_intern

Changes

  • rust/lance-table/src/rowids/version.rs: Inline(Vec<u8>) → Inline(Arc<[u8]>), custom serde impls, updated protobuf conversions
  • rust/lance-table/src/format/fragment.rs: InternCache<T> (Vec/HashMap hybrid), extended DataFileFieldInterner with version meta interning
  • rust/lance-table/benches/manifest_intern.rs: Microbenchmark covering uniform and diverse scenarios

Compatibility

  • No format change — protobuf schema is unchanged
  • Serde JSON output is identical (custom impl serializes Arc<[u8]> as [u8])
  • from_sequence() still works as before (converts internally)

Test plan

  • cargo check --workspace --tests passes
  • cargo clippy -p lance-table -p lance -- -D warnings passes
  • All 88 lance-table tests pass
  • cargo fmt --all -- --check passes
  • Microbenchmark validates performance across uniform and diverse scenarios
  • CI

🤖 Generated with Claude Code

@beinan beinan force-pushed the beinan/intern-version-meta branch 2 times, most recently from 80da537 to 6db1001 Compare April 13, 2026 21:10
beinan and others added 3 commits April 13, 2026 21:16
…mory

Change RowDatasetVersionMeta::Inline from Vec<u8> to Arc<[u8]> and
deduplicate identical byte payloads during manifest deserialization
via DataFileFieldInterner.

Post-compaction, all fragments are stamped with the same version
metadata, so at 20M fragments this saves ~480 MB of redundant heap
allocations for each of last_updated_at and created_at version meta.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Measures deserialization throughput and memory usage with and without
interning for DataFile fields, column_indices, and
RowDatasetVersionMeta inline bytes.

At 100K fragments (10 fields):
  Without interning: 39.47 MB, 24.3 ms
  With interning:    29.74 MB, 28.2 ms
  Savings: 9.73 MB (24.6%), +16% deser time

At 100K fragments (50 fields):
  Without interning: 69.99 MB, 30.9 ms
  With interning:    29.74 MB, 35.3 ms
  Savings: 40.24 MB (57.5%), +14% deser time

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace pure Vec cache with InternCache that uses linear scan for
<=16 entries and upgrades to HashMap for larger caches.

This keeps the common case fast (1-3 unique values → no hashing)
while avoiding O(n) scan degradation with many unique values
(e.g., 500 distinct version payloads from many small appends).

At 100K fragments (10 fields):
  Uniform (1 unique):     24.5ms (no intern) → 17.9ms (intern), 27% faster
  Diverse (100 unique):   26.0ms (no intern) → 23.4ms (intern), 10% faster
  Diverse (500 unique):   26.0ms (no intern) → 22.8ms (intern), 12% faster

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@beinan beinan force-pushed the beinan/intern-version-meta branch from 6db1001 to 1b27f53 Compare April 13, 2026 21:16
@codecov

codecov Bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 54.76190% with 38 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance-table/src/format/fragment.rs 62.12% 23 Missing and 2 partials ⚠️
rust/lance-table/src/rowids/version.rs 27.77% 13 Missing ⚠️

📢 Thoughts on this report? Let us know!

Contributor

@jackye1995 jackye1995 left a comment


the feature overall looks good to me, pending CI fix

CI clippy denies print_stderr; benchmarks use eprintln! to report
memory stats alongside criterion output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@beinan beinan force-pushed the beinan/intern-version-meta branch from 2ebfa69 to 8304946 Compare April 14, 2026 00:54
@jackye1995 jackye1995 merged commit 23526f6 into lance-format:main Apr 14, 2026
53 of 55 checks passed

2 participants