feat: provide inline_transaction model for IO optimizing #4774
wjones127 merged 3 commits into lance-format:main
Conversation
Codecov Report

❌ Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #4774     +/-   ##
==========================================
+ Coverage   81.75%    81.84%   +0.08%
==========================================
  Files         340       341       +1
  Lines      140010    140204     +194
==========================================
+ Hits       114469    114752     +283
+ Misses      21732     21626     -106
- Partials     3809      3826      +17
// if < 200KB, store the transaction content inline
bytes transaction_content = 19;
// if >= 200KB, store the transaction content at the specified offset
TransactionSection transaction_section = 20;
I was thinking we should just keep it as a separate file, which seems more backwards compatible. Is there any other benefit to storing it at an offset in the manifest?
For backwards compatibility we should write both for a while. We can stop writing the external file after adding a feature flag 6+ months in the future. I've described the sequence of steps in #3487
wjones127 left a comment
I don't think putting it inline in the protobuf is necessary. We can put it at an offset and use existing optimizations to read it without an additional IOP.
// The transaction that created this version.
oneof inline_transaction {
  // if < 200KB, store the transaction content inline
  bytes transaction_content = 19;
  // if >= 200KB, store the transaction content at the specified offset
  TransactionSection transaction_section = 20;
}
We should just unconditionally store the transaction at an offset, IMO. This makes this simpler, and it's still possible to read the transaction when it is small.
We currently store the index metadata at an offset, and here's how it works right now:
- When we read the manifest file, we always read the last block_size bytes (4 KB for local fs, 64 KB for object storage).
- We decode as much as we can from that last chunk. If the entire manifest file is < 64 KB, this often means we get the entire content (including the index metadata) in 1 IOP.
- If that last chunk wasn't sufficient, we read the remainder.
This basically gives the same effect as the optionally inline message: If it's small, we automatically read the index and transaction information in the first IO request. But if it's large, we do it in a separate request.
Here's the place where we opportunistically read the index metadata from manifests:
message TransactionSection {
  // The summary of the transaction, including the operation type and some kvs from transaction_properties.
  // This is used to get information about the transaction without loading the whole transaction content.
  map<string, string> summary = 1;
Hi @wjones127, what do you think of this part?
Basically it's redundant content for operation and transaction_properties. The case I want to solve involves listing historical manifests and transaction summaries. I'm not sure the opportunistic read would actually help this listing case if the manifest is large (and larger manifests are the ones we most need to accelerate).
On the other hand, some information, like the commit message, would be useful to read directly from the manifest. So I added this summary so it can be read from the manifest without loading the transaction part.
cc @jackye1995
For the list-transactions use case, I can't help feeling the best thing would be to cache compacted summary data in a secondary file. For example, we could generate a Lance file next to the manifest that contains all transaction summaries before it. Then, to get most of the history, you just need to query that file, plus read the transactions of the few uncached versions after it. What do you think of that?
How much performance do we want for this use case? I always thought reading information like the transaction summary is for non-performance-critical workloads, like displaying info to the user or triggering optimization jobs based on the action performed, which can stand a few more IOPS.
Like we could generate some Lance file next to the manifest that contains all transaction summaries before it
I had the same idea. I thought Lance might provide a lazy summary file that includes tags and branches, summaries, and properties, and could use the index framework to maintain this file. This was only an early idea; I could raise a discussion for it. Then we should drop the summary here.
How much performance do we want for this use case? I always thought reading information like the transaction summary is for non-performance-critical workloads, like displaying info to the user or triggering optimization jobs based on the action performed, which can stand a few more IOPS.
For my case, one page displays at least 20 versions, and each version needs summary info from both the manifest and the transaction. If the files are separated, the IO cost is magnified roughly 20x. Say reading a transaction file costs 100 ms; that means at least 2 seconds of added latency.
Yeah, I'm inclined not to have the summary there; just keep it in the transaction. Pulling it out into the manifest means that listing is still O(num versions) IO requests. To make that fast, it's better to create some mechanism to query it.
non performance critical workloads, like displaying info to the user
I'd argue that displaying info to the user is somewhat performance critical, in that we'd like it to return fast enough to feel responsive.
If the files are separated, the IO cost is magnified roughly 20x. Say reading a transaction file costs 100 ms; that means at least 2 seconds of added latency.
I'd argue that displaying info to the user is somewhat performance critical, in that we'd like it to return fast enough to feel responsive.
I was thinking we will just make the requests in parallel, so the latency won't really be 2 seconds, more like 200-300ms if anything got throttled and retried. But agree we need to make sure it is fast enough to feel responsive.
I thought lance might provide a lazy summary file includes tags and branches, summaries and properties. And could use the index framework to maintain this file.
that sounds like a nice idea, I originally created the concept of "system index" basically for such use cases.
Force-pushed 9cd17b7 → 06e1a43
Force-pushed a93622b → e61260c
Ready for review @wjones127 cc @jackye1995
wjones127 left a comment
Nice work. Excited to have this in soon!
I'd like to see a test on IO. And I think disabling writing separate transaction files should enable a writer feature flag, which will alert older libraries of the incompatibility.
// The file position of the transaction content.
// None if transaction is empty
optional uint64 transaction_section = 21;
Could you describe here how to calculate where the message ends?
/// Control whether to also write an external transaction file.
/// Default is true for backward compatibility. Set to false to write inline only.
pub fn with_write_transaction_file(mut self, write_transaction_file: bool) -> Self {
    self.write_transaction_file = write_transaction_file;
    self
}
I don't think this should be a runtime setting. We should add a writer feature flag for writing only inline transactions, off by default. Users can then enable it on table creation or with a later update. That will ensure older libraries don't try to write to this.
Users can then enable it on table creation or with a later update
I guess that means deleting the write_transaction_file code entirely. For the six months you mentioned, we keep writing the transaction file without controlling it through any setting, and then in some future version we update the feature flags via hardcoded logic. Do I understand right?
The reason I ask is that the feature flags are applied after write_transaction_file, and we don't have an initialized feature flag to read in commit_new_dataset. If we want to control the behavior dynamically, we still need some kind of setting. But this may be overthinking; hardcoding seems fine.
And we update the feature flags by hardcode in some version in the future. Do I understand right?
Sorry, I think I wasn't clear. We should still have a setting.
The way you had it written before, users would choose on each commit whether to write the transaction file. I proposed instead they should decide that only once, when they create the dataset. Or they could decide later when they do a one-time migration of an existing dataset.
See how we do it for stable row id or v2 manifest paths, in WriteParams:
Thanks for the guidance and review so far. I'm currently on a public holiday; I will address your comments as soon as possible!
Ready for another look @wjones127
wjones127 left a comment
Generally looks good. Do you think we should have a setting? It would be nice to be able to enable it.
/// Construct a format-layer Transaction from a protobuf Transaction.
pub fn from_pb(pb_tx: pb::Transaction) -> Self {
    Self { inner: pb_tx }
}
This probably should just be a From<pb::Transaction> impl
/// Control whether to also write an external transaction file.
/// Default is true for backward compatibility. Set to false to write inline only.
pub fn with_write_transaction_file(mut self, write_transaction_file: bool) -> Self {
    self.write_transaction_file = write_transaction_file;
    self
}
Yeah, I do think we should have a setting.
I'm a little confused about this: I'm not sure if you are saying to just modify the default value of
@majin1102 Did you look at how stable row ids work? I think that's a good guide here. The user turns them on via
This is passed down with
And then it is saved in the feature flags:
When we load the dataset, we call
Thank you so much for the details! I was using "open call" in IDEA. Next time I will look more patiently and carefully (maybe with some simple tests).
wjones127 left a comment
This is looking great! Nice work
Force-pushed 5de4f82 → ba58719
@majin1102 did you still want to finish this? It seemed like it was almost done and then you moved it into draft.
Force-pushed b797357 → 4ce1cca
Sorry for the delay so far; I didn't mean to leave this unfinished. I was hitting an assertion failure on Windows after rebasing (an assertion on IO counts). I had some clues and tried a fix, but it didn't work, and I intended to spend more time looking through it; some other parallel work eventually led to the PR being consistently delayed. In the meantime I found that load_new_transactions() was still using transaction files; I had missed this part. I will complete this ASAP. Thanks again for the review and guidance @wjones127
Force-pushed 4ce1cca → 18625ce
After the latest rebase, all Windows tests pass. It looks like a recent commit fixed that Windows issue. Feel free to merge if it looks good to you. Please let me know if you have any other comments!
💡 Codex Review
https://github.com/lancedb/lance/blob/18625cef89474db2baccb766e0e972d713ceaca7/rust/lance/src/dataset.rs#L2135-L2159
Transactions listing ignores inline transaction data
The new inline transaction support only updates Dataset::read_transaction; the transactions() stream still unconditionally loads _transactions/{file} and errors if no external file was written. When disable_transaction_file is enabled (or a transaction file is deleted, as in the new tests), manifest.transaction_file remains an empty string, so this code attempts read_transaction_file and fails instead of using the inline transaction_section embedded in the manifest. As a result, datasets relying solely on inline transactions cannot enumerate history even though the manifest contains the data. The method should reuse Dataset::read_transaction or fall back to manifest.transaction_section before assuming a file exists.
Force-pushed 3ad71b3 → c419c4e
Hi @wjones127, @jackye1995, I'm working on issues #4308 and #3487.
In this PR I propose the inline_transaction model for IO optimization. The motivation includes:
1. Get the transaction summary without transaction file IO
2. Optimize commits by reducing transaction file IO