
fix: support system columns in dataset.take* operations #5722

Merged
jackye1995 merged 17 commits into lance-format:main from hamersaw:bug/support-system-columns-in-take-scan
Jan 28, 2026

Conversation

@hamersaw
Contributor

@hamersaw hamersaw commented Jan 15, 2026

Previously, the "take*" operations did not support the system columns _rowid, _rowoffset, _row_created_at_version, and _row_last_updated_at_version. This PR adds support for all of them.

These system columns are now preserved through the initial schema projection so that they can set the correct flags when the ProjectionPlan and PhysicalProjection structs are built.

  • _rowid / _rowaddr: persisting these through to the ProjectionPlan fields was enough to make them work.
  • _rowoffset: additionally required (1) stripping the ROW_OFFSET field from the ProjectionPlan requested_output_expr and (2) manually injecting the column with AddRowOffsetExec (after making some methods public).
  • _row_created_at_version / _row_last_updated_at_version: required piping flags through to the Fragment readers.
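
As a rough illustration of the approach in the list above, here is a toy Python model. It is not Lance's implementation: ToyProjectionPlan, plan_projection, and toy_take are hypothetical names invented for this sketch, the "table" is a plain dict of lists, and _rowoffset is assumed to equal the row's position in that table. It only shows the general shape: system columns are split off into flags, _rowoffset is kept out of the stored-column read, and it is injected afterwards.

```python
from dataclasses import dataclass, field

ROW_OFFSET = "_rowoffset"
SYSTEM_COLUMNS = {
    "_rowid",
    "_rowaddr",
    ROW_OFFSET,
    "_row_created_at_version",
    "_row_last_updated_at_version",
}


@dataclass
class ToyProjectionPlan:
    """Hypothetical stand-in for a projection plan; not Lance's real struct."""

    data_columns: list = field(default_factory=list)
    system_flags: set = field(default_factory=set)


def plan_projection(requested):
    """Split requested names into stored columns and system-column flags."""
    plan = ToyProjectionPlan()
    for name in requested:
        if name in SYSTEM_COLUMNS:
            plan.system_flags.add(name)
        else:
            plan.data_columns.append(name)
    return plan


def toy_take(table, offsets, requested):
    """Read stored columns at the given offsets, then inject _rowoffset.

    Only _rowoffset is materialized here; the other system columns are
    merely recorded as flags, since producing them needs reader support
    that this toy does not model.
    """
    plan = plan_projection(requested)
    # Step 1: read only stored columns; _rowoffset was stripped from the plan.
    batch = {name: [table[name][i] for i in offsets] for name in plan.data_columns}
    # Step 2: add _rowoffset after the read (loosely analogous to AddRowOffsetExec).
    if ROW_OFFSET in plan.system_flags:
        batch[ROW_OFFSET] = list(offsets)
    return batch


table = {"id": [10, 20, 30], "name": ["a", "b", "c"]}
result = toy_take(table, [2, 0], ["id", "_rowoffset"])
# result == {"id": [30, 10], "_rowoffset": [2, 0]}
```

Keeping the flags separate from the stored-column list is what lets the read step stay unaware of columns that do not exist on disk.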

Closes #5615.

Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@hamersaw hamersaw changed the title from "bug: support system columns in take_scan" to "fix: support system columns in take_scan" on Jan 15, 2026
@github-actions github-actions Bot added the "bug" (Something isn't working) label on Jan 15, 2026
Collaborator

@Xuanwo Xuanwo left a comment

Thank you @hamersaw for working on this! I only have one question.

Comment thread rust/lance-core/src/datatypes/schema.rs
@hamersaw hamersaw marked this pull request as draft January 19, 2026 15:34
@hamersaw hamersaw changed the title from "fix: support system columns in take_scan" to "fix: support system columns in dataset::take* operations" on Jan 20, 2026
@hamersaw hamersaw changed the title from "fix: support system columns in dataset::take* operations" to "fix: support system columns in dataset.take* operations" on Jan 20, 2026
Comment thread rust/lance/src/dataset/take.rs Outdated
@hamersaw hamersaw marked this pull request as ready for review January 20, 2026 13:43
Comment thread rust/lance/src/dataset/take.rs
@hamersaw hamersaw requested review from Xuanwo and jackye1995 January 21, 2026 21:30
@jackye1995
Contributor

Looks like there are quite a few CI failures. Could you fix those?

Member

@westonpace westonpace left a comment

This looks good, but can you add a few tests? Preferably Python tests.

@hamersaw hamersaw requested a review from westonpace January 25, 2026 03:47
Member

@westonpace westonpace left a comment

Awesome. Thanks for doing this. I'll merge when green.

@westonpace
Member

I've added #5823 which describes some of the stuff we talked about externally.

@westonpace
Member

Looks like there are some legitimate test failures in the new Python tests.

Contributor

@jackye1995 jackye1995 left a comment

Looks good to me!

@jackye1995 jackye1995 merged commit 37f64ea into lance-format:main Jan 28, 2026
29 checks passed
vivek-bharathan pushed a commit to vivek-bharathan/lance that referenced this pull request Feb 2, 2026
@hamersaw hamersaw deleted the bug/support-system-columns-in-take-scan branch February 4, 2026 18:34

Labels

bug (Something isn't working), python

Development

Successfully merging this pull request may close these issues.

feat: take_scan can take "_rowid" as meta column

4 participants