
feat: serialize storage options in table identifier proto #5973

Merged
LuQQiu merged 4 commits into lance-format:main from LuQQiu:lu/fixCodex
Feb 21, 2026

Conversation

@LuQQiu
Contributor

@LuQQiu LuQQiu commented Feb 21, 2026

Add storage options to the table identifier proto to allow passing in storage credentials or other storage information.
Make the dataset parameter optional in filtered_read_exec_from_proto; when it is not provided, the function falls back to opening the dataset from the proto's table identifier.
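As a rough illustration of the fallback described above, here is a minimal Rust sketch. TableIdentifier, Dataset, open_from_identifier and resolve_dataset are hypothetical stand-ins, not Lance's generated proto types or the real signature of filtered_read_exec_from_proto; "opening" a dataset is simulated by copying the identifier's fields.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the proto message; only the fields relevant
/// to this PR are modeled.
#[derive(Debug, Clone, Default)]
pub struct TableIdentifier {
    pub uri: String,
    pub version: u64,
    /// Storage options (e.g. credentials) now carried alongside the identifier.
    pub storage_options: HashMap<String, String>,
}

/// Hypothetical stand-in for an opened dataset.
#[derive(Debug)]
pub struct Dataset {
    pub uri: String,
    pub version: u64,
    pub storage_options: HashMap<String, String>,
}

/// Simulates opening the dataset from storage using the identifier's
/// uri, version and storage options.
fn open_from_identifier(id: &TableIdentifier) -> Dataset {
    Dataset {
        uri: id.uri.clone(),
        version: id.version,
        storage_options: id.storage_options.clone(),
    }
}

/// Uses the caller's dataset when one is supplied; otherwise falls back to
/// opening it from the table identifier.
pub fn resolve_dataset(dataset: Option<Dataset>, id: &TableIdentifier) -> Dataset {
    dataset.unwrap_or_else(|| open_from_identifier(id))
}
```

With the parameter optional, a planner-side caller can keep passing an already-open dataset, while a remote executor can pass nothing and rely on the identifier's storage options to reach the data.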

@LuQQiu LuQQiu requested a review from westonpace February 21, 2026 01:22
@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.
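For reference, a Conventional Commits header has the shape "type(optional scope)!: description". The sketch below checks a title against that shape. The list of accepted types is an assumption for illustration and is not taken from Lance's "PR Title Check" workflow, which may apply additional rules.

```rust
/// Commit types assumed to be accepted; illustrative only, not Lance's CI config.
const ALLOWED_TYPES: &[&str] = &[
    "feat", "fix", "docs", "style", "refactor", "perf", "test", "build", "ci", "chore", "revert",
];

/// Returns true if `title` looks like `type(optional-scope)!: description`.
pub fn is_conventional_title(title: &str) -> bool {
    // The header prefix and description are separated by the first ": ".
    let Some((prefix, description)) = title.split_once(": ") else {
        return false;
    };
    if description.trim().is_empty() {
        return false;
    }
    // Drop an optional breaking-change marker.
    let prefix = prefix.strip_suffix('!').unwrap_or(prefix);
    // Drop an optional, non-empty "(scope)" suffix.
    let kind = match prefix.find('(') {
        Some(open) if prefix.ends_with(')') && open + 1 < prefix.len() - 1 => &prefix[..open],
        Some(_) => return false,
        None => prefix,
    };
    ALLOWED_TYPES.iter().any(|t| *t == kind)
}
```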

@github-actions github-actions Bot added the enhancement New feature or request label Feb 21, 2026
@github-actions
Contributor

Code Review

P0 - Security Concern: Credentials in Serialized Proto

The addition of storage_options to TableIdentifier proto introduces a security concern. According to the codebase, storage_options can contain sensitive credentials:

  • AWS access keys and secret keys
  • Security tokens (OSS, Tencent Cloud, HuggingFace)
  • API keys and authentication tokens
  • Credentials from secret managers

These are now being serialized into the proto message for distributed execution (planner → executor). This could expose credentials if:

  1. Proto messages are logged - common in distributed systems for debugging
  2. Network transport is not encrypted - credentials sent in plaintext
  3. Proto messages are cached/persisted - credentials stored insecurely
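One way to reduce the logging risk in point 1, sketched in Rust under the assumption that the Debug output of the logged type can be customized: print the storage option keys but replace every value with a placeholder. The TableIdentifier struct here is a hypothetical hand-written type, not the generated proto message.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical identifier carrying storage options.
pub struct TableIdentifier {
    pub uri: String,
    pub storage_options: HashMap<String, String>,
}

impl fmt::Debug for TableIdentifier {
    // Show which option keys are present, but never their values, so a
    // debug log of this value does not reveal credentials.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut keys: Vec<&String> = self.storage_options.keys().collect();
        keys.sort(); // deterministic output regardless of HashMap iteration order
        let redacted: Vec<String> = keys.iter().map(|k| format!("{k}=<redacted>")).collect();
        f.debug_struct("TableIdentifier")
            .field("uri", &self.uri)
            .field("storage_options", &redacted)
            .finish()
    }
}
```

This only covers accidental logging through Debug; it does nothing for points 2 and 3, since the encoded bytes on the wire or on disk still contain the values.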

Suggestions:

  • Consider whether storage options should be passed out-of-band (e.g., via a secure credential provider on the executor side)
  • At minimum, add documentation warning that this proto should only be sent over encrypted channels
  • Consider filtering out sensitive keys before serialization, or use a dedicated credentials field with explicit handling
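A minimal sketch of the key-filtering suggestion: split the options into a map that is safe to serialize and a map of sensitive entries to be delivered out of band. The substring list in SENSITIVE_MARKERS is an illustrative assumption; a real implementation would need the exact set of credential keys the storage backends accept, and substring matching can both miss keys and over-match.

```rust
use std::collections::HashMap;

/// Substrings assumed to mark a key as sensitive (illustrative only).
const SENSITIVE_MARKERS: &[&str] = &[
    "secret", "token", "password", "access_key", "api_key", "credential",
];

fn is_sensitive(key: &str) -> bool {
    let key = key.to_ascii_lowercase();
    SENSITIVE_MARKERS.iter().any(|m| key.contains(m))
}

/// Splits storage options into (safe to serialize, must be sent out of band).
pub fn partition_storage_options(
    options: &HashMap<String, String>,
) -> (HashMap<String, String>, HashMap<String, String>) {
    options
        .iter()
        .map(|(k, v)| (k.clone(), v.clone()))
        .partition(|(k, _)| !is_sensitive(k))
}
```

Matching against an explicit allow-list of non-sensitive keys instead would fail closed rather than open, at the cost of updating the list whenever a new option is added.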

If this is intentional for specific distributed execution use cases with proper security controls, please document the expected security model.

@LuQQiu LuQQiu requested a review from jackye1995 February 21, 2026 01:24
@LuQQiu LuQQiu changed the title feat: FilteredReadExec serd with storage options feat: serialize storage options in table identifier proto Feb 21, 2026
@LuQQiu LuQQiu changed the title feat: serialize storage options in table identifier proto feat: serialize storage options in table identifier proto Feb 21, 2026
@codecov

codecov Bot commented Feb 21, 2026

Codecov Report

❌ Patch coverage is 56.09756% with 18 lines in your changes missing coverage. Please review.

Files with missing lines                        Patch %   Lines
rust/lance/src/io/exec/filtered_read_proto.rs   56.09%    15 Missing and 3 partials ⚠️


manifest_etag: dataset.manifest_location.e_tag.clone(),
serialized_manifest: Some(manifest_proto.encode_to_vec()),
storage_options: dataset
    .initial_storage_options()
Contributor

I think this should call latest_storage_options()?

Contributor Author

oh that's good!
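The distinction matters if storage options can change after a dataset is opened, for example when short-lived credentials are refreshed. The sketch below assumes that initial_storage_options() returns the options given at open time and latest_storage_options() overlays any refreshed values; these semantics and the DatasetOptions type are assumptions made for illustration, not Lance's actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical holder: `initial` has the options supplied when the dataset
/// was opened; `refreshed` has values a credential provider updated since
/// (e.g. a rotated session token).
pub struct DatasetOptions {
    pub initial: HashMap<String, String>,
    pub refreshed: HashMap<String, String>,
}

impl DatasetOptions {
    /// Options exactly as supplied at open time; may contain expired credentials.
    pub fn initial_storage_options(&self) -> HashMap<String, String> {
        self.initial.clone()
    }

    /// Initial options with refreshed values taking precedence, so a remote
    /// executor would not start from an already-expired credential.
    pub fn latest_storage_options(&self) -> HashMap<String, String> {
        let mut merged = self.initial.clone();
        merged.extend(self.refreshed.iter().map(|(k, v)| (k.clone(), v.clone())));
        merged
    }
}
```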

// Two modes:
// 1. uri + serialized_manifest (fast): remote executor skips manifest read.
// 2. uri + version + etag (lightweight): remote executor loads manifest from storage.
message TableIdentifier {
Contributor

I feel that at this point it's more like a table state than a table identifier; not sure if we can still change the name, though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we mentioned not making the proto backward compatible, and this may be merged within one week...
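The two modes described in the proto comment can be sketched as a simple dispatch: if serialized manifest bytes were shipped, use them and skip the manifest read; otherwise load the manifest by version (and etag) from storage. The field names manifest_etag and serialized_manifest follow the diff excerpt above, but their types, the ManifestSource enum and the manifest_source function are hypothetical.

```rust
/// Hypothetical mirror of the two TableIdentifier modes.
#[derive(Debug)]
pub enum ManifestSource<'a> {
    /// Mode 1 (fast): manifest bytes were shipped with the identifier.
    Serialized(&'a [u8]),
    /// Mode 2 (lightweight): read the manifest for this version from storage,
    /// using the etag (if any) to confirm it is the expected object.
    Load { version: u64, etag: Option<&'a str> },
}

pub struct TableIdentifier {
    pub uri: String,
    pub version: u64,
    pub manifest_etag: Option<String>,
    pub serialized_manifest: Option<Vec<u8>>,
}

pub fn manifest_source(id: &TableIdentifier) -> ManifestSource<'_> {
    match &id.serialized_manifest {
        // Fast path: the executor can decode these bytes directly.
        Some(bytes) => ManifestSource::Serialized(bytes.as_slice()),
        // Lightweight path: the executor must go to storage.
        None => ManifestSource::Load {
            version: id.version,
            etag: id.manifest_etag.as_deref(),
        },
    }
}
```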

Contributor

@jackye1995 jackye1995 left a comment

2 comments; the other parts look good to me.

@LuQQiu LuQQiu merged commit 540e973 into lance-format:main Feb 21, 2026
28 checks passed
