
New API & transaction type for users to set new base paths #4864

@jackye1995

Description


We have a new multi-bucket feature that would benefit from the multiple base paths support we added for shallow clone: #4765

The proposed API is something like:

...

dataset.add_base("s3://bucket/abc", name="primary")
dataset.add_base("s3://bucket3/abc", name="us-west-2")
dataset.add_base("s3://bucket3/abc", name="eu-west-2")

...

lance.write_dataset(data, dataset, mode="append", base_paths=["primary", "us-west-2"])
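The proposal implies that the dataset keeps a mapping from base-path names to storage URIs, and that writers resolve the names passed in `base_paths` against that mapping. Below is a minimal, purely illustrative sketch of that idea. `BasePathRegistry` and its methods are hypothetical and are not part of Lance. The sketch assumes names must be unique and that referencing an unregistered name should fail.

```python
# Hypothetical sketch: NOT Lance's actual API. It only illustrates the
# proposed idea of registering named base paths and later referring to
# them by name when writing.
from typing import Dict, List


class BasePathRegistry:
    """Hypothetical in-memory map from base-path name to storage URI."""

    def __init__(self) -> None:
        self._bases: Dict[str, str] = {}

    def add_base(self, path: str, name: str) -> None:
        # Names must be unique so writers can reference them unambiguously.
        if name in self._bases:
            raise ValueError(f"base path name already registered: {name!r}")
        self._bases[name] = path

    def resolve(self, names: List[str]) -> List[str]:
        # Translate the names passed as base_paths=[...] into concrete URIs,
        # failing loudly on any name that was never registered.
        unknown = [n for n in names if n not in self._bases]
        if unknown:
            raise KeyError(f"unknown base path names: {unknown}")
        return [self._bases[n] for n in names]


registry = BasePathRegistry()
registry.add_base("s3://bucket/abc", name="primary")
registry.add_base("s3://bucket3/abc", name="us-west-2")
print(registry.resolve(["primary", "us-west-2"]))
```

Under these assumptions, the append call above would write only to the two resolved URIs.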

Or add the bases as part of dataset creation:

lance.write_dataset(data, dataset, mode="create", base_paths=[
  { "path": "s3://bucket/abc", "name": "primary" }, 
  { "path": "s3://bucket3/abc", "name": "us-west-2" }
])
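The two examples pass `base_paths` in different shapes. In `mode="append"` they are bare names, and in `mode="create"` they are `{"path": ..., "name": ...}` dicts. One way a writer could accept both is sketched below. `normalize_base_paths` and its `known` argument are hypothetical helpers, not Lance's API. The sketch assumes a bare name must already be registered, while a dict declares a base inline.

```python
# Hypothetical sketch: NOT Lance's actual API. It shows one way a writer could
# accept both forms of base_paths: plain names (append) and
# {"path": ..., "name": ...} dicts (create).
from typing import Dict, List, Union

BaseSpec = Union[str, Dict[str, str]]


def normalize_base_paths(
    base_paths: List[BaseSpec], known: Dict[str, str]
) -> Dict[str, str]:
    """Return an ordered name -> path mapping for the bases to write to.

    `known` holds bases already registered on the dataset (empty on create).
    """
    result: Dict[str, str] = {}
    for spec in base_paths:
        if isinstance(spec, str):
            # A bare name must refer to an already-registered base.
            if spec not in known:
                raise KeyError(f"unknown base path name: {spec!r}")
            name, path = spec, known[spec]
        else:
            # A dict declares a base inline, as in the create example.
            name, path = spec["name"], spec["path"]
            if name in known and known[name] != path:
                raise ValueError(f"{name!r} is already registered with a different path")
        if name in result:
            raise ValueError(f"duplicate base path name: {name!r}")
        result[name] = path
    return result
```

Normalizing both forms to a single name-to-path mapping early means the rest of the write path only has to handle one representation.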

Two questions are still open:

  1. It seems we should give the table's root location a deterministic name. The root is not recorded as a named base path, but users still need to reference it in base_paths. In the example above, the user writes data only to the newly added bases, not to the root location. There will also be cases where the user wants to write to the root, and they should be able to specify something like base_paths=["root", "primary", "us-west-2"].
  2. Should we create a new transaction operation, such as UpdateBasePaths? We could instead fold this into the existing UpdateConfig operation, but that risks overloading it.
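To make both open questions concrete, here is a toy sketch of a dedicated `UpdateBasePaths` operation that reserves the name `"root"` for the table's root location. Only the names `UpdateBasePaths` and `"root"` come from this issue. `Manifest`, `ROOT_NAME`, `resolve`, and the fields shown are hypothetical and do not reflect Lance's real manifest or transaction format.

```python
# Hypothetical sketch: NOT Lance's transaction format. It models a dedicated
# UpdateBasePaths operation and reserves a deterministic name for the
# dataset root (open question 1).
from dataclasses import dataclass, field
from typing import Dict, List

ROOT_NAME = "root"  # assumed reserved name for the table's root location


@dataclass
class Manifest:
    root_uri: str
    base_paths: Dict[str, str] = field(default_factory=dict)
    config: Dict[str, str] = field(default_factory=dict)


@dataclass
class UpdateBasePaths:
    """Hypothetical dedicated operation: only touches the base-path map."""

    added: Dict[str, str]  # name -> path

    def apply(self, manifest: Manifest) -> Manifest:
        if ROOT_NAME in self.added:
            raise ValueError(f"{ROOT_NAME!r} is reserved for the dataset root")
        clash = set(self.added) & set(manifest.base_paths)
        if clash:
            raise ValueError(f"base path names already exist: {sorted(clash)}")
        # Return a new manifest version; the input manifest is left unchanged.
        merged = {**manifest.base_paths, **self.added}
        return Manifest(manifest.root_uri, merged, dict(manifest.config))


def resolve(manifest: Manifest, names: List[str]) -> List[str]:
    # "root" resolves to the dataset root even though it is never stored
    # in the base-path map; every other name must have been added.
    return [
        manifest.root_uri if n == ROOT_NAME else manifest.base_paths[n]
        for n in names
    ]
```

Even in this toy version, a dedicated operation can validate base-path changes, such as reserved names and name clashes, without touching unrelated config keys. Folding the same data into UpdateConfig would mean encoding the name-to-path map as config entries and repeating those checks there. This is one argument to weigh, not a settled answer.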

Curious what people think, @jaystarshot @majin1102 @wjones127
