IvfModel.save/load and PqModel.save/load do not support storage_options #6311

@hushengquan

Problem

IvfModel.save() / IvfModel.load() and PqModel.save() / PqModel.load() accept a uri parameter that can point to cloud storage (e.g. s3://, gs://), but they do not accept storage_options. The underlying LanceFileReader and LanceFileWriter both already support a storage_options parameter for passing credentials and other backend-specific options, but the model save/load methods never forward it.

As a result, users can currently authenticate only through environment variables (AWS_ACCESS_KEY_ID, etc.) or a pre-configured credentials file (~/.aws/credentials). This is a problem when:

  • Working with multiple object storage instances that require different credentials (e.g. reading centroids from one S3 bucket and writing the codebook to another).
  • Running in environments where setting environment variables is not desirable or possible.
  • Needing to pass endpoint overrides for S3-compatible storage (e.g. MinIO).
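
To illustrate how awkward the current situation is, here is a minimal sketch of the kind of workaround it forces: swapping environment variables around each call. The helper name temporary_env is hypothetical and not part of Lance. Whether the storage backend re-reads the environment on every call is also an assumption, and the approach is process-wide, so it is not safe with concurrent threads.

```python
import os
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def temporary_env(overrides: Dict[str, str]) -> Iterator[None]:
    """Set environment variables inside a with-block, then restore the old values.

    Hypothetical helper for illustration only; not part of Lance.
    """
    # Remember the previous value of every key we are about to touch
    # (None means the key was not set at all).
    saved = {key: os.environ.get(key) for key in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old


# Intended use (requires lance and real credentials, so left as a comment):
# with temporary_env({"AWS_ACCESS_KEY_ID": "...", "AWS_SECRET_ACCESS_KEY": "..."}):
#     ivf = IvfModel.load("s3://bucket/ivf.lance")
```

Passing storage_options per call would make this kind of global state juggling unnecessary.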

Expected Behavior

IvfModel.save/load and PqModel.save/load should accept an optional storage_options dict and pass it through to LanceFileWriter / LanceFileReader, consistent with the rest of the Lance Python API (e.g. lance.dataset(), lance.write_dataset()).

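from lance.indices.ivf import IvfModel
from lance.indices.pq import PqModel

# Credentials for the bucket the model is read from.
storage_options = {
    "aws_access_key_id": "AKIA...",
    "aws_secret_access_key": "...",
    "region": "us-east-1",
}

# Separate credentials for the destination bucket (placeholder values).
other_storage_options = {
    "aws_access_key_id": "AKIA...",
    "aws_secret_access_key": "...",
    "region": "us-east-1",
}

# Proposed API: storage_options is forwarded to LanceFileReader / LanceFileWriter.
# PqModel.load / PqModel.save would accept the same argument.
ivf = IvfModel.load("s3://bucket/ivf.lance", storage_options=storage_options)
ivf.save("s3://another-bucket/ivf.lance", storage_options=other_storage_options)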
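
The change itself could be little more than a pass-through. The sketch below shows the forwarding pattern with stand-in classes so that it runs without Lance installed. StubFileWriter and SketchModel are hypothetical names, and the keyword-only storage_options argument is an assumption about how the final signature might look, not the actual implementation.

```python
from typing import Dict, Optional


class StubFileWriter:
    """Hypothetical stand-in for LanceFileWriter; it only records its arguments."""

    def __init__(self, path: str, storage_options: Optional[Dict[str, str]] = None):
        self.path = path
        self.storage_options = storage_options


class SketchModel:
    """Hypothetical model class illustrating the proposed save() signature."""

    def save(
        self, uri: str, *, storage_options: Optional[Dict[str, str]] = None
    ) -> StubFileWriter:
        # Forward the options unchanged. Leaving them as None keeps today's
        # behaviour (environment variables or a credentials file).
        writer = StubFileWriter(uri, storage_options=storage_options)
        # A real implementation would write the model data and close the writer here.
        return writer


writer = SketchModel().save(
    "s3://another-bucket/ivf.lance",
    storage_options={"region": "us-east-1"},
)
```

Defaulting storage_options to None keeps existing callers working unchanged, and load() would forward the same argument to the reader in the same way.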
