Description
The Python bindings (pyiceberg-core) do not support GCS-backed Iceberg tables when used with the DataFusion table provider. The underlying iceberg-storage-opendal crate already has full GCS support via the opendal-gcs feature flag, and the OpenDalStorageFactory::Gcs variant exists in crates/storage/opendal/src/lib.rs, but neither is wired up in the Python bindings.
This means anyone using pyiceberg + DataFusion with tables stored on gs:// hits a runtime error:
RuntimeError: Unsupported storage scheme: gs
Steps to Reproduce
from pyiceberg.catalog import load_catalog
from datafusion import SessionContext
catalog = load_catalog("my_catalog") # REST catalog pointing to GCS-backed warehouse
table = catalog.load_table("my_namespace.my_table")
ctx = SessionContext()
ctx.register_table("my_table", table) # <-- fails here
Error:
RuntimeError: Unsupported storage scheme: gs
The call path is:
- ctx.register_table() calls table.__datafusion_table_provider__()
- pyiceberg constructs IcebergDataFusionTable and calls its __datafusion_table_provider__()
- Rust-side storage_factory_from_path() in bindings/python/src/datafusion_table_provider.rs does not match gs or gcs schemes
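To make the last step of the call path concrete, here is a minimal Python sketch of the scheme dispatch. It is only an illustration, not the actual binding code: the function name pick_backend, the two scheme tables, and the backend labels are hypothetical, and it assumes the scheme is taken from the URL-style prefix of the table location. The scheme lists themselves are copied from the match arms quoted in this issue.

```python
from typing import Mapping
from urllib.parse import urlparse

# Schemes matched today, per the match arms quoted in this issue.
SCHEMES_BEFORE = {"file": "Fs", "": "Fs", "s3": "S3", "s3a": "S3", "memory": "Memory"}
# Schemes matched once the proposed "gs" | "gcs" arm is added.
SCHEMES_AFTER = {**SCHEMES_BEFORE, "gs": "Gcs", "gcs": "Gcs"}


def pick_backend(location: str, schemes: Mapping[str, str]) -> str:
    """Hypothetical stand-in for the Rust dispatch: map a location to a backend label."""
    # Assumption: the scheme is the URL prefix of the location ("" for a bare path).
    scheme = urlparse(location).scheme
    if scheme not in schemes:
        # Same message format as the error reported above.
        raise RuntimeError(f"Unsupported storage scheme: {scheme}")
    return schemes[scheme]
```

With SCHEMES_BEFORE, a location such as gs://bucket/warehouse/table raises the same "Unsupported storage scheme: gs" message reported above. With SCHEMES_AFTER it resolves to the Gcs label.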
Proposed Changes
1. Enable opendal-gcs feature in bindings/python/Cargo.toml
- iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-s3", "opendal-fs", "opendal-memory"] }
+ iceberg-storage-opendal = { path = "../../crates/storage/opendal", features = ["opendal-s3", "opendal-fs", "opendal-memory", "opendal-gcs"] }
2. Add gs/gcs match arms in bindings/python/src/datafusion_table_provider.rs
In storage_factory_from_path():
let factory: Arc<dyn StorageFactory> = match scheme {
    "file" | "" => Arc::new(OpenDalStorageFactory::Fs),
    "s3" | "s3a" => Arc::new(OpenDalStorageFactory::S3 {
        configured_scheme: scheme.to_string(),
        customized_credential_load: None,
    }),
    "memory" => Arc::new(OpenDalStorageFactory::Memory),
+   "gs" | "gcs" => Arc::new(OpenDalStorageFactory::Gcs),
    _ => {
        return Err(PyRuntimeError::new_err(format!(
            "Unsupported storage scheme: {scheme}"
        )));
    }
};
Context
- OpenDalStorageFactory::Gcs variant and gcs_config_parse() already exist in crates/storage/opendal/src/lib.rs
- opendal-gcs feature flag is defined in crates/storage/opendal/Cargo.toml and is included in opendal-all
- gs:// / gcs:// scheme support in "fix: support both gs and gcs schemes for google cloud storage" (#845) and OAuth support in "feat: add gcp oauth support" (#654)