refactor: replace custom XET sink with OpenDAL ObjectStore#2
Closed
davanstrien wants to merge 95 commits into feature/hf-bucket-sink
…_slice_unchecked` (pola-rs#26928)
…ola-rs#26938) Co-authored-by: gabriel <gabriel.g.robin@airbus.com>
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: moktamd <moktamd@users.noreply.github.com> Co-authored-by: nameexhaustion <simonlin.rqmmw@slmail.me>
…oes not begin with base path, or contains '..' (pola-rs#26894)
…6764) Co-authored-by: Simon Lin <simonlin.rqmmw@slmail.me>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: nameexhaustion <simonlin.rqmmw@slmail.me>
…#27104) Co-authored-by: Orson Peters <orsonpeters@gmail.com>
pola-rs#27087) Co-authored-by: nameexhaustion <simonlin.rqmmw@slmail.me>
Co-authored-by: Dani Pinyol <dani@avatarcognition.com>
…la-rs#27118) Co-authored-by: Orson Peters <orsonpeters@gmail.com>
Resolve conflict in polars-stream/Cargo.toml: keep both hf_bucket_sink feature (ours) and is_first_distinct feature (upstream).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the custom HfBucketSinkNode (1,050 lines of XET-specific code) with a standard ObjectStore implementation backed by OpenDAL's HF service. HF URLs now flow through the same FileSink path as S3/GCS/Azure, requiring only a thin build_hf() builder in a new hf.rs module.

Key changes:
- Add crates/polars-io/src/cloud/hf.rs: HF URL parsing, token extraction, and OpenDAL ObjectStore construction (~175 lines)
- Wire CloudType::Hf in object_store_setup.rs to call build_hf(), matching the pattern used by build_aws/build_gcp/build_azure
- Delete custom sink: hf_bucket/ directory (4 files, 721 lines), HfBucketSinkNode (260 lines), IR lowering special-case, PhysNodeKind::HfBucketSink variant
- Rename feature flag hf_bucket_sink -> hf across 9 Cargo.toml files
- Bump object_store_opendal compatibility from object_store 0.12 to 0.13

Dependencies: opendal + object_store_opendal (local path deps for now, will switch to published crate versions once apache/opendal#7185 ships).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced by a clean branch; see new PR.
Summary
- crates/polars-io/src/cloud/hf.rs module (~175 lines), following the same build_hf() pattern as build_aws()/build_gcp()/build_azure()
- Feature flag rename: hf_bucket_sink → hf

Motivation
Polars maintainers flagged the custom XET code as too much HF-specific maintenance burden. The OpenDAL approach offloads XET/HF internals to OpenDAL, leaving Polars with only standard ObjectStore wiring.
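To make the "standard ObjectStore wiring" concrete, here is a minimal, std-only sketch of the dispatch shape this PR describes: one match arm per cloud type, each delegating to its own builder. It is not the actual Polars code. The Store and BuildError types, the build_object_store function, and the hf_enabled flag (a runtime stand-in for the compile-time hf Cargo feature) are all assumptions made for illustration; only the CloudType::Hf variant and the build_aws/build_gcp/build_azure/build_hf names come from the PR text.

```rust
/// Hypothetical stand-in for Polars' `CloudType`; only the backends named
/// in this PR are modelled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CloudType {
    Aws,
    Gcp,
    Azure,
    Hf,
}

/// Hypothetical stand-in for an object store handle. It only records which
/// builder produced it, so the dispatch can be observed.
#[derive(Debug, PartialEq, Eq)]
struct Store {
    backend: &'static str,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum BuildError {
    FeatureDisabled(&'static str),
}

// Stub builders: the real ones would construct a configured store from the
// URL and cloud options. Here they only tag the result.
fn build_aws(_url: &str) -> Result<Store, BuildError> {
    Ok(Store { backend: "aws" })
}
fn build_gcp(_url: &str) -> Result<Store, BuildError> {
    Ok(Store { backend: "gcp" })
}
fn build_azure(_url: &str) -> Result<Store, BuildError> {
    Ok(Store { backend: "azure" })
}
fn build_hf(_url: &str) -> Result<Store, BuildError> {
    Ok(Store { backend: "hf" })
}

/// Dispatch on the cloud type. `hf_enabled` is an assumption standing in
/// for the `hf` feature flag, which in a real crate would be a `#[cfg]`.
fn build_object_store(
    cloud_type: CloudType,
    url: &str,
    hf_enabled: bool,
) -> Result<Store, BuildError> {
    match cloud_type {
        CloudType::Aws => build_aws(url),
        CloudType::Gcp => build_gcp(url),
        CloudType::Azure => build_azure(url),
        // The HF arm has the same shape as the others: no custom sink node.
        CloudType::Hf if hf_enabled => build_hf(url),
        CloudType::Hf => Err(BuildError::FeatureDisabled("hf")),
    }
}
```

The point of the shape is that HF needs no special case downstream of the match: whatever consumes the returned store treats it like any other backend.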
What's deleted
- crates/polars-io/src/cloud/hf_bucket/ (4 files, 721 lines): custom XET upload, batch API, streaming uploader
- crates/polars-stream/src/nodes/io_sinks/hf_bucket_sink.rs (260 lines): custom sink node
- PhysNodeKind::HfBucketSink variant, graph/fmt wiring

What's added
- crates/polars-io/src/cloud/hf.rs: URL parsing, token extraction, OpenDAL builder
- object_store_setup.rs: 6-line CloudType::Hf match arm calling build_hf()
- object_store_opendal compatibility bump from object_store 0.12 → 0.13

Dependencies
Using local path deps for opendal and object_store_opendal during development. Will swap to published crate versions once apache/opendal#7185 ships.

Test plan
- cargo check -p polars-stream --features hf: compiles
- cargo check -p polars-stream: no regression without feature

🤖 Generated with Claude Code
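For readers who want a feel for the "URL parsing, token extraction" part of hf.rs listed under What's added, the following is a rough std-only sketch, not the module's actual code. It assumes a URL layout of hf://<repo_type>/<owner>/<name>/<path>, and assumes the token comes from an explicit option with a fallback lookup of HF_TOKEN. The HfLocation, HfUrlError, parse_hf_url, and resolve_token names are hypothetical, and the OpenDAL builder step is deliberately left out because its exact API is not shown in this PR.

```rust
/// Hypothetical parsed form of an HF URL; field names are illustrative.
#[derive(Debug, PartialEq, Eq)]
struct HfLocation {
    repo_type: String,
    repo_id: String,
    path: String,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum HfUrlError {
    MissingScheme,
    TooFewSegments,
}

/// Parse `hf://<repo_type>/<owner>/<name>/<path...>` (assumed layout).
fn parse_hf_url(url: &str) -> Result<HfLocation, HfUrlError> {
    let rest = url.strip_prefix("hf://").ok_or(HfUrlError::MissingScheme)?;
    // Split into at most four pieces so the object path keeps its slashes.
    let mut parts = rest.splitn(4, '/');
    let mut required = || {
        parts
            .next()
            .filter(|s| !s.is_empty())
            .ok_or(HfUrlError::TooFewSegments)
    };
    let repo_type = required()?;
    let owner = required()?;
    let name = required()?;
    // The path inside the repo may be empty (URL points at the repo root).
    let path = parts.next().unwrap_or("");
    Ok(HfLocation {
        repo_type: repo_type.to_string(),
        repo_id: format!("{owner}/{name}"),
        path: path.to_string(),
    })
}

/// Resolve a token: an explicit value wins, otherwise fall back to a lookup
/// of `HF_TOKEN`. The lookup is injected so the function stays testable;
/// a caller could pass `|k| std::env::var(k).ok()`.
fn resolve_token(
    explicit: Option<&str>,
    env_lookup: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    explicit
        .map(str::to_string)
        .or_else(|| env_lookup("HF_TOKEN"))
        .filter(|t| !t.is_empty())
}
```

Under these assumptions, parse_hf_url("hf://datasets/user/repo/data/file.parquet") yields repo_type "datasets", repo_id "user/repo", and path "data/file.parquet"; the resulting fields and token would then be handed to whatever builder constructs the store.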