refactor: replace custom XET sink with OpenDAL ObjectStore #2

Closed
davanstrien wants to merge 95 commits into feature/hf-bucket-sink from refactor/opendal-hf-sink

@davanstrien
Owner

Summary

  • Replace ~1,050 lines of custom XET sink code with a standard ObjectStore implementation backed by OpenDAL's HF service
  • HF URLs now flow through the same FileSink path as S3/GCS/Azure — no custom sink node, no IR special-case
  • All new logic consolidated in a single crates/polars-io/src/cloud/hf.rs module (~175 lines), following the same build_hf() pattern as build_aws()/build_gcp()/build_azure()
  • Rename feature flag hf_bucket_sink → hf
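
To picture what "the same FileSink path" means, here is a minimal sketch of a single dispatch point where HF is one more match arm. It is not the PR's code: `Store`, `build_object_store`, and the one-argument builder signatures are hypothetical stand-ins, and the real Polars builders take cloud options and return an object store handle.

```rust
// Simplified stand-ins; the real Polars types and builders are richer.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CloudType {
    Aws,
    Gcp,
    Azure,
    Hf,
}

/// Hypothetical placeholder for an object store handle.
/// It only records which backend was constructed.
#[derive(Debug, PartialEq)]
struct Store {
    backend: &'static str,
}

// Hypothetical one-argument builders, named after the pattern in the summary.
fn build_aws(_url: &str) -> Result<Store, String> {
    Ok(Store { backend: "aws" })
}

fn build_gcp(_url: &str) -> Result<Store, String> {
    Ok(Store { backend: "gcp" })
}

fn build_azure(_url: &str) -> Result<Store, String> {
    Ok(Store { backend: "azure" })
}

fn build_hf(_url: &str) -> Result<Store, String> {
    // In the PR this is where the OpenDAL-backed store would be built.
    Ok(Store { backend: "opendal-hf" })
}

/// One dispatch point: HF is just another arm, so the generic sink
/// that consumes the returned store needs no HF-specific node.
fn build_object_store(cloud_type: CloudType, url: &str) -> Result<Store, String> {
    match cloud_type {
        CloudType::Aws => build_aws(url),
        CloudType::Gcp => build_gcp(url),
        CloudType::Azure => build_azure(url),
        CloudType::Hf => build_hf(url),
    }
}

fn main() {
    let store = build_object_store(CloudType::Hf, "hf://example").unwrap();
    println!("{store:?}");
}
```

Because every arm returns the same type, everything downstream of the dispatch can stay backend-agnostic, which is the property the summary relies on.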

Motivation

Polars maintainers flagged the custom XET code as too large an HF-specific maintenance burden. The OpenDAL approach offloads XET/HF internals to OpenDAL, leaving Polars with only standard ObjectStore wiring.

What's deleted

  • crates/polars-io/src/cloud/hf_bucket/ (4 files, 721 lines) — custom XET upload, batch API, streaming uploader
  • crates/polars-stream/src/nodes/io_sinks/hf_bucket_sink.rs (260 lines) — custom sink node
  • IR lowering special-case, PhysNodeKind::HfBucketSink variant, graph/fmt wiring

What's added

  • crates/polars-io/src/cloud/hf.rs — URL parsing, token extraction, OpenDAL builder
  • object_store_setup.rs — 6-line CloudType::Hf match arm calling build_hf()
  • Bump object_store_opendal compatibility from object_store 0.12 to 0.13
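
As a rough illustration of the "URL parsing, token extraction" responsibilities listed for hf.rs, the std-only sketch below shows one way such helpers could look. Everything here is an assumption rather than the PR's implementation: the `hf://buckets/<namespace>/<bucket>/<path>` layout, the `HfLocation` struct, and the `parse_hf_url` and `resolve_token` names are hypothetical. `HF_TOKEN` is the environment variable commonly used by Hugging Face tooling, but whether this PR reads it is not stated.

```rust
use std::env;

/// Hypothetical parsed form of an HF bucket URL.
#[derive(Debug, PartialEq)]
struct HfLocation {
    namespace: String,
    bucket: String,
    path: String,
}

/// Parse `hf://buckets/<namespace>/<bucket>/<path>`.
/// The URL layout is an assumption made for this sketch.
fn parse_hf_url(url: &str) -> Result<HfLocation, String> {
    let rest = url
        .strip_prefix("hf://buckets/")
        .ok_or_else(|| format!("not an hf://buckets/ URL: {url}"))?;

    // Split into at most three parts so the object path keeps its slashes.
    let mut parts = rest.splitn(3, '/');
    let namespace = parts
        .next()
        .filter(|s| !s.is_empty())
        .ok_or("missing namespace")?;
    let bucket = parts
        .next()
        .filter(|s| !s.is_empty())
        .ok_or("missing bucket name")?;
    let path = parts.next().unwrap_or("");

    Ok(HfLocation {
        namespace: namespace.to_string(),
        bucket: bucket.to_string(),
        path: path.to_string(),
    })
}

/// Prefer an explicitly supplied token, then fall back to `HF_TOKEN`.
/// The lookup is injected so the function can be tested without
/// touching the process environment.
fn resolve_token(
    explicit: Option<&str>,
    env_lookup: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    explicit
        .map(str::to_string)
        .or_else(|| env_lookup("HF_TOKEN"))
        .filter(|token| !token.is_empty())
}

fn main() {
    let loc = parse_hf_url("hf://buckets/my-org/my-bucket/data/part-0.parquet").unwrap();
    println!("{loc:?}");

    let token = resolve_token(None, |key| env::var(key).ok());
    println!("token present: {}", token.is_some());
}
```

The parsed pieces and the token would then be handed to the OpenDAL builder; that step is left out because it depends on the unreleased OpenDAL HF service API.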

Dependencies

Using local path deps for opendal and object_store_opendal during development. Will swap to published crate versions once apache/opendal#7185 ships.

Test plan

  • cargo check -p polars-stream --features hf — compiles
  • cargo check -p polars-stream — no regression without feature
  • End-to-end test with real HF bucket (pending OpenDAL release)

🤖 Generated with Claude Code

Kevin-Patyk and others added 30 commits March 13, 2026 21:35
…ola-rs#26938)

Co-authored-by: gabriel <gabriel.g.robin@airbus.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: moktamd <moktamd@users.noreply.github.com>
Co-authored-by: nameexhaustion <simonlin.rqmmw@slmail.me>
…6764)

Co-authored-by: Simon Lin <simonlin.rqmmw@slmail.me>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: nameexhaustion <simonlin.rqmmw@slmail.me>
MarcoGorelli and others added 22 commits March 30, 2026 14:48
…#27104)

Co-authored-by: Orson Peters <orsonpeters@gmail.com>
pola-rs#27087)

Co-authored-by: nameexhaustion <simonlin.rqmmw@slmail.me>
Co-authored-by: Dani Pinyol <dani@avatarcognition.com>
…la-rs#27118)

Co-authored-by: Orson Peters <orsonpeters@gmail.com>
Resolve conflict in polars-stream/Cargo.toml: keep both hf_bucket_sink
feature (ours) and is_first_distinct feature (upstream).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the custom HfBucketSinkNode (1,050 lines of XET-specific code)
with a standard ObjectStore implementation backed by OpenDAL's HF service.

HF URLs now flow through the same FileSink path as S3/GCS/Azure,
requiring only a thin build_hf() builder in a new hf.rs module.

Key changes:
- Add crates/polars-io/src/cloud/hf.rs: HF URL parsing, token
  extraction, and OpenDAL ObjectStore construction (~175 lines)
- Wire CloudType::Hf in object_store_setup.rs to call build_hf(),
  matching the pattern used by build_aws/build_gcp/build_azure
- Delete custom sink: hf_bucket/ directory (4 files, 721 lines),
  HfBucketSinkNode (260 lines), IR lowering special-case,
  PhysNodeKind::HfBucketSink variant
- Rename feature flag hf_bucket_sink -> hf across 9 Cargo.toml files
- Bump object_store_opendal compatibility from object_store 0.12 to 0.13

Dependencies: opendal + object_store_opendal (local path deps for now,
will switch to published crate versions once apache/opendal#7185 ships).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davanstrien
Owner Author

Replaced by a clean branch — see new PR.

@davanstrien davanstrien closed this Apr 2, 2026