feat: add HF Storage Bucket support via OpenDAL #3

Open
davanstrien wants to merge 97 commits into feature/hf-bucket-sink from hf-opendal-sink

Conversation

@davanstrien
Owner

Summary

Add hf:// URL support for Polars cloud writes via OpenDAL, enabling:

df.lazy().sink_parquet("hf://buckets/org/name/data.parquet")

Approach

Follows the same pattern as existing cloud backends (S3/GCS/Azure):

  • crates/polars-io/src/cloud/hf.rs — URL parsing, token resolution, OpenDAL ObjectStore construction
  • object_store_setup.rs — 6-line CloudType::Hf match arm calling build_hf()
  • Feature flag hf propagated through workspace Cargo.toml chain

HF URLs flow through the standard FileSink — no custom sink node, no IR special-casing.
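
To make the URL parsing step concrete, here is a minimal sketch of splitting an `hf://buckets/<org>/<name>/<path>` URL into its parts. It is not the code in `hf.rs`: the `HfBucketUrl` struct and the `parse_hf_bucket_url` function are hypothetical names introduced for illustration, and the sketch only handles the `buckets` form shown in the summary.

```rust
/// Hypothetical parsed form of an `hf://buckets/<org>/<name>/<path>` URL.
/// This type is an assumption for illustration, not a type from the PR.
#[derive(Debug, PartialEq)]
struct HfBucketUrl {
    org: String,
    name: String,
    path: String,
}

/// Split an `hf://buckets/...` URL into org, bucket name, and object path.
/// Anything that does not match that shape is rejected with an error message.
fn parse_hf_bucket_url(url: &str) -> Result<HfBucketUrl, String> {
    let rest = url
        .strip_prefix("hf://")
        .ok_or_else(|| format!("not an hf:// URL: {url}"))?;

    // Split into at most four pieces so the object path keeps its own slashes.
    let mut parts = rest.splitn(4, '/');
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some("buckets"), Some(org), Some(name), Some(path))
            if !org.is_empty() && !name.is_empty() && !path.is_empty() =>
        {
            Ok(HfBucketUrl {
                org: org.to_string(),
                name: name.to_string(),
                path: path.to_string(),
            })
        }
        _ => Err(format!(
            "expected hf://buckets/<org>/<name>/<path>, got: {url}"
        )),
    }
}
```

Under these assumptions, `parse_hf_bucket_url("hf://buckets/org/name/data.parquet")` yields org `org`, bucket `name`, and path `data.parquet`; nested paths such as `dir/part-0.parquet` are kept intact.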

Dependencies

Using local path deps for opendal and object_store_opendal during development. Will swap to published crate versions once apache/opendal#7185 ships.

Test plan

  • `cargo check -p polars-stream --features hf` compiles
  • `cargo check -p polars-stream` shows no regression without the feature
  • End-to-end test with real HF bucket (pending OpenDAL release)

🤖 Generated with Claude Code

Kevin-Patyk and others added 30 commits March 13, 2026 21:35
…ola-rs#26938)

Co-authored-by: gabriel <gabriel.g.robin@airbus.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: moktamd <moktamd@users.noreply.github.com>
Co-authored-by: nameexhaustion <simonlin.rqmmw@slmail.me>
…6764)

Co-authored-by: Simon Lin <simonlin.rqmmw@slmail.me>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: nameexhaustion <simonlin.rqmmw@slmail.me>
alexander-beedie and others added 23 commits March 30, 2026 14:05
…#27104)

Co-authored-by: Orson Peters <orsonpeters@gmail.com>
pola-rs#27087)

Co-authored-by: nameexhaustion <simonlin.rqmmw@slmail.me>
Co-authored-by: Dani Pinyol <dani@avatarcognition.com>
…la-rs#27118)

Co-authored-by: Orson Peters <orsonpeters@gmail.com>
Add an ObjectStore implementation for `hf://` URLs backed by OpenDAL's
HF service, enabling `sink_parquet("hf://buckets/org/name/file.parquet")`
to stream directly to Hugging Face Storage Buckets.

The implementation follows the same pattern as existing cloud backends
(S3/GCS/Azure): a `build_hf()` function in a new `hf.rs` module
constructs the ObjectStore, and `object_store_setup.rs` calls it from
the `CloudType::Hf` match arm. HF URLs flow through the standard
FileSink path with no custom sink node or IR special-casing.
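
As a rough sketch of the dispatch described above, the feature-gated match arm could look like the following. Only `CloudType::Hf` and `build_hf()` are named in this PR; the other variant names, the `build_object_store` function, and the `String` return type (standing in for a real ObjectStore) are simplifications assumed for illustration.

```rust
/// Simplified stand-in for the cloud backend enum.
/// Only `Hf` is named in the PR; the other variants are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CloudType {
    Aws,
    Gcp,
    Azure,
    Hf,
}

/// With the `hf` feature on, build the store (here: just a description string).
#[cfg(feature = "hf")]
fn build_hf(url: &str) -> Result<String, String> {
    Ok(format!("hf store for {url}"))
}

/// With the `hf` feature off, the same call site compiles but reports an error.
#[cfg(not(feature = "hf"))]
fn build_hf(_url: &str) -> Result<String, String> {
    Err("the `hf` feature must be enabled to use hf:// URLs".to_string())
}

/// Hypothetical dispatch function: one match arm per backend,
/// with the HF arm delegating to `build_hf`.
fn build_object_store(cloud_type: CloudType, url: &str) -> Result<String, String> {
    match cloud_type {
        CloudType::Aws => Ok(format!("s3 store for {url}")),
        CloudType::Gcp => Ok(format!("gcs store for {url}")),
        CloudType::Azure => Ok(format!("azure store for {url}")),
        CloudType::Hf => build_hf(url),
    }
}
```

Gating the two `build_hf` definitions rather than the match arm keeps the arm itself short and lets a build without the feature fail at runtime with a clear message instead of failing to compile.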

New files:
- crates/polars-io/src/cloud/hf.rs — URL parsing, token resolution,
  OpenDAL ObjectStore construction
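
Token resolution is not spelled out in this commit message, so the following is only a sketch of one plausible precedence: an explicitly supplied token wins, otherwise the `HF_TOKEN` environment variable is used. The function names and the precedence order are assumptions, not taken from `hf.rs`; the environment lookup is passed in as a closure so the logic can be exercised without touching the real environment.

```rust
use std::env;

/// Trim a candidate token and discard it if nothing is left.
fn non_empty(s: String) -> Option<String> {
    let trimmed = s.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_owned())
    }
}

/// Hypothetical resolution order (an assumption, not the PR's code):
/// 1. an explicitly supplied token, 2. the `HF_TOKEN` variable via `env_lookup`.
fn resolve_hf_token(
    explicit: Option<&str>,
    env_lookup: impl Fn(&str) -> Option<String>,
) -> Option<String> {
    explicit
        .map(str::to_owned)
        .and_then(non_empty)
        .or_else(|| env_lookup("HF_TOKEN").and_then(non_empty))
}

/// Convenience wrapper that reads from the real process environment.
fn resolve_hf_token_from_env(explicit: Option<&str>) -> Option<String> {
    resolve_hf_token(explicit, |key| env::var(key).ok())
}
```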

Feature flag: `hf` (opt-in, propagated through the workspace)

Dependencies: opendal + object_store_opendal (local path deps for now,
will switch to published crate versions once apache/opendal#7185 ships)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
davanstrien and others added 3 commits April 7, 2026 07:23
Point opendal and object_store_opendal at kszucs/opendal@4c70bd8
(hf-revamp branch) which uses published hf-xet 1.5.0 from crates.io.

Once apache/opendal#7185 merges and publishes, these become simple
version deps (e.g. opendal = "0.56").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Manual dispatch workflow that builds Linux x64 and ARM64 wheels
with the hf feature enabled via maturin.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
apache/opendal#7185 merged today — HF backend XET support is now on
upstream main. Pin to commit 8d3dbcc3ef until a release ships.

Next: once opendal publishes a release with services-hf, swap these
git deps for a version number (opendal = "0.56" or whatever ships).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>