ci(hf): isolate CI repos and stabilize binding tests #7368
HF dataset tests were failing with 412 errors because multiple CI jobs (core, Go, Node.js, Java) committed to the same shared git-based repo simultaneously. Each job now creates its own temporary HF repo/bucket and deletes it after the job via a node20 action post-run hook.
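The per-job isolation described above can be sketched as follows. This is a minimal illustration, not the workflow code from this PR: `FakeHub`, `job_repo_name`, and `temporary_repo` are hypothetical names, and the in-memory hub stands in for the real HF API. The `finally` branch plays the role of the action's post-run hook, so cleanup happens even when the tests fail.

```python
import contextlib
import uuid


class FakeHub:
    """In-memory stand-in for a repo-hosting API (hypothetical, not the HF client)."""

    def __init__(self):
        self.repos = set()

    def create_repo(self, name):
        self.repos.add(name)

    def delete_repo(self, name):
        self.repos.discard(name)


def job_repo_name(job, run_id):
    # A random suffix keeps names unique even when the same job is re-run.
    return f"ci-{job}-{run_id}-{uuid.uuid4().hex[:8]}"


@contextlib.contextmanager
def temporary_repo(hub, job, run_id):
    name = job_repo_name(job, run_id)
    hub.create_repo(name)
    try:
        yield name
    finally:
        # Equivalent of the post-run hook: always delete the job's repo.
        hub.delete_repo(name)


if __name__ == "__main__":
    hub = FakeHub()
    with temporary_repo(hub, "go", "1234") as go_repo:
        with temporary_repo(hub, "nodejs", "1234") as node_repo:
            # Each job commits to its own repo, so they cannot conflict.
            print(go_repo, node_repo)
```

Because every job gets a distinct name, concurrent jobs never commit to the same remote state, which is the source of the cross-job 412 errors described above.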
It seems there are still commit races within single jobs. @Xuanwo could you try making the "condition not match" error retryable?
Thanks! And updated.
@Xuanwo this retry mechanism also tries to upload the xet blobs again raising an |
Could you try to cherry-pick fa165d5?
I believe the root cause here is our writer's |
I can try to make it re-entrant, e.g. using a state enum. Let me try that.
Thank you for that! But I will cover this 🤟
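For readers following the thread, here is a rough sketch of what "re-entrant via a state enum" could look like. It is an assumption-laden illustration, not the writer in this PR: `Writer`, `State`, `FlakyBackend`, `ConditionNotMatch`, and `close_with_retry` are all hypothetical names, and the actual root cause is not spelled out in the comments above. The idea is that a retried `close()` resumes at the commit step instead of uploading the blobs again.

```python
import enum


class ConditionNotMatch(Exception):
    """Stand-in for a 412-style commit conflict (hypothetical)."""


class State(enum.Enum):
    PENDING = 1
    UPLOADED = 2
    COMMITTED = 3


class Writer:
    def __init__(self, backend, blobs):
        self.backend = backend
        self.blobs = blobs
        self.state = State.PENDING

    def close(self):
        # Upload only once; a retried close() skips straight to the commit.
        if self.state is State.PENDING:
            self.backend.upload(self.blobs)
            self.state = State.UPLOADED
        if self.state is State.UPLOADED:
            self.backend.commit()  # may raise ConditionNotMatch
            self.state = State.COMMITTED


def close_with_retry(writer, attempts=3):
    for remaining in range(attempts - 1, -1, -1):
        try:
            writer.close()
            return
        except ConditionNotMatch:
            if remaining == 0:
                raise


class FlakyBackend:
    """Fails the first `conflicts` commits, then succeeds (test double)."""

    def __init__(self, conflicts):
        self.conflicts = conflicts
        self.uploads = 0
        self.commits = 0

    def upload(self, blobs):
        self.uploads += 1

    def commit(self):
        self.commits += 1
        if self.commits <= self.conflicts:
            raise ConditionNotMatch()
```

With two simulated conflicts, `close_with_retry` performs three commit attempts but only one upload, which is the behavior the retry path needs.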
Only one test failure now, working on it.
Cool, let's go!
Which issue does this PR close?
Closes #.
Related: #7367
Rationale for this change
HF behavior tests are currently unstable in CI for two different reasons. The Java binding still crashes on Linux in the HF blocking path, and the shared HF CI repositories introduce cross-job interference for writable test cases.
What changes are included in this PR?
This PR switches HF CI jobs to create and clean up temporary repos or buckets per job so writable behavior tests do not race on the same remote state. It also keeps the Java HF behavior case disabled while #7367 is still open, and restores the missing create_dir capability gating in the Go and Node.js HF bucket list suites.
Are there any user-facing changes?
No.
AI Usage Statement
Used GPT-5 for CI log inspection, patch preparation, and PR description drafting.