Skip to content

stash/restore: add retry-count and fail-on-download inputs#716

Merged
potiuk merged 5 commits intomainfrom
stash-restore-retry-download
Apr 18, 2026
Merged

stash/restore: add retry-count and fail-on-download inputs#716
potiuk merged 5 commits intomainfrom
stash-restore-retry-download

Conversation

@potiuk
Copy link
Copy Markdown
Member

@potiuk potiuk commented Apr 14, 2026

Summary

  • Add retry-count input (default 3) to stash/restore/action.yml. When gh run download exits with code 1 (the transient failure mode observed for artifact downloads), the download is retried up to retry-count times. Other exit codes are not retried.
  • Add fail-on-download input (default false). When true, the step fails if the stash artifact was found but could not be downloaded after exhausting retries. When false, the failure is reported only via the stash-hit output.
  • Fix the previous || download="failed" && download="success" one-liner in the Download Stash step, which due to ||/&& precedence always reported success regardless of whether the download actually worked.
  • Move the retry / clean / fail-on-download logic out of inline bash into a stash/restore/download_stash.py module that exposes a single download_stash(env, run_download=...) function. The Download Stash step now runs it via shell: python3 {0}, matching the style of the existing Check for stash artifact step.
  • Add stash/restore/test_download_stash.py (unittest) with a FakeGh stub injected through the run_download parameter so the logic can be exercised offline. Cases covered:
    • success on first attempt
    • retry on exit 1 until success
    • all retries exhausted with fail-on-download=false (tolerated, exit 0)
    • all retries exhausted with fail-on-download=true (exit 1, error annotation)
    • non-1 exit codes are not retried
    • CLEAN=true removes the stash dir before download
    • CLEAN=false leaves the stash dir contents alone
    • mixed transient+fatal stops on the first non-1 code
  • Wire the new Python tests into the stash-action-test workflow next to the existing test_get_stash.py run, so they execute on ubuntu / windows / macOS.

Test plan

  • python3 stash/restore/test_download_stash.py passes locally (8/8 cases).
  • stash-action-test CI workflow runs the new Python tests on ubuntu / windows / macOS.
  • Trigger a workflow that uses stash/restore against a valid stash and confirm the artifact is downloaded and stash-hit is true.
  • Run with fail-on-download: true against a broken download and confirm the step fails with the error annotation.
  • Run with fail-on-download: false (default) against a broken download and confirm the step succeeds with stash-hit=false.

Supersedes #715.

Generated-by: Claude Opus 4.6 (1M context)

@potiuk
Copy link
Copy Markdown
Member Author

potiuk commented Apr 14, 2026

Tested in apache/airflow#65268 , linting fixed separately in #717

potiuk added a commit to potiuk/airflow that referenced this pull request Apr 14, 2026
Tracks the latest head of apache/infrastructure-actions#716 which adds
retry-count and fail-on-download inputs to stash/restore. No behaviour
change from the previous commit — just keeps the pinned SHA in sync.
@raboof raboof added the stash the stash action label Apr 15, 2026
@potiuk potiuk force-pushed the stash-restore-retry-download branch from fb12994 to 6d40001 Compare April 17, 2026 09:42
@potiuk
Copy link
Copy Markdown
Member Author

potiuk commented Apr 17, 2026

Can I get approval on that one ? Ths one will make our life way easier in Airflow if we can switch to the new version of airflow @dave2wave @dfoulks1

Copy link
Copy Markdown
Member

@dave2wave dave2wave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

potiuk added 4 commits April 18, 2026 09:24
Add retry-count input (default 3) that retries `gh run download` when it
exits with code 1, and fail-on-download input (default false) that makes
the step fail when the download did not succeed. Also disable errexit in
the Download Stash script so a single failing command cannot abort the
step, and fix the previous `|| download=failed && download=success`
one-liner that always reported success regardless of the download result.
Move the retry/fail-on-download logic out of action.yml into a
standalone download_stash.sh script so it can be exercised in
isolation, and add test_download_stash.sh which covers:

  - success on first attempt
  - retry on exit code 1 until success
  - all retries exhausted with fail-on-download=false (tolerated)
  - all retries exhausted with fail-on-download=true (step fails)
  - non-1 exit codes are not retried
  - CLEAN=true removes the stash dir before download
  - CLEAN=false leaves the stash dir contents alone
  - mixed transient+fatal stops on first non-1 code

The tests stub `gh` via a PATH shim so they run offline, and are
wired into the existing stash-action-test workflow alongside the
Python unit tests.
Replace the download_stash.sh / test_download_stash.sh pair with a
download_stash.py module whose retry, clean and fail-on-download
logic is exposed via a single download_stash() function. The Download
Stash step in action.yml now runs the module directly with
`shell: python3 {0}`, matching the style of the existing Check for
stash artifact step.

test_download_stash.py uses unittest and a FakeGh stub injected via
the run_download parameter, covering the same eight cases the bash
tests did (success, retry-until-success, exhausted retries tolerated,
exhausted retries with fail-on-download, non-transient exit not
retried, CLEAN true/false, mixed transient+fatal). The Python tests
replace the bash ones in the stash-action-test workflow.
Add stash/pyproject.toml modeled on pelican's: no runtime Python deps
(the action uses the `gh` and `jq` CLIs, which the action itself
checks for), a PEP 735 `dev` group with ruff/mypy/pytest for local
development, Hatch env wiring, and a ruff configuration.

Scope the "Linting and MyPy (Pelican)" workflow trigger from the
catch-all `**.py` / `**/linting.yml` / `**/pylintrc` to the pelican
subtree plus the workflow file itself. The workflow cds into pelican
and only knows how to lint that directory, so non-pelican Python
changes should not trigger it — this is what was making the lint job
on PR 716 fail (unrelated pelican install step).

Also fix one ruff finding in download_stash.py: import `Callable` and
`Mapping` from `collections.abc` instead of `typing` (UP035).
@potiuk potiuk force-pushed the stash-restore-retry-download branch from 6d40001 to b05768e Compare April 18, 2026 07:26
A recent zizmor-driven hardening pass added `persist-credentials: false`
to both actions/checkout steps in pelican-action-test.yml. The test
workflow drives the pelican action end-to-end, and the action's publish
path runs `git push` against the test-site working tree — without
persisted credentials that push fails with "could not read Username
for 'https://github.com'", turning the pelican-test job red.

Drop persist-credentials: false from the first checkout (the test-site
working tree that actually pushes). The second checkout into self/ is
only used to load the action code and keeps persist-credentials: false.

Generated-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@potiuk potiuk merged commit 7ebe16a into main Apr 18, 2026
18 checks passed
@potiuk potiuk deleted the stash-restore-retry-download branch April 18, 2026 07:56
potiuk added a commit to potiuk/airflow that referenced this pull request Apr 18, 2026
PR apache/infrastructure-actions#716 (retry-count and fail-on-download
inputs on stash/restore) is now merged. Bump the pinned SHA from the
old PR branch head to the current main tip. No behaviour change from
the previous commit — just keeps the pinned SHA in sync with upstream.
potiuk added a commit to apache/airflow that referenced this pull request Apr 18, 2026
…65268)

* Bump apache/infrastructure-actions stash to retry-capable revision

Bumps stash/restore and stash/save to the PR #716 revision which adds
retry-count (default 3) and fail-on-download inputs to stash/restore.

Sets fail-on-download: true on the two restore sites where the next
step (breeze ci-image load) cannot continue without the artifact:
prepare_breeze_and_image and prepare_single_ci_image. Other call sites
are true caches (cache-mount, docs inventory, node_modules, prek cache)
where a missed download only means a slower/cold run, so they keep the
default behaviour.

* Bump apache/infrastructure-actions stash to latest main

PR apache/infrastructure-actions#716 (retry-count and fail-on-download
inputs on stash/restore) is now merged. Bump the pinned SHA from the
old PR branch head to the current main tip. No behaviour change from
the previous commit — just keeps the pinned SHA in sync with upstream.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stash the stash action

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants