Problem
The wiki automation currently has two connected reliability gaps.
First, Retry Transient Workflow Failures can fail instead of helping when the failed run comes from wiki maintenance. On 2026-04-30, retry run 25150628268 failed while inspecting Maintain Wiki run 25150613235 and raised:
Failed to download logs for job maintenance / Publish Wiki Master: 401 Unauthorized
That leaves transient wiki failures unretried and turns the safety net itself into another failing workflow.
Second, wiki publication is still vulnerable to pointer races around release merges. A pull request can compete with wiki pointer publication and still merge, while the release publication path is the authoritative moment that should guarantee the final .github/wiki pointer matches the released state.
Current Behavior
- .github/workflows/retry-transient-failures.yml throws when it cannot download logs for a failed job exposed through a reusable workflow boundary, such as maintenance / Publish Wiki Master from Maintain Wiki.
- Because the retry workflow itself fails, maintainers get no rerun decision and no useful summary for that failure.
- Wiki publication can still leave a stale .github/wiki pointer behind when preview/publication activity and merge timing overlap near a release.
Expected Behavior
- The transient retry workflow should treat unreadable child-job logs as a handled condition and finish with a deterministic summary instead of crashing.
- Wiki maintenance failures should still be retried when logs are available and every failed job matches the transient GitHub-side signatures.
- Release publication should reassert the final wiki pointer from the authoritative released state so concurrency with merged pull requests cannot leave the wiki pointer stale.
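The retry behavior described above can be sketched as a pure decision function. This is a minimal illustration in Python, not the workflow's actual code: `LogsUnavailable`, `Decision`, `RetryOutcome`, `decide_retry`, and the injected `fetch_logs` callable are all hypothetical names, and the policy of skipping the rerun when any job's logs are unreadable is an assumption drawn from the expectations listed here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Sequence


class LogsUnavailable(Exception):
    """Hypothetical error raised by the log fetcher when GitHub refuses the download (e.g. 401)."""


class Decision(Enum):
    RETRY = "retry"
    NOTHING_TO_RETRY = "nothing-to-retry"
    SKIP_NON_TRANSIENT = "skip-non-transient"
    SKIP_LOGS_UNAVAILABLE = "skip-logs-unavailable"


@dataclass(frozen=True)
class RetryOutcome:
    decision: Decision
    uninspectable_jobs: tuple[str, ...]
    non_transient_jobs: tuple[str, ...]


def decide_retry(
    failed_jobs: Iterable[str],
    fetch_logs: Callable[[str], str],
    transient_signatures: Sequence[str],
) -> RetryOutcome:
    """Classify failed jobs and return a decision without ever raising on unreadable logs."""
    uninspectable: list[str] = []
    non_transient: list[str] = []
    jobs = sorted(failed_jobs)  # sorted so the outcome is stable across runs

    for job in jobs:
        try:
            logs = fetch_logs(job)
        except LogsUnavailable:
            # Handled condition: record the job instead of crashing the workflow.
            uninspectable.append(job)
            continue
        if not any(signature in logs for signature in transient_signatures):
            non_transient.append(job)

    if not jobs:
        decision = Decision.NOTHING_TO_RETRY
    elif non_transient:
        decision = Decision.SKIP_NON_TRANSIENT
    elif uninspectable:
        # Assumption: without logs we cannot prove the failure was transient, so do not rerun.
        decision = Decision.SKIP_LOGS_UNAVAILABLE
    else:
        decision = Decision.RETRY
    return RetryOutcome(decision, tuple(uninspectable), tuple(non_transient))
```

Because the log fetcher is injected, the decision logic can be exercised in tests with a fake that raises `LogsUnavailable` for a reusable-workflow child job, without calling the GitHub API.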
Failure Surface
- .github/workflows/retry-transient-failures.yml
- .github/workflows/wiki-maintenance-entry.yml
- .github/workflows/wiki-maintenance.yml
- The merged release publication path in .github/workflows/changelog.yml
Proposal
Make the transient retry workflow robust when GitHub refuses job-log downloads for reusable-workflow jobs, and add a release-side wiki reconciliation step that guarantees the final pointer is revalidated or republished from the authoritative release state.
Implementation Strategy
- Isolate the retry workflow's failed-job inspection so it can distinguish inspectable failed jobs from jobs whose logs are unavailable because of GitHub API limitations or nested workflow boundaries.
- Keep retry decisions deterministic: when a failed job cannot be inspected, emit an explicit summary status instead of throwing.
- Preserve the existing transient-signature matching for inspectable jobs.
- Add a bounded release-publication safety net that reruns or revalidates wiki pointer publication against the final release state on main, so the release path can correct any stale pointer left behind by concurrent pull request merges.
- Add or update focused coverage for reusable-workflow retry handling and for release-side wiki pointer reconciliation.
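One possible shape for the bounded release-side safety net is sketched below in Python. Everything here is an assumption for illustration: `reconcile_wiki_pointer`, the injected `read_pointer` and `publish_pointer` callables, the status strings, and the default of three attempts are hypothetical and do not describe the existing workflows.

```python
from typing import Callable


def reconcile_wiki_pointer(
    read_pointer: Callable[[], str],
    expected_sha: str,
    publish_pointer: Callable[[str], None],
    max_attempts: int = 3,
) -> str:
    """Revalidate the wiki pointer against the released state, republishing if it is stale.

    Returns one of "already-current", "republished", or "failed".
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    # Revalidate first: when the pointer already matches, do nothing (idempotent rerun).
    if read_pointer() == expected_sha:
        return "already-current"

    # Bounded correction: republish and re-read, at most max_attempts times.
    for _ in range(max_attempts):
        publish_pointer(expected_sha)
        if read_pointer() == expected_sha:
            return "republished"
    return "failed"
```

Reading the pointer again after each publish is what lets the release path detect a concurrent write that raced with it, and the attempt cap keeps the side effects bounded.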
Non-goals
- Redesigning the entire wiki workflow architecture.
- Moving wiki automation into another repository.
- Broad refactors to unrelated reports, Pages, or release automation.
Acceptance Criteria
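Functional Criteria
- Retry Transient Workflow Failures no longer fails when a failed job belongs to a reusable workflow and GitHub does not allow direct log download for that child job.
- Maintain Wiki and Maintain Wiki Publication failures are still retried when logs are available and every failed job matches the configured transient GitHub-side signatures.
- Release publication reasserts the final .github/wiki pointer from the authoritative released state so concurrent preview/publication activity cannot leave a stale pointer behind.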
Regression Criteria
Architectural / Isolation Criteria
- MUST: The core logic MUST be isolated into dedicated classes or services instead of living inside command or controller entrypoints.
- MUST: Responsibilities MUST be separated across input resolution, domain logic, processing or transformation, and output rendering when the change is non-trivial.
- MUST: The command or controller layer MUST act only as an orchestrator.
- MUST: The implementation MUST avoid tight coupling between core behavior and CLI or framework-specific I/O.
- MUST: The design MUST allow future extraction or reuse with minimal changes.
- MUST: The solution MUST remain extensible without requiring major refactoring for adjacent use cases.
- MUST: Data gathering or transformation MUST be isolated from filesystem writes or publishing steps.
- MUST: Generated output ordering and formatting MUST remain deterministic across runs.
- MUST: Re-running the workflow MUST be idempotent or clearly bounded in its side effects.
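The isolation and determinism criteria above can be illustrated with a small rendering function that is kept separate from any publishing step. This is a hedged sketch in Python: `render_summary` and its input shape are hypothetical, chosen only to show a pure function whose output does not depend on the order in which jobs were gathered.

```python
def render_summary(decision: str, jobs_by_status: dict[str, list[str]]) -> str:
    """Render a retry summary as text; performs no filesystem or network I/O."""
    lines = [f"Retry decision: {decision}"]
    # Sort statuses and job names so repeated runs produce byte-identical output.
    for status in sorted(jobs_by_status):
        for job in sorted(jobs_by_status[status]):
            lines.append(f"- {status}: {job}")
    return "\n".join(lines) + "\n"
```

The caller would own the write, for example appending the returned text to the file named by the `GITHUB_STEP_SUMMARY` environment variable, so the rendering itself can be tested without a workflow run.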