COMP: Free disk on Linux.Python CI before post-job ccache tar #6199
Merged
dzenanz merged 1 commit into InsightSoftwareConsortium:main on May 4, 2026
ITK.Linux.Python builds 14748 and 14751 (and presumably any
subsequent run) fail at the implicit `Cache@2` post-job step that
tars `CCACHE_DIR` for upload, with
    tar: <archive>: Wrote only 4096 of 10240 bytes
    tar: Error is not recoverable: exiting now
    ##[error]Process returned non-zero exit code: 2
after 10+ `Free disk space on / is lower than 5%` warnings. The
build/test phase itself is green; only the cache upload fails.
The local ccache fills its 5G default ceiling during the build:
    Cache size (GB): 4.94 / 5.00 (98.87 %)
    Cleanups: 15
and the runner's writable disk is already 96% full when `tar` starts,
leaving no room for the staging archive. Worse, because the upload
fails every run, the pipeline cache key reports a miss every time, so
the next run is also cold and refills the local store from scratch.
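
The failure mode above (an archive that does not fit in the remaining disk) could be caught before the post-job step with a pre-flight check. The following is a minimal sketch, not part of this change: the function names are hypothetical, it assumes the staging archive lands on the same filesystem as `/`, and it treats the archive as uncompressed, which is a conservative upper bound.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; all function names are illustrative.

# Succeeds when the available space (KiB) exceeds the cache size (KiB).
# Assumption: the archive is no larger than the directory it is made from.
archive_fits() {
  local avail_kb="$1" cache_kb="$2"
  [ "$avail_kb" -gt "$cache_kb" ]
}

# Available KiB on the filesystem holding the given path (POSIX df layout).
avail_kb_for() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Size in KiB of a directory tree.
dir_kb() {
  du -sk "$1" | awk '{ print $1 }'
}

# Warn early instead of failing inside the post-job tar.
check_cache_upload() {
  local cache_dir="$1"
  if archive_fits "$(avail_kb_for /)" "$(dir_kb "$cache_dir")"; then
    echo "ok: staging archive should fit"
  else
    echo "warning: not enough free space for the staging archive"
  fi
}

# Usage (CCACHE_DIR as set by the pipeline):
#   check_cache_upload "$CCACHE_DIR"
```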
Insert an intermediate step that runs after `Build and test` and before the
implicit post-job `Restore ccache` upload:
- `df -h /` for visibility on both sides of the cleanup;
- `ccache --evict-older-than 5d` to drop stale entries;
- `ccache --max-size 6.5G` to lower the soft ceiling;
- `ccache -c` to force the store under that ceiling;
- `ninja -C <build> clean` to free the .o files that are no longer
  needed once the build/test phase has reported (`|| true` so a
  missing/already-clean tree does not fail the step).
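
The sequence listed above can be sketched as a small script. This is an illustration rather than the exact step added to the pipeline: the `run` helper, the `DRY_RUN` switch, and the build-directory argument are hypothetical, and the real build path is left to the caller.

```shell
#!/usr/bin/env bash
# Sketch of the cleanup sequence. `run`, DRY_RUN and the build-directory
# argument are hypothetical names introduced for illustration.

# Execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

free_disk_before_cache_upload() {
  local build_dir="$1"
  run df -h /                        # disk usage before the cleanup
  run ccache --evict-older-than 5d   # drop stale entries
  run ccache --max-size 6.5G         # set the post-build ceiling
  run ccache -c                      # force the store under that ceiling
  # A missing or already-clean tree must not fail the step.
  run ninja -C "$build_dir" clean || true
  run df -h /                        # disk usage after the cleanup
}

# Usage: preview the commands without touching the cache.
#   DRY_RUN=1 free_disk_before_cache_upload /path/to/build
```

Running it with `DRY_RUN=1` first makes it easy to confirm the order of operations before letting it modify a real cache.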
Setting max-size to 6.5G (below the in-build CCACHE_MAXSIZE=8G env
and below the runner's free-disk headroom) keeps the post-job tar
inside the runner's disk budget, while the 8G build-time setting
still gives ccache room to operate during the build.
817fd27 to ea4ba99
dzenanz (Member) approved these changes on May 4, 2026 and left a comment:
I have noticed the "low disk space" warning. This is at least worth trying.
Fixes a recurring `ITK.Linux.Python` post-job `tar: Error is not recoverable` (ENOSPC) by inserting a disk-cleanup step between `Build and test` and the implicit `Cache@2` post-job uploader. Builds 14748 and 14751 both hit this; build/test phases were green, only the cache upload failed.

Root cause (deep dive on build 14751)

The `ccache stats` step before the post-job shows that the local ccache is at the soft ceiling and has self-evicted 15 times during one build. The post-job `Cache@2` task then tries to `tar` the entire 4.94 GB ccache for upload on a runner whose `/` is already at 96% used.

Every run also reports `There is a cache miss.` for the `ccache-v4|Linux|LinuxPython|<sha>` fingerprint, so the next run is cold too, and the pipeline is stuck in a self-perpetuating broken state.

What this PR adds

A new step in `Testing/ContinuousIntegration/AzurePipelinesLinuxPython.yml`, inserted after `Build and test` and before `ccache stats`:

- `ccache --evict-older-than 5d` drops stale entries that no longer match recent compiler/source SHAs.
- `ccache --max-size 6.5G` lowers the local ceiling. Combined with `ccache -c` (cleanup) this forces the local store under the new ceiling so the tar fits in the runner's available disk.
- `ninja -C <build> clean` removes the .o files that are no longer needed after the build/test phase has reported. `|| true` lets the step succeed if the tree is already clean.
- `df -h /` calls give visibility on both sides of the cleanup so future regressions are diagnosable from logs alone.
- `condition: always()` so that the post-job still gets a free runway even when the build/test phase fails.

Why 6.5G specifically

The build-time env still has `CCACHE_MAXSIZE: 8G` so ccache has room to operate during the build. The post-job ceiling of 6.5G is below the runner's typical free-disk headroom at the end of a Python wrapping build (the build itself consumes ~30-35 GB across `/home/vsts/work` for sources + build tree + ExternalData + ccache combined). Lowering further would unnecessarily evict warm entries; leaving at 8G would not actually fix the immediate failure. 6.5G is the sweet spot.

Doesn't address (and shouldn't, in this PR)

- Whether the `ccache-v4|...|<Build.SourceVersion>` cache fingerprint is too coarse / too fine. Separate optimization.
- Whether the `CCACHE_MAXSIZE: 8G` env at the `Build and test` step actually takes effect (logs show ccache reporting a 5G ceiling there, possibly because a persistent ccache config overrides it). Worth a follow-up but not blocking this fix.
- `ITK.macOS.Python` (build 14750 currently green, but the same growth trajectory could hit it).
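
For the `CCACHE_MAXSIZE` follow-up, one way to see which setting wins is to look at the origin that `ccache --show-config` reports for `max_size`. The helper below is a hypothetical sketch, not something this PR adds; it assumes the `(origin) key = value` line layout, which should be checked against the ccache version on the runner.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper; the name is illustrative.
# Reads `ccache --show-config`-style lines on stdin and prints
# "<origin> <value>" for the max_size entry.
# Assumption: lines look like "(origin) key = value".
max_size_origin() {
  sed -n 's/^(\(.*\)) max_size = \(.*\)$/\1 \2/p'
}

# Usage on a runner with ccache installed:
#   ccache --show-config | max_size_origin
```

If the reported origin is a config file rather than the environment, that would point at a persistent configuration taking part in the 5G ceiling seen in the logs.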