CI Optimization — Job Timeout Protection
Summary
Of the 27 CI jobs, only security-scan had an explicit timeout-minutes value. The remaining 26 active jobs fell back to GitHub Actions' default limit of 360 minutes (6 hours), so a hanging test, tool download, or Docker operation could silently consume up to 6 hours of runner compute before being killed.
This PR adds conservative timeout-minutes to all 26 jobs that lacked them.
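The gap this PR closes can be detected mechanically. The sketch below assumes the workflow has already been parsed into a dict (for example with yaml.safe_load, as in the validation step further down). The helper names effective_timeout and jobs_without_timeout are hypothetical, and the job bodies in the example are placeholders rather than the real ci.yml.

```python
from typing import Any

# GitHub Actions' default job limit when timeout-minutes is omitted: 360 minutes.
DEFAULT_TIMEOUT_MINUTES = 360


def effective_timeout(job: dict[str, Any]) -> Any:
    """Limit GitHub would enforce for this job (the value may also be an expression)."""
    return job.get("timeout-minutes", DEFAULT_TIMEOUT_MINUTES)


def jobs_without_timeout(workflow: dict[str, Any]) -> list[str]:
    """Sorted IDs of jobs that fall back to the 6-hour default."""
    jobs = workflow.get("jobs") or {}
    return sorted(name for name, job in jobs.items() if "timeout-minutes" not in job)


if __name__ == "__main__":
    # Hypothetical excerpt of a parsed ci.yml before this change; the
    # security-scan value is a placeholder, not the real setting.
    workflow = {
        "jobs": {
            "security-scan": {"runs-on": "ubuntu-latest", "timeout-minutes": 15},
            "lint-go": {"runs-on": "ubuntu-latest"},
            "test": {"runs-on": "ubuntu-latest"},
        }
    }
    print(jobs_without_timeout(workflow))  # ['lint-go', 'test']
    print(effective_timeout(workflow["jobs"]["test"]))  # 360
```

Run against the workflow after this change, jobs_without_timeout should return an empty list.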
Optimization
Add timeout-minutes to 26 CI jobs
Type: Resource Right-Sizing / Safety
Impact: Caps the runner time a hanging job can waste (previously up to 6 hours) and surfaces failures sooner
Risk: Low; every timeout is 2-3× the observed maximum run duration
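The 2-3× rule of thumb can be written as a small helper. This is a sketch under stated assumptions, not the procedure actually used to pick the values: the 2.5 factor, rounding up to a multiple of 5 minutes, and the 10-minute floor (which mirrors the smallest value in the table below) are illustrative choices, and suggest_timeout and headroom are hypothetical names.

```python
import math


def suggest_timeout(observed_max_minutes: float, factor: float = 2.5,
                    granularity: int = 5, floor: int = 10) -> int:
    """Scale the slowest observed run by `factor`, round up to a multiple of
    `granularity`, and never go below `floor` minutes (all assumptions)."""
    if observed_max_minutes <= 0:
        raise ValueError("observed_max_minutes must be positive")
    scaled = observed_max_minutes * factor
    rounded = math.ceil(scaled / granularity) * granularity
    return max(floor, rounded)


def headroom(timeout_minutes: float, observed_max_minutes: float) -> float:
    """Ratio of the configured limit to the slowest observed run."""
    return timeout_minutes / observed_max_minutes


if __name__ == "__main__":
    # Hypothetical observed maxima, not measurements from this repository.
    for observed in (3.0, 5.5, 8.0):
        limit = suggest_timeout(observed)
        print(observed, limit, round(headroom(limit, observed), 2))
```

Rounding up and the floor can push short jobs slightly above 3×, which errs on the side of avoiding false timeouts.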
Timeout values assigned:
| Timeout | Jobs |
|---|---|
| 10 min | validate-yaml, js, js-integration-live-api, update, actions-build, lint-js, check-validator-sizes, health-smoke-copilot, mcp-server-compile-test, safe-outputs-conformance |
| 15 min | test, build, build-wasm, canary_go, bench, audit, security, integration-unauthenticated-add, integration-add-dispatch-workflow |
| 20 min | lint-go, fuzz, cross-platform-build, alpine-container-test |
| 25 min | integration (matrix job; its test step already sets -timeout=10m) |
| 30 min | integration-add (clones an external repo and adds N workflows) |
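To put the impact in runner-minutes: if every job in the table hung once, the new limits cap the waste at the sum of the table's values, versus 360 minutes per job under the default. The sketch below transcribes the table into a dict. It counts each job entry once and ignores matrix fan-out (each leg of the integration matrix gets its own 25-minute limit), so real totals are higher for matrix jobs. The helper names are hypothetical.

```python
# Timeout assignments transcribed from the table above (minutes -> job IDs).
TIMEOUT_TABLE: dict[int, list[str]] = {
    10: ["validate-yaml", "js", "js-integration-live-api", "update", "actions-build",
         "lint-js", "check-validator-sizes", "health-smoke-copilot",
         "mcp-server-compile-test", "safe-outputs-conformance"],
    15: ["test", "build", "build-wasm", "canary_go", "bench", "audit", "security",
         "integration-unauthenticated-add", "integration-add-dispatch-workflow"],
    20: ["lint-go", "fuzz", "cross-platform-build", "alpine-container-test"],
    25: ["integration"],  # matrix job: the limit applies to each leg
    30: ["integration-add"],
}
DEFAULT_TIMEOUT_MINUTES = 360


def job_timeouts(table: dict[int, list[str]]) -> dict[str, int]:
    """Invert the table into job -> minutes, rejecting duplicate assignments."""
    result: dict[str, int] = {}
    for minutes, jobs in table.items():
        for job in jobs:
            if job in result:
                raise ValueError(f"{job} assigned twice")
            result[job] = minutes
    return result


def worst_case_minutes(timeouts: dict[str, int]) -> tuple[int, int]:
    """(capped total, uncapped total) if every listed job hung once."""
    return sum(timeouts.values()), DEFAULT_TIMEOUT_MINUTES * len(timeouts)


if __name__ == "__main__":
    capped, uncapped = worst_case_minutes(job_timeouts(TIMEOUT_TABLE))
    print(f"capped: {capped} min, default: {uncapped} min")
```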
Rationale: All values are well above observed run times to avoid false timeouts, while protecting against the 6-hour default.
Analysis Baseline (last 100 CI runs)
- Total runs analyzed: 100
- Success rate: 69%
- Cancelled: 19% (expected: with cancel-in-progress: true, a newer run for the same PR cancels the run it supersedes)
- Failures: 7%
- Average successful run: 5.9 minutes
- Slowest integration group: CLI MCP Connectivity (~108s test time)
- Test coverage: 100% (4731/4731 tests executed, verified by canary_go)
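The cancellations above come from the workflow's concurrency setting. The simplified model below is an assumption-laden sketch, not GitHub's scheduler: every push starts a run of fixed duration, and with cancel-in-progress: true a new run in the same concurrency group cancels the one still in progress. Queued-run semantics are ignored, and simulate_runs and the push times are hypothetical.

```python
from collections import Counter


def simulate_runs(pushes: list[tuple[float, str]], duration: float) -> list[str]:
    """Return a status per push ("completed" or "cancelled").

    pushes: (start_minute, concurrency_group) pairs, sorted by start time.
    Simplified model of cancel-in-progress: true (an assumption).
    """
    status = ["completed"] * len(pushes)
    in_flight: dict[str, tuple[int, float]] = {}  # group -> (run index, end time)
    for i, (start, group) in enumerate(pushes):
        previous = in_flight.get(group)
        if previous is not None and previous[1] > start:
            status[previous[0]] = "cancelled"  # superseded while still running
        in_flight[group] = (i, start + duration)
    return status


if __name__ == "__main__":
    # Hypothetical push times (minutes) for two PRs; runs take about 6 minutes.
    pushes = [(0, "pr-1"), (1, "pr-2"), (3, "pr-1"), (20, "pr-1")]
    print(Counter(simulate_runs(pushes, duration=6)))
```

In this model, only pushes that land within one run duration of the previous push to the same PR produce cancellations.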
The CI is already well-optimized:
- Jobs run largely in parallel (minimal needs dependencies)
- The integration test matrix is well-balanced
- Cache hit rates appear high (retry blocks are conditional on cache-hit != 'true')
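The cache observation relies on a common pattern: the expensive download-with-retry step runs only on a cache miss, so cache hits skip it entirely. A minimal Python sketch of that control flow, where prepare_dependencies and fetch are hypothetical names and the retry count and linear backoff are assumptions:

```python
import time
from typing import Callable


def prepare_dependencies(cache_hit: str, fetch: Callable[[], None],
                         attempts: int = 3, backoff_seconds: float = 0.0) -> int:
    """Mirror a step guarded by `if: steps.<id>.outputs.cache-hit != 'true'`.

    Returns how many fetch attempts were made (0 when the cache was hit).
    The retry count and linear backoff are illustrative assumptions.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if cache_hit == "true":
        return 0  # exact cache hit: the guarded retry block never runs
    for attempt in range(1, attempts):
        try:
            fetch()
            return attempt
        except OSError:
            time.sleep(backoff_seconds * attempt)  # wait longer after each failure
    fetch()  # final attempt: let any error propagate and fail the step
    return attempts


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_fetch() -> None:
        # Hypothetical download that fails twice before succeeding.
        calls["count"] += 1
        if calls["count"] < 3:
            raise OSError("transient network error")

    print(prepare_dependencies("true", flaky_fetch))  # 0 (cache hit, no fetch)
    print(prepare_dependencies("", flaky_fetch))      # 3
```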
Validation Results
- ✅ YAML syntax validated (python3 -c "import yaml; yaml.safe_load(...)")
- ✅ Changes limited to timeout-minutes additions only; no logic changes
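The second check (that the patch only adds timeout-minutes) can also be automated. This sketch assumes both versions of ci.yml are parsed with PyYAML's yaml.safe_load, as in the syntax check above. load_workflow and only_timeouts_added are hypothetical helper names.

```python
import copy
from typing import Any


def load_workflow(path: str) -> dict[str, Any]:
    """Parse a workflow file (requires PyYAML, as in the validation step)."""
    import yaml  # imported lazily so the pure-dict helper works without it

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def only_timeouts_added(before: dict[str, Any], after: dict[str, Any]) -> bool:
    """True if `after` equals `before` except for job-level timeout-minutes
    keys that were absent in `before`."""
    stripped = copy.deepcopy(after)  # never mutate the caller's data
    before_jobs = before.get("jobs") or {}
    for name, job in (stripped.get("jobs") or {}).items():
        original_job = before_jobs.get(name) or {}
        if "timeout-minutes" in job and "timeout-minutes" not in original_job:
            del job["timeout-minutes"]  # newly added limit: ignore it
    return stripped == before
```

Usage would look like only_timeouts_added(load_workflow("ci-before.yml"), load_workflow("ci-after.yml")), where the two file names are hypothetical copies of ci.yml from main and from the patched branch. A changed existing value (for example security-scan's) makes the check fail, as it should.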
Testing Plan
- Monitor first few runs after merge to confirm no false timeouts
- If any job hits its timeout unexpectedly, increase the specific value
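For the post-merge monitoring, one simple rule is to flag any job whose duration reaches a large fraction of its limit before it actually times out. The 80% threshold and the near_timeout name are assumptions for illustration. Durations would come from the run history (for example, each job's start and completion timestamps); this sketch takes them as plain input, and the example durations are hypothetical.

```python
def near_timeout(durations: dict[str, float], timeouts: dict[str, int],
                 threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return (job, fraction of limit used) for jobs at or above `threshold`,
    most at-risk first. Jobs without a known timeout are skipped."""
    flagged = [
        (job, minutes / timeouts[job])
        for job, minutes in durations.items()
        if job in timeouts and minutes / timeouts[job] >= threshold
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical durations (minutes) from one post-merge run; the limits
    # match the table above.
    durations = {"lint-go": 17.0, "test": 6.2, "integration-add": 12.5}
    timeouts = {"lint-go": 20, "test": 15, "integration-add": 30}
    for job, used in near_timeout(durations, timeouts):
        print(f"{job}: {used:.0%} of its limit; consider raising timeout-minutes")
```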
Proposed by CI Coach workflow run #113
Generated by CI Optimization Coach · expires on Mar 20, 2026, 1:51 PM UTC
Warning
🛡️ Protected Files — Push Permission Denied
This was originally intended as a pull request, but the patch modifies protected files: .github/workflows/ci.yml.
The push was rejected because GitHub Actions does not have the workflows permission required to push changes to workflow files (and is never allowed to make such changes), or because the other authorization in use lacks this permission. A human must create the pull request manually.
To create a pull request with the changes:
```bash
# Download the patch from the workflow run
gh run download 23247175607 -n agent-artifacts -D /tmp/agent-artifacts-23247175607
# Create a new branch
git checkout -b ci-coach/add-job-timeouts-run-113-02f1929e38153919 main
# Apply the patch (--3way handles cross-repo patches)
git am --3way /tmp/agent-artifacts-23247175607/aw-ci-coach-add-job-timeouts-run-113.patch
# Push the branch and create the pull request
git push origin ci-coach/add-job-timeouts-run-113-02f1929e38153919
gh pr create --title '[ci-coach] ci: add timeout-minutes to all jobs lacking explicit limits' --base main --head ci-coach/add-job-timeouts-run-113-02f1929e38153919 --repo github/gh-aw
```