ci: add custom timeout to ci job in order to save resources #8504
/build
Signed-off-by: Konstantinos <konstantinos.kitsios@uzh.ch>
40826bd to 4c85564
Walkthrough
A timeout of 46 minutes was added.
Sequence Diagram(s)
sequenceDiagram
participant GitHub_Actions
participant cron-conda Job
GitHub_Actions->>cron-conda Job: Start job
Note right of cron-conda Job: Job runs as usual
alt Job exceeds 46 minutes
cron-conda Job-->>GitHub_Actions: Job terminated (timeout)
else Job completes within 46 minutes
cron-conda Job-->>GitHub_Actions: Job completes successfully or fails normally
end
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/conda.yml (1)
24-24: Allow easier future adjustment of the timeout
Hard-coding 46 means touching every workflow file if runtime patterns drift. Consider parameterising the value via a YAML anchor or a reusable workflow input so it can be changed in one place, e.g.:
+# workflow-level constant
+timeouts: &cron_conda_timeout 46
...
- timeout-minutes: 46 # equal to max + 3*std over the last 600 successful runs
+ timeout-minutes: *cron_conda_timeout # equal to max + 3*std over the last 600 successful runs
(or expose timeout_minutes as an input if this job is moved to a reusable workflow).
📜 Review details
📒 Files selected for processing (1)
.github/workflows/conda.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (18)
- GitHub Check: packaging
- GitHub Check: min-dep-pytorch (2.6.0)
- GitHub Check: min-dep-pytorch (2.5.1)
- GitHub Check: min-dep-py3 (3.11)
- GitHub Check: build-docs
- GitHub Check: min-dep-pytorch (2.4.1)
- GitHub Check: min-dep-py3 (3.9)
- GitHub Check: quick-py3 (macOS-latest)
- GitHub Check: flake8-py3 (codeformat)
- GitHub Check: min-dep-py3 (3.12)
- GitHub Check: flake8-py3 (pytype)
- GitHub Check: quick-py3 (ubuntu-latest)
- GitHub Check: quick-py3 (windows-latest)
- GitHub Check: flake8-py3 (mypy)
- GitHub Check: min-dep-os (windows-latest)
- GitHub Check: min-dep-py3 (3.10)
- GitHub Check: min-dep-os (macOS-latest)
- GitHub Check: min-dep-os (ubuntu-latest)
🔇 Additional comments (1)
.github/workflows/conda.yml (1)
24-24: Timeout added in the right place – looks good
timeout-minutes is correctly scoped at the job level, so each matrix leg gets the 46-minute cap and avoids 6-hour hangs. Comment clarifies the derivation.
Fixes #8503.
Description
Added a custom timeout for the cron-conda job of the cron-conda workflow, based on historical data.
More details
Over the last 633 successful runs, the cron-conda job has a maximum runtime of 40 minutes (mean=23, std=2) across all matrix combinations.
However, some runs fail only after reaching the 6-hour limit that GitHub imposes. In other words, these jobs seem to get stuck, possibly for external or random reasons.
One such example is this job run, which failed after 6 hours. More stuck jobs have been observed over the last six months, the first on 11-Jan-2025 and the last on 17-Apr-2025; more recent occurrences are also possible because our dataset has a cutoff date around late May. With the proposed changes, a total of 145 hours would have been saved over the last six months retrospectively, clearing the queue for other workflows and speeding up the CI of the project, while also saving resources in general 🌱.
The idea is to set a timeout to stop jobs that run much longer than their historical maximum, because such jobs are probably stuck and will simply fail with a timeout at 6 hours.
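The saving per stuck job follows from simple arithmetic, which the sketch below spells out. It assumes a stuck job otherwise runs until GitHub's 6-hour limit and is instead cancelled at the proposed 46-minute cap. The function names are hypothetical, and the number of stuck jobs is left as an input because the PR reports only the 145-hour total, not the count.

```python
# Illustrative arithmetic only; names are hypothetical and not part of the PR.
GITHUB_DEFAULT_TIMEOUT_MINUTES = 6 * 60  # the 6-hour limit mentioned in the PR


def minutes_saved_per_stuck_job(new_timeout_minutes,
                                default_timeout_minutes=GITHUB_DEFAULT_TIMEOUT_MINUTES):
    """Minutes no longer spent on one stuck job once the lower cap applies."""
    if new_timeout_minutes >= default_timeout_minutes:
        return 0  # a cap at or above the default saves nothing
    return default_timeout_minutes - new_timeout_minutes


def hours_saved(stuck_job_count, new_timeout_minutes):
    """Total hours saved for a given (assumed) number of stuck jobs."""
    return stuck_job_count * minutes_saved_per_stuck_job(new_timeout_minutes) / 60


print(minutes_saved_per_stuck_job(46))  # 314 minutes per stuck job
```

Each stuck matrix leg counts separately here, since the timeout is enforced per job.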
Our PR proposes to set the timeout to max + 3*std = 46 minutes, where max and std (standard deviation) are derived from the history of 633 successful runs. This provides sufficient margin if the workflow gets naturally slower in the future, but if you would prefer a lower or higher threshold we would be happy to change it.
Note that the timeout applies to each matrix job individually, not to their sum, and overrides GitHub's default 6-hour timeout.
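As a minimal sketch of the rule above, the threshold can be computed either from the summary statistics quoted in this PR (max=40, std=2) or from a list of raw run durations. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the PR does not say which estimator was used.

```python
import math
import statistics


def timeout_from_summary(max_minutes, std_minutes, k=3.0):
    """Return max + k * std, rounded up to whole minutes."""
    return math.ceil(max_minutes + k * std_minutes)


def timeout_from_durations(durations_minutes, k=3.0):
    """Same rule, computed from raw durations of successful runs.

    Assumption: sample standard deviation (statistics.stdev).
    """
    if len(durations_minutes) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return timeout_from_summary(max(durations_minutes),
                                statistics.stdev(durations_minutes), k)


# Summary statistics reported in the PR: max=40, std=2.
print(timeout_from_summary(40, 2))  # 46
```

Rounding up keeps the result a whole number of minutes, as timeout-minutes is given in the workflow.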
Context
Hi,
We are a team of researchers from University of Zurich and we are currently working on energy optimizations in GitHub Actions workflows.
Thanks for your time on this.
Feel free to let us know (here or in the email below) if you have any questions, and thanks for putting in the time to read this.
Best regards,
Konstantinos Kitsios
konstantinos.kitsios@uzh.ch