@kitsiosk kitsiosk commented Jul 2, 2025

Fixes #8503.

Description

Added a custom timeout to the cron-conda job of the cron-conda workflow, based on historical run data.

More details

Over the last 633 successful runs, the cron-conda job had a maximum runtime of 40 minutes (mean = 23 minutes, std = 2 minutes) across all matrix combinations.

However, some runs fail only after reaching the 6-hour limit that GitHub imposes. In other words, these jobs appear to get stuck, possibly for external or random reasons.

One such example is this job run, which failed after 6 hours. More stuck jobs have been observed over the last six months, the first on 11-Jan-2025 and the last on 17-Apr-2025; more recent occurrences are also possible because our dataset has a cutoff date around late May. With the proposed changes, a total of 145 hours would have been saved over the last six months, clearing the queue for other workflows and speeding up the project's CI, while also saving resources in general 🌱.

The idea is to set a timeout to stop jobs that run much longer than their historical maximum, because such jobs are probably stuck and will simply fail with a timeout at 6 hours.

Our PR proposes to set the timeout to max + 3*std = 46 minutes, where max and std (standard deviation) are derived from the history of 633 successful runs. This provides sufficient margin if the workflow gets naturally slower in the future, but if you would prefer a lower or higher threshold we would be happy to change it.
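
Spelled out with the rounded values quoted above (a maximum of 40 minutes and a standard deviation of 2 minutes), the threshold works out as follows. The symbol names are only notation introduced here for illustration.

```latex
\[
T_{\text{timeout}} = t_{\max} + 3\sigma
                   = 40\,\text{min} + 3 \times 2\,\text{min}
                   = 46\,\text{min}
\]
```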

Note that the timeout applies to each matrix job individually, not to their sum, and overrides GitHub's default 6-hour timeout.
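
As a rough sketch of where such a setting lives, the fragment below shows a job-level `timeout-minutes` on a matrix job. Only the job name and the 46-minute value come from this PR; the schedule, matrix values, and steps are placeholders and do not reproduce the actual contents of .github/workflows/conda.yml.

```yaml
# Illustrative sketch only: schedule, matrix values and steps are placeholders,
# not the real contents of .github/workflows/conda.yml.
name: cron-conda
on:
  schedule:
    - cron: "0 3 * * *"  # hypothetical schedule
jobs:
  cron-conda:
    # Job-level timeout: each matrix job gets its own 46-minute limit.
    timeout-minutes: 46  # max + 3*std over the successful runs
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest]  # hypothetical matrix
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - run: echo "tests would run here"  # placeholder step
```

Because the key sits under the job rather than under a single step, a matrix job that hangs is cancelled on its own once it passes the limit.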

Context

Hi,

We are a team of researchers from University of Zurich and we are currently working on energy optimizations in GitHub Actions workflows.

Feel free to let us know (here or at the email below) if you have any questions, and thanks for taking the time to read this.

Best regards,
Konstantinos Kitsios
konstantinos.kitsios@uzh.ch

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).

Summary by CodeRabbit

  • Chores
    • Set a 46-minute timeout for the scheduled Conda workflow to improve reliability.

@kitsiosk kitsiosk requested a review from KumoLiu as a code owner July 2, 2025 14:14

KumoLiu commented Jul 4, 2025

/build


KumoLiu commented Jul 4, 2025

Hi @kitsiosk, could you please signoff based on the guide here to let the DCO pass? Thanks.

kitsiosk added 2 commits July 4, 2025 16:53
Signed-off-by: Konstantinos <konstantinos.kitsios@uzh.ch>
Signed-off-by: Konstantinos <konstantinos.kitsios@uzh.ch>
@kitsiosk kitsiosk force-pushed the 8503-add-custom-ci-timeout branch from 40826bd to 4c85564 Compare July 4, 2025 14:53

coderabbitai bot commented Jul 4, 2025

Walkthrough

A timeout of 46 minutes was added to the cron-conda job in the GitHub Actions workflow configuration file .github/workflows/conda.yml. This change limits the maximum runtime for each matrix job in the workflow, based on historical run statistics. No other modifications were made.

Changes

| File(s) | Change Summary |
|---|---|
| .github/workflows/conda.yml | Added timeout-minutes: 46 to cron-conda job. |

Sequence Diagram(s)

sequenceDiagram
    participant GitHub_Actions
    participant cron-conda Job

    GitHub_Actions->>cron-conda Job: Start job
    Note right of cron-conda Job: Job runs as usual
    alt Job exceeds 46 minutes
        cron-conda Job-->>GitHub_Actions: Job terminated (timeout)
    else Job completes within 46 minutes
        cron-conda Job-->>GitHub_Actions: Job completes successfully or fails normally
    end

Assessment against linked issues

Objective Addressed Explanation
Add a custom timeout of 46 minutes to the cron-conda CI job (#8503)
Timeout should apply to all matrix jobs, overriding the default 6-hour GitHub timeout (#8503)

Poem

A hop and a skip, a timeout appears,
Forty-six minutes to quiet our fears.
No more jobs stuck in a digital bog,
Saving some time for each busy log.
The CI queue cheers, the rabbits all clap—
Resourceful and swift, we close the gap!
🕒🐇✨


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/conda.yml (1)

24-24: Allow easier future adjustment of the timeout
Hard-coding 46 means touching every workflow file if runtime patterns drift. Consider parameterising the value via a YAML anchor or a reusable workflow input so it can be changed in one place, e.g.:

+# workflow-level constant
+timeouts: &cron_conda_timeout 46
 ...
-    timeout-minutes: 46 # equal to max + 3*std over the last 600 successful runs
+    timeout-minutes: *cron_conda_timeout # equal to max + 3*std over the last 600 successful runs

(or expose timeout_minutes as an input if this job is moved to a reusable workflow).
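
A minimal sketch of the reusable-workflow variant mentioned above, assuming the job were moved into a workflow triggered by `workflow_call`. The input name `timeout_minutes`, its default, the runner, and the step are hypothetical and not part of this PR.

```yaml
# Hypothetical reusable workflow: input name, default, runner and step
# are assumptions for illustration, not part of this PR.
on:
  workflow_call:
    inputs:
      timeout_minutes:
        description: "Per-job timeout in minutes"
        type: number
        default: 46
jobs:
  cron-conda:
    runs-on: ubuntu-latest  # placeholder runner
    # The inputs context is available for jobs.<job_id>.timeout-minutes.
    timeout-minutes: ${{ inputs.timeout_minutes }}
    steps:
      - run: echo "tests would run here"  # placeholder step
```

A caller would then pass a different value through `with:` instead of editing the job definition.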

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d388d1c and 4c85564.

📒 Files selected for processing (1)
  • .github/workflows/conda.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: packaging
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: build-docs
  • GitHub Check: min-dep-pytorch (2.4.1)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: flake8-py3 (codeformat)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: min-dep-os (ubuntu-latest)
🔇 Additional comments (1)
.github/workflows/conda.yml (1)

24-24: Timeout added in the right place – looks good
timeout-minutes is correctly scoped at the job level, so each matrix leg gets the 46-minute cap and avoids 6-hour hangs. Comment clarifies the derivation.


kitsiosk commented Jul 9, 2025

Hi @kitsiosk, could you please signoff based on the guide here to let the DCO pass? Thanks.

Hi @KumoLiu, done!


KumoLiu commented Jul 10, 2025

/build

@KumoLiu KumoLiu enabled auto-merge (squash) July 10, 2025 02:42
@KumoLiu KumoLiu merged commit 4b69748 into Project-MONAI:dev Jul 10, 2025
26 checks passed

Development

Successfully merging this pull request may close these issues.

Add Custom Timeout for cron-conda CI Job to Save Resources

2 participants