This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Restrict Ubuntu.1404.Arm32.Open from running against PRs #23523

Merged
echesakov merged 1 commit into dotnet:master from echesakov:RestrictUbuntu1404Arm32Open
Mar 28, 2019


@echesakov echesakov commented Mar 28, 2019

Disable Linux/arm32 testing for now to avoid "Test Linux arm32" jobs being cancelled in each PR
(e.g. #22255 (comment))

The problem with this queue is that it can't handle all the work items coming from PRs, push-triggered builds and JitStress scheduled builds. Re-trying builds from the AzDO page makes this even worse, since the newly started jobs add new work items to the queue without removing the already submitted ones.

The long-term solution should be Helix being able to remove scheduled jobs if the corresponding AzDO build has been cancelled. In other words, if no one expects to see the test results coming from a Helix submission, why bother running them? @MattGal probably knows about the issue, but I don't know when the feature is going to be implemented.

A potential short-term solution could be switching to a Docker-based Arm32 queue, but, as far as I remember, @jashook had some objections to doing this (at least that was my impression).

I am personally fine with an alternative where one of these queues is used exclusively for PR-triggered builds and the other queues are used to run JitStress or CI builds.

cc @dotnet/jit-contrib

@BruceForstall

I don't recall our Jenkins CI having any capacity issues with these jobs recently. Why is AzDO stressing it more than Jenkins did? Are there more jobs running than before? Is it because Helix is using all the machines at once, and the overhead of the split jobs is much higher than running all Pri-0 jobs on a single host?

@echesakov
Author

@BruceForstall I think the problem is that when you submit jobs to Helix from an AzDO job, your build now times out (after roughly 3 hours for pri0) if the jobs haven't completed by then, whereas before, Jenkins jobs waited indefinitely for the job to start running. When your AzDO job times out, you click the re-try button, effectively submitting even more jobs to the queue (without removing the previously submitted ones); your jobs get cancelled again, and you re-try again.
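
To make the retry feedback loop concrete, here is a rough toy model, not a description of how Helix actually schedules work. It assumes a single FIFO queue, a fixed number of work items drained per timeout window, and that every timed-out build is retried immediately. The function name, the parameters and all the numbers are made up for illustration. The `cancel_stale` flag stands in for the hypothetical feature of removing work items whose build has already been cancelled.

```python
from collections import deque

OTHER = "other"  # work items from push-triggered and scheduled builds


def first_completed_attempt(max_attempts, items_per_build, background_items,
                            capacity, initial_backlog, cancel_stale):
    """Return the index of the first build attempt whose work items all
    finish before the timeout, or None if no attempt completes.

    Toy assumptions: one FIFO queue, `capacity` items drained per timeout
    window, and every timed-out build is retried right away.
    """
    queue = deque([OTHER] * initial_backlog)
    for attempt in range(max_attempts):
        if cancel_stale:
            # Hypothetical feature: drop items left over from earlier
            # attempts, since their builds were already cancelled.
            queue = deque(owner for owner in queue if owner == OTHER)
        # New work arrives: unrelated builds first, then this attempt.
        queue.extend([OTHER] * background_items)
        queue.extend([attempt] * items_per_build)
        # The queue drains oldest-first until the build times out.
        for _ in range(min(capacity, len(queue))):
            queue.popleft()
        if attempt not in queue:
            return attempt  # every item of this attempt ran in time
    return None


if __name__ == "__main__":
    params = dict(max_attempts=10, items_per_build=100, background_items=50,
                  capacity=160, initial_backlog=200)
    print("with cancellation:   ", first_completed_attempt(cancel_stale=True, **params))
    print("without cancellation:", first_completed_attempt(cancel_stale=False, **params))
```

With these invented numbers the queue has slightly more capacity per window than new work arriving, yet without cancellation each retry still sits behind the leftovers of the previous attempt, so no attempt completes within ten tries. Dropping the stale items lets the third attempt (index 2) finish.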

@echesakov echesakov merged commit d1f914c into dotnet:master Mar 28, 2019
@echesakov echesakov deleted the RestrictUbuntu1404Arm32Open branch March 28, 2019 20:26
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022