Remove flaky arm64 test job #10953
|
@nishantmonu51 @martin-g @himanshug FYI since this is reverting a change that you all participated in. Any concerns with this? |
|
I will take a look at the failures on Monday!
On Fri, Mar 5, 2021, 19:14 Suneet Saldanha wrote:
> @nishantmonu51 @martin-g @himanshug FYI since this is reverting a change that you all participated in. Any concerns with this?
|
|
Thanks. Reducing transient failures is good, so it is OK to [temporarily] remove it, since no issues have been filed specifically for things not working on arm64. So, +1. That said, I would let @martin-g take a crack at fixing this, as there might be something systemically wrong and the build failure might actually be a true positive. Let us merge this towards the end of next week if things stay the same. |
|
Nice catch. In my experience, this job 24 is a little bit flaky. This job often fails with |
|
According to https://www.howtobuildsoftware.com/index.php/how-do/b5CN/travis-ci-home-travis-buildsh-line-41-pid-killed-exit-code-137 error 137 means It is interesting that all the failures are in the build of the last module - |
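For readers unfamiliar with the exit code mentioned above: by shell convention, an exit status above 128 means the process was terminated by a signal, and the signal number is the status minus 128. For 137 that gives signal 9, which is SIGKILL on POSIX systems. The following is a minimal sketch of that arithmetic; the function name `describe_exit_code` is made up for illustration and is not part of Druid or TravisCI.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell exit status (hypothetical helper for illustration).

    Shells report "killed by signal N" as exit status 128 + N.
    """
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

A SIGKILL cannot be caught by the process, which is why such a job usually stops without printing an error of its own.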
Maybe we can move this job into |
|
I haven't used stages before in TravisCI. I don't see anything in .travis.yml that configures resources for the stages. |
|
OK, I see how TravisCI stages work! IMO it would be even better to move the ARM64 job into a third/new stage so that it does not affect the other jobs. |
|
https://docs.travis-ci.com/user/common-build-problems/#my-build-script-is-killed-without-any-error - the max memory per job is 3 GB. Line 44 in 9946306 |
|
I've created #10958. |
Thanks for the fix @martin-g! Since this job is still failing, I think it would be better to remove this job till we have a fix with some confidence that it will work. This way we can think through the fix fully instead of trying to rush the fix. I'll be sure to review your change as soon as it is ready so we can bring this test job back.
@zhangyue19921010 This job used to be in phase 1, but would fail and prevent all the integration tests from running. I moved it to phase 2 so that a committer wouldn't need to manually start every job in phase 2 if the phase 1 job is flaky. |
This kind of memory issue in CI requires trial-and-error experimentation. You could try adjusting the max memory to fit in the container. Please check https://docs.travis-ci.com/user/reference/overview/ first and see how much memory the container has, depending on the build environment setup.
Given that it could take some time to fix this issue, +1 for temporarily disabling this particular test. |
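As a rough illustration of "adjusting the max memory to fit in the container", the sketch below derives a JVM `-Xmx` value from a container memory limit while leaving some headroom for non-heap memory. The 25% headroom and the helper names are assumptions chosen for illustration, not values taken from this PR or from the TravisCI documentation; the right number would have to be found by experiment.

```python
def suggest_max_heap_mb(container_mb: int, headroom_fraction: float = 0.25) -> int:
    """Return a heap size in MB that leaves part of the container free.

    headroom_fraction is an assumed starting point, not a recommended value.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")
    return int(container_mb * (1 - headroom_fraction))


def jvm_heap_flag(container_mb: int, headroom_fraction: float = 0.25) -> str:
    """Format the suggested heap size as a JVM -Xmx flag."""
    return f"-Xmx{suggest_max_heap_mb(container_mb, headroom_fraction)}m"


if __name__ == "__main__":
    # Assumes a 3 GB (3072 MB) container.
    print(jvm_heap_flag(3072))
```

The resulting flag could then be passed to the build, for example through `MAVEN_OPTS`, and tuned up or down depending on whether the job still gets killed.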
|
Merging this PR as it blocks other PRs from getting merged. |
Documentation:
- https://blog.travis-ci.com/2020-09-11-arm-on-aws
- https://aws.amazon.com/blogs/opensource/getting-started-with-travis-ci-com-on-aws-graviton2/

Trying to fix the problem described at apache#10953
... because at the moment they took 1h, which is a little above the TravisCI limit of 50 minutes per job, and because @clintropolis requested adding one more module - sql
This removes a flaky test job that was added in #10562
The Travis job was added to test building Druid on the Arm64 architecture. No tests are actually run as part of the job.
However, this job appears to fail around half of the time. My limited googling has not yielded any promising results. Since this impacts dev productivity, I propose we remove this job until we find out why it fails so often and fix it appropriately.