Various fixes so our code also works with Travis Xenial VM image #4504
Conversation
As mentioned on Slack, we need to figure out a path forward with Travis CI. It has simply been too slow and unstable for us for the last half a year or so.
The same test case takes 14 seconds locally. We haven't done anything major in the code which should affect build times in such a negative way. Build times on Travis have slowly grown from 15 minutes (we spent a lot of time getting them down to that number about half a year ago) to 50+ minutes now. We don't have much visibility into what is going on on Travis's side, but I would imagine they changed the VM size to some tiny one for old VM / sudo-required builds, or something similar.
EDIT: I somehow missed the memo - https://blog.travis-ci.com/2018-10-04-combining-linux-infrastructures. We never used the container-based infrastructure, but it looks like this change could somehow affect build timings negatively.
arm4b left a comment
I'd like to see how the infrastructure type change in TravisCI affects the build timings, keeping any other changes to the tests minimal for a fair comparison.
Previously, switching to the container-based infra slowed the build times down by 1.5-2x.
It's good to experiment here.
I'm 👍 to reassess if timing improves significantly.
Alright, I made the changes. There are still issues with intermittent failures and slow build times for unit tests which seem to be related to the Travis environment (aka nothing we can do about it at the moment).
It turns out that on AMQP publish on Travis, a "Channel.open: (504) CHANNEL_ERROR - second 'channel.open' seen" error is returned on almost every publish, which causes our connection retry wrapper to sleep for 10 seconds and slows down the tests. I haven't been able to track down the root cause of this issue yet, but this workaround should work fine and cause no issues in production.
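For context, such a connection retry wrapper behaves roughly like the minimal sketch below. This is not the actual st2 code: the `publish` callable, the exception handling, and the retry/delay values are assumptions, used only to show why a channel error on almost every publish adds roughly 10 seconds per attempt.

```python
import time

RETRY_DELAY = 10  # seconds; a fixed sleep like this is what slows the tests down


def publish_with_retries(publish, payload, retries=3, delay=RETRY_DELAY):
    """Call ``publish(payload)``, retrying when the broker raises an error.

    Hypothetical illustration only -- in the real wrapper, the caught exception
    would be the AMQP client's channel/connection error type.
    """
    last_exc = None
    for attempt in range(retries):
        try:
            return publish(payload)
        except Exception as exc:  # placeholder for the broker's channel error
            last_exc = exc
            if attempt < retries - 1:
                time.sleep(delay)  # each failed publish stalls the run by `delay`
    raise last_exc
```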
Container infrastructure configuration has been removed and the apt addon works fine with both the new and the old infrastructure. This also avoids duplicate dependency installation (test-requirements.txt and requirements.txt).
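As a rough illustration of what "duplicate dependency installation" means here, a small helper along these lines (hypothetical, not part of the repo) can list the packages that appear in both requirements files and would otherwise be installed twice:

```python
import re


def parse_requirements(path):
    """Return the set of package names listed in a pip requirements file."""
    names = set()
    with open(path) as fp:
        for line in fp:
            line = line.strip()
            if not line or line.startswith(("#", "-r", "-e", "--")):
                continue
            # Strip version specifiers, extras and environment markers.
            name = re.split(r"[<>=!~\[;]", line, maxsplit=1)[0].strip()
            if name:
                names.add(name.lower())
    return names


if __name__ == "__main__":
    runtime = parse_requirements("requirements.txt")
    test = parse_requirements("test-requirements.txt")
    print("Listed in both files:", sorted(runtime & test))
```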
RE precise: https://docs.travis-ci.com/user/reference/precise/.
They're not removing it completely, they just no longer update it with new packages or security fixes.
Talking about timing: the new setup increases the overall build time significantly, from 22 to 29 minutes. I'd say anything that takes 30+ minutes in CI tests is counter-productive.
You want to keep the parallel tasks in a CI job aligned in timing. The longest task has the biggest say in the entire build time, aka wait time, so having a new 29-minute longest task is much worse compared to the old 22-minute one.
Even if the other tasks got faster, you still have to wait for the longest two.
And we'll keep adding more and more unit tests in the future, which will increase the overall timing disproportionately.
For these reasons I previously split the unit tests into two pieces to keep the overall build time under 10 minutes and set them to run on Precise, the fastest performer for Travis unit tests.
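To make the wall-clock argument concrete, here is a tiny sketch with illustrative job durations (only the 22 and 29 minute figures come from the discussion above; the per-job split is made up):

```python
# The wait time for a parallel CI build is the duration of its slowest job.
old_jobs = {"unit-1": 22, "unit-2": 20, "integration": 18}  # minutes, illustrative
new_jobs = {"unit-1": 29, "unit-2": 19, "integration": 15}  # minutes, illustrative

print("old build wait time:", max(old_jobs.values()))  # 22
print("new build wait time:", max(new_jobs.values()))  # 29 -- worse overall,
# even though two of the three jobs got faster.
```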
Overall, this looks like a tradeoff: being up to date but slower vs. outdated but faster.
As said before, I don't think the distro matters much for unit tests, keeping in mind that all you need is an isolated, pinned Python version.
Is there any other important point I'm missing?
I'll keep this information here for consideration.
I think more 👀 on this PR would be beneficial, especially from someone who works with the repo itself on a day-to-day basis, like @bigmstone and @m4dcoder.
@armab I agree. As long as it's still supported / available on Travis, I'm fine with using Precise if the builds indeed finish faster (as you have pointed out, we would still need to wait for the longest build). There are still issues with using old versions of various components and dependencies (e.g. OpenSSL which ships in Precise won't be supported in newer versions of …). The good news is also that our code and Travis config now support Xenial as well (we just need to uncomment and change a couple of lines in the Travis config), so if we are ever forced to switch to it, the switch will be easy and fast.
On a related note - I haven't been able to figure out why unit tests are slower but integration ones are faster on Xenial. The VMs should have the same amount of resources available to them (RAM and CPU).
arm4b left a comment
Adding compatibility with the new Travis infrastructure works for me, while staying on the fastest, but oldest, infra.
We probably need to edit the PR description as well to reflect the updated status of the changes?
@armab I will just add a comment here about us deciding to stick to Precise for speed reasons. For transparency reasons I usually try to leave the original description and discussion alone and evolve the PR via comments.

This pull request updates the Travis config to use Ubuntu Xenial instead of (long-ago deprecated) Precise. Update: We decided to stick to Precise for speed reasons, but added Xenial compatibility to the TravisCI config for the future.
I'm just experimenting to see if this will affect the tons of intermittent build failures we have been seeing in the last couple of days.