CI Enhancement Proposal #32
Conversation
jmccormick2001 left a comment:
I'd be curious what could be accomplished in the short term, say within 30 days, to improve CI performance; some items are longer term, as you suggest. Maybe add a breakdown of short-term versus longer-term work?
Another question: would it be possible to simulate the performance gains from breaking the CI into separate repos?

What are your thoughts on measuring CI performance? If we move to GitLab, is there a way to compare Travis performance against what we get in GitLab for the same workloads?
> Goveralls requires rebasing

It is not a problem any more. See: operator-framework/operator-sdk#3158
> Travis passes but GitHub does not allow merging

This happened a few times, but I think it is important to emphasize that this problem is intermittent and not frequent.
> Extremely slow TravisCI runs

Could we add the items using Markdown list item markers (e.g. `- Extremely slow TravisCI runs`)? Also, can we remove the blank line between each item?
> Increased unit testing would also allow us to alleviate some of the e2e testing load

The item above describes a solution, so it should not be listed under "The primary issues are the following:".
estroz left a comment:
If this is going to be in the enhancements repo, it needs to follow the template. I'm not 100% sure it does need to exist here since it only really affects the development process of the SDK; these changes don't really pertain to anything user-facing.
I believe we should wait on separating the repos (Ansible, Helm, and Go) before enhancing CI, as the e2e tests can be separated out as well.
> # Proposed Changes
>
> The first item to examine is our CI platform. After basic research I found that GitLab CI has a high bandwidth option available for free for open source applications. This would allow us to run more tests concurrently and avoid some of the bottlenecks found in Travis. Additionally, each concurrent process has higher compute power available to it. My suggestion would be to dry run GitLab CI for 30 days (within a trial period) to gain knowledge on whether or not our testing speed is faster and/or more reliable. This period would also allow us time to further improve the CI on the software side.

Could you please provide a technical comparison between the two (Travis vs. GitLab)? That is, how much does Travis provide and how much does GitLab provide?
Also, which of the problems that we have now would be solved by this? Could we describe which items from the list would be sorted out?
> My suggestion would be to dry run GitLab CI for 30 days (within a trial period)

Can we just switch to GitLab CI? Wouldn't it require a specific syntax/setup? Also, if the setup is easy/low effort, could we not just configure a fork, push a PR, and check how long it takes in order to compare?

Does moving to GitLab CI not mean leaving GitHub? Wouldn't it mean moving the repo from github.com to gitlab.com? If yes, then I think we need to consider other aspects as well.
> Random websites which are called during docs checks are down and fail entire runs

IMO the problem we have here is intermittent timeout issues in the doc checks.

Note that if the doc or sanity checks fail we do not run the other tests, to optimize the usage of Travis resources, since a failure means we need to change the code to fix it (e.g. a broken link, a missing licence header in a new file, lint issues such as dead code).
Some ideas that might help:

- Performance issue: note that it can take upwards of 30 minutes for a single set of e2e tests to execute.
- Unit test coverage of the project is low: it may be possible to decrease the quantity of e2e tests, which are the root cause of the performance issues and the reason the tests take so long to execute.
- E2E tests are done with shell scripts: since the e2e tests are driven from the shell they are hard to troubleshoot, and because of this we cannot use coveralls to check what is or is not covered by them. NOTE: open question here; by using envtest and the modules "github.com/onsi/ginkgo" and "github.com/onsi/gomega", would we be able to troubleshoot/debug the e2e tests?
- Low maintainability: the tests do not follow a single standard, which makes it harder to keep them maintained.
- General flakes and failures (intermittent and not frequent):
  - Travis passes but GitHub does not allow merging
  - Timeout issues and 404 errors during docs checks
> Currently many tests (sometimes unrelated) are jammed together in a single script, which makes it difficult to identify problem areas

I think we can be a little more specific here. It might happen naturally by following @estroz's suggestion to use the template. 👍
> # Motivations
>
> The primary motivation for these enhancements is to improve developer productivity. Much time is lost in slow CI runs, inconsistent flakes and failures in CI, and a general difficulty in debugging said CI runs.

I would note that another big step toward improving productivity would be to make it easy, or at least possible, to run specific tests or sets of tests in a way that accurately mimics the CI environment.
You say that below, but I still think it's a major motivation.
> Testing is stitched together across a number of bash scripts

And in those scripts infra setup and actual test execution are completely muddled together, which makes it a nightmare to run tests against a specific, existing cluster.
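To make that concrete, here is a minimal Go sketch of one way to separate infra setup from test execution. Both `setupLocalCluster` and the `USE_EXISTING_CLUSTER` variable are hypothetical, not anything the SDK defines today; with the variable set, the tests would simply run against whatever cluster the current kubeconfig points at.

```go
package e2e_test

import (
	"os"
	"testing"
)

// setupLocalCluster is a hypothetical helper that would create a throwaway
// cluster (e.g. with kind) and return a teardown function.
func setupLocalCluster() (teardown func(), err error) {
	return func() {}, nil
}

// TestMain decides whether to provision infrastructure before running the
// tests, keeping cluster setup out of the tests themselves.
func TestMain(m *testing.M) {
	os.Exit(run(m))
}

func run(m *testing.M) int {
	if os.Getenv("USE_EXISTING_CLUSTER") == "" {
		teardown, err := setupLocalCluster()
		if err != nil {
			return 1
		}
		defer teardown()
	}
	return m.Run()
}
```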
> The second change to implement would be increasing unit test coverage of the SDK CLI and scaffolding. The more of the SDK we can unit test, the fewer pathways need to be checked in e2e testing. In general we are able to execute unit tests much faster than e2e tests. There is already some unit testing in place and the action item for this would be to simply ensure we have 100% coverage of all of our possible CLI code paths.

I agree with this to a point, but I think we shouldn't make the e2e tests cover less. What would make sense is to have good unit tests and gate the e2e tests on their success, so that builds we know early are bad waste less time.
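As a rough sketch of what broader CLI unit coverage could look like, here is a table-driven test against a cobra command tree. `newRootCmd` is a hypothetical stand-in for the SDK's real command constructor, not its actual API; the point is that CLI code paths can be exercised quickly without a cluster.

```go
package cli_test

import (
	"bytes"
	"testing"

	"github.com/spf13/cobra"
)

// newRootCmd is a hypothetical stand-in for the SDK's root command constructor.
func newRootCmd() *cobra.Command {
	root := &cobra.Command{Use: "operator-sdk"}
	root.AddCommand(&cobra.Command{
		Use:  "new",
		Args: cobra.ExactArgs(1),
		RunE: func(_ *cobra.Command, _ []string) error { return nil },
	})
	return root
}

// TestCLIPaths runs each argument combination and checks the returned error.
func TestCLIPaths(t *testing.T) {
	cases := []struct {
		name    string
		args    []string
		wantErr bool
	}{
		{"new with a project name", []string{"new", "memcached-operator"}, false},
		{"new without arguments", []string{"new"}, true},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			cmd := newRootCmd()
			cmd.SetOut(new(bytes.Buffer))
			cmd.SetErr(new(bytes.Buffer))
			cmd.SetArgs(tc.args)
			if err := cmd.Execute(); (err != nil) != tc.wantErr {
				t.Fatalf("args %v: got err=%v, wantErr=%v", tc.args, err, tc.wantErr)
			}
		})
	}
}
```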
> Allows for a single e2e test format

Due to the range of technologies we support, I'd just want to make sure this isn't too prescriptive. Rewriting Ansible molecule tests to support an arbitrary format could be non-trivial, and testing Ansible without molecule would be a lot more non-trivial.
I understand that we also need to define the common case (with the fewer exceptions described by @fabianvf above): all tests would be written with ginkgo and gomega, and the e2e tests would use the kubebuilder utils (envtest), like the tests for the new layout: https://github.com/operator-framework/operator-sdk/tree/master/test/e2e-new
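For reference, a minimal sketch of what a ginkgo/gomega suite backed by envtest could look like; this is a generic suite skeleton under those assumptions, not the actual contents of test/e2e-new.

```go
package e2e_test

import (
	"testing"

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

var (
	cfg     *rest.Config
	testEnv *envtest.Environment
)

// TestE2E wires the ginkgo suite into `go test`.
func TestE2E(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Operator e2e suite")
}

var _ = BeforeSuite(func() {
	// Start a local control plane (etcd + kube-apiserver) via envtest.
	testEnv = &envtest.Environment{}
	var err error
	cfg, err = testEnv.Start()
	Expect(err).NotTo(HaveOccurred())
	Expect(cfg).NotTo(BeNil())
})

var _ = AfterSuite(func() {
	Expect(testEnv.Stop()).To(Succeed())
})

var _ = Describe("operator scaffolding", func() {
	It("exposes a usable REST config", func() {
		// Real specs would create resources against cfg and assert on the results.
		Expect(cfg.Host).NotTo(BeEmpty())
	})
})
```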