Integrate the run coverage command to the CI #37152
Conversation
Force-pushed from 2d22593 to 8ffd3c5
Force-pushed from a647e52 to 2114b8b
Force-pushed from 2114b8b to 7cf2756
This is not likely to succeed. The nature of our tests - where we gather coverage from separate runs and combine it - means that every single run will get smaller coverage than the "combined" one. What we are really after, most likely, is to have the coverage bot comment back (and potentially mark the PR as not-green) when the coverage drops. We used to have such a coverage bot but we disabled it; maybe it's time to re-enable it since we have started looking at coverage again.
I think the current issue is around the files for DB tests vs. non-DB tests. I will experiment more and see if that solves it.
Force-pushed from 2492697 to 65952ea
It looks like it worked @potiuk. A failure will ask the developer to update tests or remove the file from the list of not fully covered files. Though I suspect this is an additional headache, I think it will be useful in that it points to specific files to improve, or files whose coverage has dropped.
This is nice. I like the idea that we are focusing on coverage of particular test files - because since we are doing selective tests, this is the only way to get somewhat close to full coverage. It's not ideal - sometimes, for example, we will get coverage in the API by running a different test - but this is actually a nice way to find tests that are put in the wrong folder, and we can figure that out later. There are however a few small problems I see (but I think we can figure out how to solve them):
This is solvable - by producing coverage files, uploading them as artefacts and then having a separate job which downloads all the coverage files, combines them and only after that reports success/failure (a rough combine sketch follows this comment). Most of your code will work - it could simply be a separate job run after everything else. We already do a very similar thing for "summarize warnings", for the very same reason: all the tests produce warning files and then a single job pulls all of them, deduplicates and summarizes them. This also has a very nice effect - individual tests will not be failing. It is scary to see your tests failing and then have to dig into the output to find out that it is coverage rather than an actual test failure. Having a separate job is way better - that job can be named "Coverage Check" or similar, and when it fails you will immediately know that it's the coverage that failed. Plus we can make that single job non-fatal - i.e. a coverage failure will not fail the whole build (we do that for Quarantined tests) - at least initially, while we are figuring out all the quirks and problems, we can accept the job failing for a while until we find and solve all the imperfections we will see from people running actual PRs. Or initially - just for testing - we could run it without failing at all: only produce a summary and watch how it behaves on various PRs. Then gradually we could make it fail (but non-fatal) and eventually make a green check a "must" to merge a PR, once we are sure there are very few false negatives.
I think it would be good (and this is not a very complex thing) to have a way to show the diff of coverage for the files that have problems: before / after. That can be done nicely by generating the right URL to the remote Codecov - when we upload the coverage from the test there is a way to see it, and even see the diff, I think. Seeing just "coverage dropped" without giving the user a clue how to check it is bad - a URL will help. There is also the GitHub Action from Codecov which we could use. I think such an approach might be way better than what the Codecov GitHub integration provides by default - with their actions they post comments in the PR with a summary etc., but I find that far too spammy. Having a separate job with "code coverage" information, an indication of the problem (by status) and a list of problematic files with a URL to follow is a much better thing.
I think eventually, rather than requiring full coverage, we could turn this into a check for coverage drops, and that way we could easily increase the number of files we monitor. But that's another story.
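To make the first point concrete, here is a minimal sketch (not the PR's actual code) of what a separate "combine and check" job could run, assuming the per-job `.coverage` data files have already been downloaded as artifacts into one directory; the directory name, helper name and threshold are hypothetical.

```python
from pathlib import Path

import coverage


def combine_and_report(artifact_dir: str = "downloaded-coverage", fail_under: float = 0.0) -> int:
    """Combine per-job coverage data files and report the total percentage."""
    data_files = [str(p) for p in Path(artifact_dir).rglob(".coverage*")]
    cov = coverage.Coverage()
    # keep=True leaves the downloaded artifact files in place after combining
    cov.combine(data_paths=data_files, keep=True)
    cov.save()
    total = cov.report(show_missing=False)
    print(f"Combined coverage: {total:.2f}%")
    # Initially non-fatal: the threshold defaults to 0.0 while the check is being evaluated
    return 0 if total >= fail_under else 1


if __name__ == "__main__":
    raise SystemExit(combine_and_report())
```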
I like the suggestions. We are going to do the first two; the third one will remain pending.
Force-pushed from 65952ea to 086c109
I can't reproduce the static check failure locally. It seems I will have to manually copy the output from CI and edit locally.
Yes. If we do it for information only - I think failing the CI tests for users right now is too troublesome. Printing the information about coverage loss in the output is probably OK - but not failing the PR.
Looks like a bad merge + the need to regenerate images. It should be reproducible if you run
I looked a bit closer... and unfortunately I think there are a few problems:
https://github.com/apache/airflow/actions/runs/8080612432/job/22077745306?pr=37152#step:5:7061
One more thing. I think that right now the "upload-coverage" selector is not used by the tests at all, from what I see. The thing is - it was implemented so that individual PRs do not send coverage data to the server, because that is where the real "final" coverage is determined by combining all the coverage from all the tests for a given run. I am not sure that's still the case - it looks like it's not handled the right way any more. @Taragolis - can you help with it? BTW, I think the flaky tests might be because the coverage data is collected in memory. When the error code is fixed, you should re-run this test with
Given the function I have as upload-coverage, I think it still works the right way?

```python
def upload_coverage(self) -> str:
    if self.event_name == "push" and self.head_repo == "apache/airflow" and self.ref == "refs/heads/main":
        return "true"
    return "false"
```

We only upload coverage when the tests were run on the main branch.
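Purely as an illustration (this test is not part of the PR), the selector's behaviour can be pinned down with stand-in objects; `SimpleNamespace` below is just a stub for the real selective-checks instance, and the function is copied verbatim from the comment above and called as a plain function:

```python
from types import SimpleNamespace


def upload_coverage(self) -> str:
    if self.event_name == "push" and self.head_repo == "apache/airflow" and self.ref == "refs/heads/main":
        return "true"
    return "false"


def test_upload_coverage_only_for_main_push():
    canary = SimpleNamespace(event_name="push", head_repo="apache/airflow", ref="refs/heads/main")
    pr = SimpleNamespace(event_name="pull_request", head_repo="apache/airflow", ref="refs/heads/main")
    # Coverage is uploaded only for pushes to main in the apache/airflow repo (canary builds)
    assert upload_coverage(canary) == "true"
    assert upload_coverage(pr) == "false"
```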
Yeah, the failure needs to be investigated. I will check again and add the label
Force-pushed from 14cf57d to 0d08897
Right... Stupid me. I missed that we have an if in the
I reran the tests twice; the previously failing impersonation tests have been corrected. I still couldn't reproduce the static check failure locally, and what the CI asks to be updated doesn't make sense to me.
Force-pushed from 0d08897 to 245ba81
This is what you see when you unfold the
This is an attempt to prevent a file with full test coverage from slipping below 100% coverage.
- Use multiprocessing for coverage
- Remove non-db test files from covered when running db tests
- Remove files that have full coverage from not fully covered files
- Add color to error message
- Update hashes
- Add back run-coverage for hatch
- Regenerate images
- Move the reporting out of analyzing and make this non-fatal
- Report coverage before analysing
- Fix static checks
Force-pushed from 245ba81 to 48698c6
Yeah. That's what I was afraid of - the impersonation tests fail in ALL builds now :)
But that is good - because it used to fail intermittently, and if we find the problem and fix it here, we will be quite certain it is fixed :)
Yes. That's the next thing I will be looking at. Thanks for looking into the tests - it didn't occur to me to even check!
Ahhhh... as far as I remember this test is a combination of pain and workarounds; in addition, we test here against something which doesn't work well, or is hard to debug, in Airflow:
Yeah. We might want to remove or simplify the test. BTW, there is one more thing I noticed - all of the tests in this build take upwards of 25m (some even more than 30), whereas canary builds - which use the "old way" of getting coverage - are between 12m and 25m (the SQLite tests are where you can see the biggest difference). Compare the current build with, for example, this one: https://github.com/apache/airflow/actions/runs/8103881626/job/22149538266 Possibly the reason is the same as for the impersonation issue - I suspect that gathering coverage information the way we run it now takes much more memory; it could also be that test output etc. is buffered (or something like that). One way to check is to add a "debug CI resources" label on this build and open a regular PR with both the "debug CI resources" and "full tests needed" labels (removing the check that only enables coverage for canary builds). Then you would have a "regular PR" running coverage and a "new coverage PR" running coverage, and you can compare the two (including resource usage). That might give a clue about the differences.
Also using the "use public runner" label and comparing a regular PR with no coverage enabled against one with coverage might show some surprises. I think adding coverage slows down the tests (in general - even the old way) by 20% - 30%, but we need a few test runs to confirm that. That's one of the reasons we have the coverage tests disabled for regular PRs. We need a few more data points - and then we can decide whether it makes sense to enable coverage for absolutely all PRs, or whether we can find some way to enable it only selectively (for example when there are core changes). It depends what our ultimate goal really is here - because if we enable it and won't use it (but pay the cost), that's likely not a good idea.
Yeah, the test is hacky because we have to change some settings at runtime to allow us to test a very old bug.
We need two counters 🤣
Yes. The second comment on the second answer here might be a nice template: https://stackoverflow.com/questions/184618/what-is-the-best-comment-in-source-code-you-have-ever-encountered
BTW, in an attempt to fix some flaky tests we discussed running specific tests isolated from the other tests, to avoid unpredictable behavior and side effects they may add to other tests - e.g. we have a test which raises SIGSEGV 🤣 In theory there is
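The comment above is cut off; one plugin it may be pointing at (an assumption on my part, not confirmed by the thread) is pytest-forked, which runs a marked test in a forked child process so that a crash such as SIGSEGV cannot take down the rest of the session:

```python
import pytest


@pytest.mark.forked  # provided by the pytest-forked plugin (POSIX only)
def test_runs_in_its_own_process():
    # If this test segfaulted, only the forked child would die;
    # the rest of the pytest session would keep running and just
    # report this single test as failed.
    assert 1 + 1 == 2
```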
We already have
Yeah, something like that. It would be nice to have run
BTW we could temporarily mark
Tempting :)
@ephraimbuddy I think it's worthwhile to check it again after we mark the impersonation tests as quarantined.
Agree :)
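As a point of reference, a minimal sketch of what quarantining such a test could look like, assuming the repo's existing `quarantined` pytest marker (the test name below is made up):

```python
import pytest


@pytest.mark.quarantined  # runs in the separate, non-fatal Quarantined CI job
def test_flaky_impersonation_case():
    # Placeholder body: the real test would exercise the impersonation path
    # that currently fails intermittently.
    ...
```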
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

This is an attempt to prevent a file with full test coverage from slipping below 100% coverage.
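A rough sketch of the idea described above, not the PR's actual implementation: given a list of files expected to stay at 100% coverage, fail when any of them drops below that. The file list, script shape and default data-file path are assumptions for illustration only.

```python
import sys

import coverage

# Hypothetical list of files the project expects to remain fully covered.
FULLY_COVERED_FILES = [
    "airflow/utils/example_module.py",
]


def check_fully_covered(data_file: str = ".coverage") -> int:
    """Return 1 if any tracked file has slipped below 100% coverage, else 0."""
    cov = coverage.Coverage(data_file=data_file)
    cov.load()  # assumes a combined coverage data file already exists
    failures = []
    for path in FULLY_COVERED_FILES:
        # report() returns the total coverage percentage for the included files
        percent = cov.report(include=[path], show_missing=False)
        if percent < 100.0:
            failures.append((path, percent))
    for path, percent in failures:
        print(f"ERROR: {path} dropped to {percent:.1f}% coverage (expected 100%)")
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(check_fully_covered())
```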