Clear DB before individual module tests #28631

Taragolis · 2022-12-28T18:13:41Z

Some hacks which uses in PR

~~module scope with parsing "provider name" instead of package scope. pytest decide that package is airflow and run only once~~
Use separate clear tests helpers instead of airflow.utils.db.resetdb(). locally cause some errors with duplicate constraint.

eladkal · 2022-12-28T18:35:23Z

tests/providers/conftest.py

Should/can we generate this list automatically?
Hard coding has risk that we will forget to update the list when needed

We can check for presence of provider.yaml. Or better - rely on generated/provider_dependencies.json instead. the provider_dependencies.json is guaranteed to be updated by pre-commit and it is far easier to use than scanning folders, as well as it's intention is to remain stable when we refactor and move the providers.

Another comment here @Taragolis . Maybe we do not need that at all. Don't we want to do it for ALL packages that are "leaves" - not only for providers (including all airflow packages)? I think that would be a nice compromise between isolation and speed of execution

There is also one side-effect that we should take into account here. This is fine when we run the tests in CI, but the way it is implemented now, it will also run the auto-fixture even if you run a single unit test. This will dramatically slow-down any iterations you do I am afraid (and it does not really matter if we do it only for Providers or for whole airflow).

I think this can be solved by only doing this when we run:

for the first time ever (so some kind of check if db is initialized)

or always when we run in CI (CI env variable set to "true").

The 1) would be nice for first-time users - we badly miss it now, the user needs to know that they have to run the tests with --with-db-init flag.

I think we could . Just need to scan directory until found one of the folder: hooks, operators, sensors

I could check also request fixture tomorrow, I thought it should contain session information, such as how many tests collected and if it equal (or less, I don't know how it possible) 1 then do not cleanup anything.

And also remove yield in this fixture we do not need any teardown, at least yet

I could check also request fixture tomorrow, I thought it should contain session information, such as how many tests collected and if it equal (or less, I don't know how it possible) 1 then do not cleanup anything.

I think this is not really needed now - the overhead with using the test_utils classes is really low indeed. I believe we should enable it for all tests and compare execution time on CI and if it is < 5% overhead in total or so, we can leave it on for all modules.

For long term we also need to consider get rid of most of the stuff in test.test_utils and replace them by equivalent as fixture/context manager/markers but this better complete after migrate remaining tests to pytest and document them.

Because most of the new contributors just copy-paste some stuff from one test to another without any idea how it actually work.

For long term we also need to consider get rid of most of the stuff in test.test_utils and replace them by equivalent as fixture/context manager/markers but this better complete after migrate remaining tests to pytest and document them.

Because most of the new contributors just copy-paste some stuff from one test to another without any idea how it actually work.

Yep. Full agreement here.

tests/providers/conftest.py

uranusjr · 2022-12-29T05:10:59Z

tests/providers/conftest.py

Would it be a good idea to also clean after the provider tests? I feel like it’s generally a good principle for one thing to clean up one’s own stuffs. We could do

Clean when entering the providers directory.

Clean after each provider is finished.

I tihnk we've came to proposal that with such (low) overhead we have we can clean up database not before/after every provider but before (after?) any module. As you can see from the attached videos above - overall overhead for running single test is neglectible if we do (~0.2 s is almost not noticeable). It will be far less if we run the whole group of tests.

If we always clean before each module - this will work not only for providers, but also for regular airflow tests - with such a low overhead, we do not need to do cleanup because we have 100% guarantee, that the DB will be cleaned before runnin any test.

We still will need to see how much it will impact the overall test time, but I am cautiously optimistic it won't be a lot. And we can move it up to package as well if we find module is too much.

And this approach has really some interesting properties if we do such cleanup unconditionally before every module:

We can stop worrying about any DB side-effects outside of the modules. We have all but guarantee that any side-effects will not cross the module boundaries

Running all tests in a module is usually what people do - because it is easy. And whenever we see a suspicious side-effect, it will be super-easy to reproduce - just run the whole module - it will run identically in CI and locally because each module is isolated.

When a new user checks out and run tests, they will not have to think about initializing the database - similarly when the database has changed. So far we had --with-db-init switch to let airflow reset the unit test DB but it was pretty brittle and not many people even knew about its existence. We will be able to remove that now - because every time you run a test you have guarantee to get a fresh DB with the current structure.

Another con of cleaning after the test - this is rarely used, but potentially not cleaning the DB after the test gives the developer the luxury that we can easily inspect the DB after the test completes - so for running individual tests, I think leaving the DB intact after the test has some nice debugging property as well.

I've better keep --with-db-init time to time devs still need to clear DB (something changed in structure).

I have for this run configurations, but someone could find --with-db-init more useful rather then this

Yes. We should leave it in.

Taragolis · 2022-12-29T09:49:56Z

Ok, lets have a look.

I change it and make it as module scoped autouse fixture

Taragolis · 2022-12-29T10:16:35Z

Also add option which disable auto cleanup db. Of course it is not affect explicit clear parts

Taragolis · 2022-12-29T12:22:34Z

It is debugging time!

Taragolis · 2022-12-29T14:08:43Z

Fingers cross. Locally it run without any issues

potiuk · 2022-12-29T17:38:33Z

tests/conftest.py

potiuk · 2022-12-29T17:39:44Z

tests/conftest.py

Cool. This is indeed important, though many tests will not work anyway with xdist.

potiuk · 2022-12-29T17:40:28Z

tests/conftest.py

I like the escalation here and informative message.

potiuk · 2022-12-29T17:42:09Z

tests/integration/api/auth/backend/test_kerberos_auth.py

I guess this is accidental fix that was needed? Can we maybe separate it out to another PR (just for trackability)?

I move this usage into class fixture.

def _set_attrs(self, app_for_kerberos, dagbag_to_db):

I see. Right. Should be maybe change the name of the class fixture? to initialize or something?
I was under the impressuion (due to _set_attrs name) it was doing something different - and now it is not really setting the attributes.

Yeah. This is a good point.

BTW we have quite a few tests classes where we use method (function) scoped fixture. And most of them has different name, so might be also a good point to make them more or less consistent.

tests/conftest.py

potiuk

❤️ it this way. Few nits for possible separation of some loosely related changes into separate PRs.

potiuk · 2022-12-29T17:47:49Z

BTW. I do not see a noticeable impact on the times of execution of the tests ... Which is cool if this is really the case. Are we sure it works :) ?

Taragolis · 2023-01-04T10:29:31Z

@potiuk I removed all optional changes and kept only related to this PR: autofixture and changes in API tests + Kerberos Integration

Also found that if we call DAG.sync_to_db() it no has any affect to database, so I change it to same method in DagBag object, I do not know is it expected or it is some kind of bug.

Example of changes

--- a/tests/api/client/test_local_client.py
+++ b/tests/api/client/test_local_client.py
@@ -44,7 +44,7 @@
 class TestLocalClient:
     @classmethod
     def setup_class(cls):
-        DagBag(example_bash_operator.__file__).get_dag("example_bash_operator").sync_to_db()
+        DagBag(example_bash_operator.__file__, include_examples=False).sync_to_db()
 
     def setup_method(self):
         clear_db_pools()

Previously this part work because we do not clear this part of metadatabase in API tests and tests use initial database state.

ephraimbuddy · 2023-01-04T13:47:01Z

tests/api/common/test_mark_tasks.py

Why set include_examples=True? Won't it cost some more time?

Some API tests use example DAGs: example_subdag_operator, example_trigger_target_dag and others.
Before changes in this PR all example serialised DAGs stored into database so we did not have issues with it but after we start cleanup DB before each module we have.

I'd rather to say this is temporary solution as target we need to use DAGs from:

tests/dags

tests/dags_corrupted

tests/dags_with_system_exit

Yeah. Those changes are needed .

Also found that if we call DAG.sync_to_db() it no has any affect to database, so I change it to same method in DagBag object, I do not know is it expected or it is some kind of bug.

Not sure about it. But I do not see it to have any noticeable effect on time of running the tests, so let's keep it this way.

Yeah. Those changes are needed .

Echo from the past. I trace back to initial commit from 2017 🤣

potiuk · 2023-01-04T15:30:01Z

I am good with how it looks like now :)

Taragolis · 2023-01-09T13:36:56Z

Let's check with public runners

potiuk · 2023-01-09T18:58:02Z

Looks GOOOD. Shall we ?

Taragolis · 2023-01-09T19:38:24Z

Yeah. Let's merge.

Taragolis requested review from ephraimbuddy, potiuk and uranusjr December 28, 2022 18:13

boring-cyborg bot added the area:providers label Dec 28, 2022

Taragolis added the full tests needed We need to run full set of tests for this PR to merge label Dec 28, 2022

Taragolis force-pushed the clear-db-between-providers-run branch from e5b2da0 to 1192503 Compare December 28, 2022 18:26

eladkal reviewed Dec 28, 2022

View reviewed changes

potiuk reviewed Dec 28, 2022

View reviewed changes

tests/providers/conftest.py Outdated Show resolved Hide resolved

uranusjr reviewed Dec 29, 2022

View reviewed changes

Taragolis force-pushed the clear-db-between-providers-run branch from 751f663 to 41fcbc9 Compare December 29, 2022 09:47

Taragolis changed the title ~~Clear DB between each separate providers tests~~ Clear DB before individual module tests Dec 29, 2022

Taragolis force-pushed the clear-db-between-providers-run branch from 41fcbc9 to 15391a8 Compare December 29, 2022 10:15

Taragolis force-pushed the clear-db-between-providers-run branch from 15391a8 to 2b41427 Compare December 29, 2022 13:47

potiuk reviewed Dec 29, 2022

View reviewed changes

tests/conftest.py Outdated

Copy link

Member

potiuk Dec 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

potiuk reviewed Dec 29, 2022

View reviewed changes

tests/conftest.py Outdated

Copy link

Member

potiuk Dec 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. This is indeed important, though many tests will not work anyway with xdist.

potiuk reviewed Dec 29, 2022

View reviewed changes

tests/conftest.py Outdated

Copy link

Member

potiuk Dec 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the escalation here and informative message.

potiuk reviewed Dec 29, 2022

View reviewed changes

tests/conftest.py Outdated Show resolved Hide resolved

potiuk approved these changes Dec 29, 2022

View reviewed changes

Taragolis force-pushed the clear-db-between-providers-run branch from 2b41427 to 692eb92 Compare January 4, 2023 10:21

Taragolis force-pushed the clear-db-between-providers-run branch from 692eb92 to 9396f07 Compare January 4, 2023 12:48

ephraimbuddy reviewed Jan 4, 2023

View reviewed changes

Taragolis force-pushed the clear-db-between-providers-run branch from 9396f07 to 180f144 Compare January 7, 2023 09:48

Taragolis force-pushed the clear-db-between-providers-run branch from 180f144 to 3f33c03 Compare January 7, 2023 15:17

Taragolis added area:CI Airflow's tests and continious integration use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing) and removed area:providers labels Jan 9, 2023

Clear DB before individual module tests

d240ade

Taragolis force-pushed the clear-db-between-providers-run branch from 3f33c03 to d240ade Compare January 9, 2023 13:36

Taragolis merged commit ff48ba1 into apache:main Jan 9, 2023

Taragolis deleted the clear-db-between-providers-run branch January 9, 2023 19:38

amoghrajesh mentioned this pull request Jul 6, 2023

Migrating helm chart tests to helm_tests to prevent test DB init #32394

Merged

Clear DB before individual module tests #28631

Clear DB before individual module tests #28631

Uh oh!

Conversation

Taragolis commented Dec 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

potiuk Dec 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

potiuk Dec 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Taragolis Dec 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

potiuk Dec 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

uranusjr Dec 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

potiuk Dec 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Taragolis Dec 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Taragolis commented Dec 29, 2022

Uh oh!

Taragolis commented Dec 29, 2022

Uh oh!

Taragolis commented Dec 29, 2022

Uh oh!

Taragolis commented Dec 29, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

potiuk left a comment

Choose a reason for hiding this comment

Uh oh!

Taragolis commented Dec 28, 2022 •

edited

Loading

potiuk Dec 28, 2022 •

edited

Loading

potiuk Dec 28, 2022 •

edited

Loading

Taragolis Dec 28, 2022 •

edited

Loading

potiuk Dec 28, 2022 •

edited

Loading

uranusjr Dec 29, 2022 •

edited

Loading

potiuk Dec 29, 2022 •

edited

Loading

Taragolis Dec 29, 2022 •

edited

Loading