
@potiuk (Member) commented Jun 18, 2025

The tests in airflow-ctl occasionally failed when run with xdist. Also, the airflow-ctl dependencies defined in pyproject.toml did not properly reflect what is needed to run the tests: they silently assumed that the tests always run in a complete workspace where all packages are installed. That assumption is unnecessary, and devel-common could be improved to handle a minimal set of requirements for airflow-ctl so that it does not have to install unnecessary dependencies.

Since airflow-ctl no longer requires airflow or sqlalchemy for installation and tests, we can use uv to automatically install it and run the tests without the CI image. airflow-ctl is supposed to be a small, standalone package, so its set of dependencies should make it possible to install it without the 700+ dependencies of the CI image and without the system dependencies installed in the Debian image.

Another problem was that when the tests were run with the macOS keyring available, the login test hung waiting for user input.

This PR fixes those problems:

  • the devel-common distribution no longer depends on airflow or sqlalchemy, so the airflow-ctl distribution can run its tests without all packages installed

  • instead of platformdirs user_config_dir, we retrieve the configuration folder directly, following the airflow pattern (AIRFLOW_HOME or ~/airflow). This makes it easy to patch the airflow home to a temporary directory for each test, which fixes the xdist failures where tests running in parallel overwrote each other's config folder

  • patching every code path where a test can prompt for a password keeps the tests from hanging when keyring is installed

  • breeze airflow-ctl tests run uv sync in the airflow-ctl directory to make sure that no other dependencies are used during CI tests.

  • the test_command.py file is .gitignored and deleted after the tests run

  • breeze testing airflow-ctl no longer requires CI images; we use uv run to automatically install the venv with the right Python version, which simplifies and speeds up running the tests.

  • for airflow-ctl we use --use-local-hatch to build the packages locally
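The AIRFLOW_HOME lookup and the password patching described in the list above can be sketched in Python. This is a minimal illustration under assumptions, not the actual airflow-ctl code: `get_airflow_config_dir`, `save_token`, `login` and `demo_isolated_login` are hypothetical names, and the `getpass.getpass` prompt stands in for whichever prompts (for example keyring-backed ones) the real tests patch.

```python
import getpass
import os
import tempfile
from pathlib import Path
from unittest import mock


def get_airflow_config_dir() -> Path:
    """Hypothetical helper: $AIRFLOW_HOME if it is set, otherwise ~/airflow."""
    airflow_home = os.environ.get("AIRFLOW_HOME")
    if airflow_home:
        return Path(airflow_home)
    return Path.home() / "airflow"


def save_token(env_name: str, token: str) -> Path:
    """Hypothetical helper: store a token file inside the config folder."""
    config_dir = get_airflow_config_dir()
    config_dir.mkdir(parents=True, exist_ok=True)
    token_file = config_dir / f"{env_name}.token"
    token_file.write_text(token)
    return token_file


def login(username: str) -> str:
    """Hypothetical login step that would block on an interactive prompt."""
    password = getpass.getpass(f"Password for {username}: ")
    return f"{username}:{password}"


def demo_isolated_login() -> dict:
    """Run login + save inside a private AIRFLOW_HOME with the prompt patched."""
    with tempfile.TemporaryDirectory() as tmp:
        # A fresh AIRFLOW_HOME per test: parallel xdist workers never share a folder.
        with mock.patch.dict(os.environ, {"AIRFLOW_HOME": tmp}):
            # Patch the prompt so the test cannot hang waiting for user input.
            with mock.patch("getpass.getpass", return_value="secret") as fake_prompt:
                token_file = save_token("production", login("admin"))
                # Collect results before the temporary directory is removed.
                return {
                    "in_tmp_home": token_file.parent == Path(tmp),
                    "content": token_file.read_text(),
                    "prompt_calls": fake_prompt.call_count,
                }
```

In a pytest suite the same isolation is usually written with the built-in `tmp_path` and `monkeypatch` fixtures, e.g. `monkeypatch.setenv("AIRFLOW_HOME", str(tmp_path))`, so each test (and therefore each xdist worker) gets its own folder.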

After this change, airflow-ctl tests can be run very easily with:

  • breeze testing airflow-ctl --python N.N

or

  • cd airflow-ctl; uv run --python N.N pytest

They should be equivalent.
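Both invocations boil down to the same two uv steps in the airflow-ctl directory. The sketch below is hypothetical (the function names and the `dry_run` switch are illustrative, not breeze's actual code); it only shows how a wrapper can stay equivalent to running uv directly, by syncing the project's own environment and then running pytest with the same `--python` version.

```python
import subprocess
from pathlib import Path


def airflow_ctl_test_commands(python_version: str) -> list[list[str]]:
    """Commands a wrapper could run; both are plain uv invocations."""
    return [
        # Create/refresh the venv from airflow-ctl's own pyproject.toml only.
        ["uv", "sync", "--python", python_version],
        # Run the test suite inside that venv.
        ["uv", "run", "--python", python_version, "pytest"],
    ]


def run_airflow_ctl_tests(repo_root: Path, python_version: str, dry_run: bool = False) -> int:
    """Hypothetical runner: execute (or only print) the commands in airflow-ctl/."""
    workdir = repo_root / "airflow-ctl"
    for cmd in airflow_ctl_test_commands(python_version):
        print(f"[{workdir}] $ {' '.join(cmd)}")
        if dry_run:
            continue
        result = subprocess.run(cmd, cwd=workdir, check=False)
        if result.returncode != 0:
            # Stop at the first failing step and propagate its exit code.
            return result.returncode
    return 0
```

Whether breeze issues exactly these commands is an assumption; the description above only states that it runs uv sync and uv run in the airflow-ctl directory.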


@potiuk (Member Author) commented Jun 18, 2025

cc: @bugraoz93 -> this should vastly speed up and simplify the airflow-ctl test suite, and running it locally as well, by harnessing the power of uv sync.

Running all airflow-ctl tests will now be possible with just:

cd airflow-ctl
uv run pytest

@potiuk potiuk requested a review from jscheffl June 18, 2025 22:27
@potiuk (Member Author) commented Jun 18, 2025

cc: @kaxil @ashb @amoghrajesh

airflow-ctl is the first of our distributions that we can run separately, which lets us simplify the whole testing environment for it. Previously it made no sense to do this for airflow/providers etc., and we needed to use the CI image for that, because it was the only reproducible environment we could run tests in.

And this shows how important it is to split airflow into smaller pieces, including splitting airflow core. Currently we simply cannot separate the tests of the webserver, dag processor, triggerer etc. because everything is so heavily entangled; we still have airflow core depending on some provider tests etc.

But if we think seriously about our dependencies and split them into smaller sets, it will be entirely possible to make a change very similar to this airflow-ctl one, separating the tests for each component into independent, workspace-managed packages (and only using the CI image for provider tests, and eventually maybe even getting rid of it for most of the providers that have "non-problematic" dependencies).

This is one of the reasons why I am so keen on splitting airflow, sharding the dependencies, and using separate distributions and a workspace to bring it all together. @ashb has always complained that our environment is far too complex, and yes, it is, because we have everything bundled together. When we split it, we will finally be able to achieve the "dream" that many of our tests can be run as simply as the airflow-ctl tests can be after this PR.

@potiuk potiuk force-pushed the improve-airflow-ctl-tests branch from fb282ed to dda4496 June 18, 2025 22:36
@bugraoz93 (Contributor) left a comment

Amazing improvement Jarek, thanks a lot! Isolating and decoupling always feels good, and airflow-ctl is the best candidate to take these steps.
We still need Breeze to be installed in local env containers in the CI. We could tweak the CI, but we would hit the hard unique-workflow limit again or end up with bespoke if statements. We can now call uv run even without Breeze in the CI, which can also eliminate installing Breeze into the local env.

@potiuk (Member Author) commented Jun 19, 2025

> We still need Breeze to be installed in local env containers in the CI. We could tweak the CI, but we would hit the hard unique-workflow limit again or end up with bespoke if statements. We can now call uv run even without Breeze in the CI, which can also eliminate installing Breeze into the local env.

Actually, we can still use breeze, but breeze does not really need the CI image to run (once I make the PR green), which gives us a unified, simple interface to run tests (breeze testing airflow-ctl --python VERSION) with auto-complete, --verbose, --dry-run and other features. I was wondering whether we want to remove the breeze call altogether, but likely not. The two serve different purposes: with breeze we have an easy way to replicate exactly what CI is doing (so, for example, it's super easy for new contributors to run all tests). By running uv run and pytest manually, power users can do much more, because they can run pytest directly with all the bells and whistles, without breeze.

I think I would like to stick with the pattern that all CI workflows are triggered with breeze, while - for the more isolated cases - allowing the developers to run as many tests as possible using direct uv run.

WDYT @bugraoz93 and others?

@potiuk potiuk force-pushed the improve-airflow-ctl-tests branch from dda4496 to d68fd09 June 19, 2025 07:15
@potiuk (Member Author) commented Jun 19, 2025

Also, I think having all kinds of tests runnable via breeze makes it easy to discover what kinds of tests we have and how to run them. Having them all grouped under "testing" (except k8s, which is a separate thing altogether) is really nice for newcomers, as they can immediately discover (with auto-complete) which kinds of tests can be run and how, without even reading the contributing documentation.

@potiuk (Member Author) commented Jun 19, 2025

See the last change: no extra workflows are needed. We just have conditional steps in the existing workflow, where the use-local-venv input parameter controls whether the tests run in a local venv or in the CI image.

@potiuk potiuk force-pushed the improve-airflow-ctl-tests branch 5 times, most recently from 1db8d90 to 1865c20 June 19, 2025 09:55
@potiuk (Member Author) commented Jun 19, 2025

Yep, it should go green now, running things locally without the CI image, even with the breeze testing airflow-ctl command.

@potiuk (Member Author) commented Jun 19, 2025

And airflow-ctl tests will now start without even waiting for the CI image to be built. https://github.com/apache/airflow/actions/runs/15755034518/job/44408491632?pr=51908

@potiuk (Member Author) commented Jun 19, 2025

The error is unrelated; the PR to fix it is #51931.

@potiuk potiuk force-pushed the improve-airflow-ctl-tests branch from 1865c20 to 41c2664 June 19, 2025 11:33
@bugraoz93 (Contributor)

> I think I would like to stick with the pattern that all CI workflows are triggered with Breeze, while - for the more isolated cases - allowing the developers to run as many tests as possible using direct uv run.

It makes sense! I think using Breeze for all CI-related things makes it easy to maintain and always shows where to look, without any ambiguity. I am on board with the Breeze pattern. I was just pointing out that we could maybe consider moving a couple of actions to uv, but I have no strong stance on that.

@potiuk potiuk merged commit 24abfa7 into apache:main Jun 19, 2025
98 checks passed
@potiuk potiuk deleted the improve-airflow-ctl-tests branch June 19, 2025 15:29
@potiuk (Member Author) commented Jun 19, 2025

> I was just pointing out that we could maybe consider moving a couple of actions to uv, but I have no strong stance on that.

I am all for using uv for "standalone" scripts, like check_transaction_completeness.py with its inline dependencies. I consider those "standalone" scripts to be "mini" projects, and if they do not depend on any other code of ours, I prefer to keep them standalone and use uv run script.py.

But when it comes to more "complex" pieces, which need to import quite a few more libraries, modules etc., a fully fledged breeze project with pyproject.toml, a set of reusable utilities, automated command-line help output etc. is a much better solution IMHO. Especially since everyone already uses it, has it installed and likely has auto-complete, and it's rather easy to take a breeze command from a CI run and execute it locally.
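For reference, a "standalone" script of the kind described here can declare its dependencies inline with a PEP 723 `# /// script` block, which `uv run script.py` reads to provision a matching environment before running it. The script below is a hypothetical example, not check_transaction_completeness.py; it uses only the standard library, so its dependency list is empty (third-party packages would be listed there).

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
"""Hypothetical standalone check: report files that lack a trailing newline."""
import sys
from pathlib import Path


def files_missing_final_newline(paths: list[str]) -> list[str]:
    """Return the non-empty files whose content does not end with a newline."""
    missing = []
    for p in paths:
        data = Path(p).read_bytes()
        if data and not data.endswith(b"\n"):
            missing.append(p)
    return missing


def main(argv: list[str]) -> int:
    missing = files_missing_final_newline(argv)
    for p in missing:
        print(f"{p}: missing final newline")
    return 1 if missing else 0


if __name__ == "__main__":
    status = main(sys.argv[1:])
    # Exit non-zero only when the check fails.
    if status:
        sys.exit(status)
```

Because the metadata lives in comments, the same file still runs with plain `python`; uv only adds automatic interpreter and dependency provisioning on top.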

@bugraoz93 (Contributor)

> But when it comes to more "complex" pieces [...] a fully fledged breeze project [...] is a much better solution IMHO.

Once features like doc building come to breeze, I never go back to uv :)

@potiuk (Member Author) commented Jun 21, 2025

> Once features like doc building come to breeze, I never go back to uv :)

Actually, I find that I use both, as needed. For example, building docs is now quite easy with uv, especially since the docs are now "isolated per project", and I usually keep the history of commands in the terminal, so:

cd providers/amazon
uv run --group docs build-docs --autobuild

This nicely runs an autobuild server with the documentation built, and it auto-reloads when I change things.

And the doc build command is identical for all our distributions; you just need to be in the distribution's folder. So I don't have to remember it: typing uv run --group and pressing <UP ARROW> brings the command back from history. But of course it's not as discoverable as breeze build-docs <TAB> ... (and breeze has no --autobuild flag; maybe it should).

I think it's good to have both ways.

@bugraoz93 (Contributor)

> I think it's good to have both ways.

It is definitely good to have them both. Good points, that's true. Yeah, maybe it should.

RoyLee1224 pushed a commit to RoyLee1224/airflow that referenced this pull request Jun 21, 2025