Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Apr 24, 2023

For PRs that are just changing selected providers, there is no need to install and verify all providers, just those affected should be installed (and full verification should only be done when all providers are being built - so generally speaking, when Core PRs are run or when canary builds are run.

This PR distinguishes those two cases:

  • installing and verification when all providers are selected
  • installing only affected providers when only subset of them are affected

Also a bug was found where core file changes did not trigger provider docs build during implementation. This has been fixed


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Apr 24, 2023

Yet another improvement of CI speed - this time with testing provider packages. For Provider-only changes, a lot of contributors should be able to save a lot of build time for just building and installing the affected provider packages, rather than all of them

@potiuk potiuk force-pushed the speed-up-provider-verification-in-ci branch from fd1e3df to 8874193 Compare April 24, 2023 14:30
@potiuk
Copy link
Member Author

potiuk commented Apr 24, 2023

I also found one case where core-changes did not trigger docs build for Providers, which should not be the case, when we change core files, potentilly all providers are affected (for tests, docs building and package preparation).

@potiuk potiuk force-pushed the speed-up-provider-verification-in-ci branch 4 times, most recently from de933a2 to 8e9fcb3 Compare April 24, 2023 22:49
@potiuk potiuk force-pushed the speed-up-provider-verification-in-ci branch from 8e9fcb3 to cf077a8 Compare April 24, 2023 22:57
@potiuk potiuk marked this pull request as draft April 24, 2023 22:59
Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should test all the providers when we change something in airflow/models like the base sensor or operator. I don't see a test for that. WDYT?

@potiuk
Copy link
Member Author

potiuk commented Apr 24, 2023

I think we should test all the providers when we change something in airflow/models like the base sensor or operator. I don't see a test for that. WDYT?

It's covered by the test below. This is actually the issue I mentioned with:

I also found one case where core-changes did not trigger docs build for Providers, which should not be the case, when we change core files, potentilly all providers are affected (for tests, docs building and package preparation).

        pytest.param(
            (
                "airflow/file.py",
                "tests/providers/google/file.py",
            ),
            {
                "affected-providers": "",
                "all-python-versions": "['3.7']",
                "all-python-versions-list-as-string": "3.7",
                "image-build": "true",
                "needs-helm-tests": "false",
                "run-tests": "true",
                "docs-build": "true",
                "docs-filter": "",
                "run-kubernetes-tests": "false",
                "upgrade-to-newer-dependencies": "false",
                "skip-provider-tests": "false",
                "parallel-test-types": "Core Providers[-amazon,google] Other Providers[amazon] WWW "
                "API Always CLI Providers[google]",
            },
            id="All tests should run if core file changed",
        ),

We already had this as a case where we decide that we should run all provider tests where "essential" airflow core chnages. This means if there is a change that is "airflow" sources but:

  • not an API
  • not a WWW
  • not a CLI
  • not providers
  • not a chart/kubernetes_tests files

We decide to:

  • run all Providers tests: "run-tests"= "true", parallel-tests-types has "Providers" (split into three groups now)
  • build all Providers docs (fixed in this PR) -> docs_filter = "", docs-build="true"
  • (added in this PR) build and install all providers -> afffected_providers = "" (means build all providers to check them)

Yeah, likely we could still refactor it (I think it would have been better to distinguish the values of those two for case where we skip them alltogether but for now it is what it is - for now guarded by the test cases).

@potiuk
Copy link
Member Author

potiuk commented Apr 24, 2023

Yeah, likely we could still refactor it (I think it would have been better to distinguish the values of those two for case where we skip them alltogether but for now it is what it is - for now guarded by the test cases).

After the comment -yeah I think I might want to refactor that one .. I also see how convoluted it got and I have some idea how to make it "cleaner" a bit.

@potiuk potiuk force-pushed the speed-up-provider-verification-in-ci branch from cf077a8 to 7a77ecf Compare April 25, 2023 06:28
@potiuk
Copy link
Member Author

potiuk commented Apr 25, 2023

All right. I think this should be much better now:

  • I made names of the outputs more consistent ("list-as-string" where we use string to pass lists - and explained why)
  • I added optionality for those outputs where empty list could previously mean both None and All.

@potiuk potiuk force-pushed the speed-up-provider-verification-in-ci branch from 7a77ecf to fdc2b5b Compare April 25, 2023 07:24
@potiuk potiuk marked this pull request as ready for review April 25, 2023 07:25
@potiuk potiuk force-pushed the speed-up-provider-verification-in-ci branch 3 times, most recently from 9d1fa63 to a29bfe1 Compare April 25, 2023 15:05
For PRs that are just changing selected providers, there is no
need to install and verify all providers, just those affected
should be installed (and full verification should only be done
when all providers are being built - so generally speaking, when
Core PRs are run or when canary builds are run.

This PR distinguishes those two cases:

* installing and verification when all providers are selected
* installing only affected providers when only subset of them are
  affected

Also a bug was found where core file changes did not trigger provider
docs build during implementation. This has been fixed
@potiuk potiuk force-pushed the speed-up-provider-verification-in-ci branch from a29bfe1 to 8f468d7 Compare April 25, 2023 15:09
@potiuk
Copy link
Member Author

potiuk commented Apr 25, 2023

Running last tests, and I think in the current version of selective checks and the tests show the intentions much better and the code is cleaner (extracted the important bits):

ALL_DOCS_SELECTED_FOR_BUILD = ""
ALL_PROVIDERS_AFFECTED = ""

        pytest.param(
            ("INTHEWILD.md",),
            {
                "affected-providers-list-as-string": None,
                "run-tests": "false",
                "docs-build": "false",
                "docs-filter-list-as-string": None,
                "skip-provider-tests": "true",
                "parallel-test-types-list-as-string": None,
            },
            id="Nothing should run if only non-important files changed",
        ),
        pytest.param(
            ("tests/system/any_file.py",),
            {
                "affected-providers-list-as-string": None,
                "run-tests": "true",
                "docs-build": "true",
                "docs-filter-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
                "skip-provider-tests": "true",
                "parallel-test-types-list-as-string": "Always",
            },
            id="Only Always and docs build should run if only system tests changed",
        ),
        pytest.param(
            (
                "airflow/cli/test.py",
                "chart/aaaa.txt",
                "tests/providers/google/file.py",
            ),
            {
                "affected-providers-list-as-string": "amazon apache.beam apache.cassandra "
                "cncf.kubernetes common.sql "
                "facebook google hashicorp microsoft.azure microsoft.mssql mysql "
                "oracle postgres presto salesforce sftp ssh trino",
                "run-tests": "true",
                "docs-build": "true",
                "docs-filter-list-as-string": "--package-filter apache-airflow --package-filter helm-chart "
                "--package-filter apache-airflow-providers-amazon "
                "--package-filter apache-airflow-providers-apache-beam "
                "--package-filter apache-airflow-providers-apache-cassandra "
                "--package-filter apache-airflow-providers-cncf-kubernetes "
                "--package-filter apache-airflow-providers-common-sql "
                "--package-filter apache-airflow-providers-facebook "
                "--package-filter apache-airflow-providers-google "
                "--package-filter apache-airflow-providers-hashicorp "
                "--package-filter apache-airflow-providers-microsoft-azure "
                "--package-filter apache-airflow-providers-microsoft-mssql "
                "--package-filter apache-airflow-providers-mysql "
                "--package-filter apache-airflow-providers-oracle "
                "--package-filter apache-airflow-providers-postgres "
                "--package-filter apache-airflow-providers-presto "
                "--package-filter apache-airflow-providers-salesforce "
                "--package-filter apache-airflow-providers-sftp "
                "--package-filter apache-airflow-providers-ssh "
                "--package-filter apache-airflow-providers-trino",
                "skip-provider-tests": "false",
                "parallel-test-types-list-as-string": "Providers[amazon] Always CLI "
                "Providers[apache.beam,apache.cassandra,cncf.kubernetes,common.sql,facebook,"
                "hashicorp,microsoft.azure,microsoft.mssql,mysql,oracle,postgres,presto,"
                "salesforce,sftp,ssh,trino] Providers[google]",
            },
            id="CLI tests and Google-related provider tests should run if cli/chart files changed",
        ),
        pytest.param(
            (
                "airflow/cli/file.py",
                "airflow/www/file.py",
                "airflow/api/file.py",
            ),
            {
                "affected-providers-list-as-string": None,
                "run-tests": "true",
                "docs-build": "true",
                "docs-filter-list-as-string": "--package-filter apache-airflow",
                "skip-provider-tests": "true",
                "parallel-test-types-list-as-string": "WWW API Always CLI",
            },
            id="No providers tests should run if only CLI/API/WWW file changed",
        ),
        pytest.param(
            ("airflow/models/test.py",),
            {
                "affected-providers-list-as-string": ALL_PROVIDERS_AFFECTED,
                "run-tests": "true",
                "docs-build": "true",
                "docs-filter-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
                "run-kubernetes-tests": "false",
                "skip-provider-tests": "false",
                "parallel-test-types-list-as-string": "Core Providers[-amazon,google] Other "
                "Providers[amazon] WWW "
                "API Always CLI Providers[google]",
            },
            id="Tests for all providers should run if model file changed",
        ),
        pytest.param(
            ("airflow/file.py",),
            {
                "affected-providers-list-as-string": ALL_PROVIDERS_AFFECTED,
                "run-tests": "true",
                "docs-build": "true",
                "docs-filter-list-as-string": ALL_DOCS_SELECTED_FOR_BUILD,
                "skip-provider-tests": "false",
                "parallel-test-types-list-as-string": "Core Providers[-amazon,google] Other "
                "Providers[amazon] WWW "
                "API Always CLI Providers[google]",
            },
            id="Tests for all providers should run if any other than API/WWW/CLI file changed.",
        ),

@potiuk
Copy link
Member Author

potiuk commented Apr 25, 2023

cc: @hussein-awala -> I added the "models" specifically in tests - but it's basically the same as "any other" file in airflow (other than www/cli/api which we know are not involved with providers). I did not want to specifically limit models only because there are a number of files providers might depend on (exceptions, decorators, hooks, listeners, you name it) . Once we clarify the "actual" API that Providers can use, we could potentially enumerate those, but for now I assume that "any" change in those packages except API/CLI/WWW might impact providers directly.

It's now explicitly stated in the tests.

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, LGTM

@potiuk potiuk merged commit 3af22f5 into apache:main Apr 25, 2023
@potiuk potiuk deleted the speed-up-provider-verification-in-ci branch April 25, 2023 19:02
@eladkal eladkal added this to the Airflow 2.6.2 milestone Jun 9, 2023
@eladkal eladkal added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Jun 9, 2023
potiuk added a commit that referenced this pull request Jun 9, 2023
For PRs that are just changing selected providers, there is no
need to install and verify all providers, just those affected
should be installed (and full verification should only be done
when all providers are being built - so generally speaking, when
Core PRs are run or when canary builds are run.

This PR distinguishes those two cases:

* installing and verification when all providers are selected
* installing only affected providers when only subset of them are
  affected

Also a bug was found where core file changes did not trigger provider
docs build during implementation. This has been fixed

(cherry picked from commit 3af22f5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:dev-tools changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants