From 53ba5cf4011c20d51e14bf1ca791331fe53b3628 Mon Sep 17 00:00:00 2001
From: Jarek Potiuk
Date: Tue, 18 Apr 2023 13:22:57 +0200
Subject: [PATCH] Optimize parallel test execution for unit tests

We are running the tests in parallel by test type in order to speed up
their execution. However, some test types and subsets of tests take far
longer to execute than others.

The longest tests to run are Providers and WWW tests, and the longest
tests from Providers are by far the Amazon tests, then Google. "All
Other" Provider tests take about the same time as the Amazon tests.
Also, after splitting the Provider tests, Core tests take the longest
time.

When we are running tests in parallel on multiple CPUs, the
longest-running tests often keep running on their own while the other
CPUs are idle. We could run a separate test type per provider, but the
overhead of starting the database and collecting and initializing tests
for each of them is too big to achieve speedups. Especially on Public
runners, having 80 separate databases with 80 subsequent container runs
is slower than running all Provider tests together.

However, we can split the Provider tests into a smaller number of
chunks and prioritize running the long chunks first. This should
improve the effect of parallelisation and improve utilization of our
multi-CPU machines.

This PR aims to do that:

* Split Provider tests (if amazon or google are part of the provider
  tests) into amazon, google, all-other chunks

* Move sorting of the test types to selective_check, to sort the test
  types according to expected longest running time (the longest tests
  to run are added first)

This should improve the CPU utilization of our multi-CPU runners and
make the tests involving the complete Provider set (or even sets
containing amazon, google and a few other providers) execute quite a
few minutes faster on average.
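
The effect of splitting and longest-first ordering can be illustrated
with a small simulation. This is only a sketch under stated
assumptions: the durations are made-up numbers, `makespan` is a
hypothetical helper, and the pool is modelled as a simple greedy
scheduler rather than the actual Breeze pool implementation. The second
ordering is the one this PR expects for a full test run.

```python
import heapq


def makespan(jobs, workers):
    """Total wall time when jobs are started in list order on a fixed-size pool.

    Each job goes to whichever worker becomes free first (greedy list scheduling).
    """
    free_at = [0] * workers  # time at which each worker is next idle
    heapq.heapify(free_at)
    for _name, minutes in jobs:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + minutes)
    return max(free_at)


# Hypothetical durations in minutes, for illustration only.
alphabetical = [
    ("API", 3), ("Always", 2), ("CLI", 3), ("Core", 12),
    ("Other", 9), ("Providers", 30), ("WWW", 8),
]
# The same work, with Providers (30) split into three chunks and sorted
# in the order this PR expects for a full run.
split_and_sorted = [
    ("Core", 12), ("Providers[-amazon,google]", 11), ("Other", 9),
    ("Providers[amazon]", 11), ("WWW", 8), ("API", 3), ("Always", 2),
    ("CLI", 3), ("Providers[google]", 8),
]

print(makespan(alphabetical, workers=4))      # 33
print(makespan(split_and_sorted, workers=4))  # 21
```

With these assumed numbers the unsplit Providers job starts late and
dominates the run, while the split and sorted list keeps all four
workers busy for most of the run.
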
We could also get rid of some sequential processing for the Public PRs
because each test type we run will be less demanding overall. We used
to get a lot of 137 exit codes (memory errors), but with Providers
split out, the risk of two test types running in parallel exhausting
resources is low.
---
 BREEZE.rst                                    |   9 ++
 Dockerfile.ci                                 |  13 +++
 TESTING.rst                                   |   8 +-
 .../commands/testing_commands.py              |  59 +---------
 .../commands/testing_commands_config.py       |   1 -
 .../src/airflow_breeze/params/shell_params.py |  11 +-
 .../src/airflow_breeze/utils/run_tests.py     |   6 ++
 .../airflow_breeze/utils/selective_checks.py  |  50 ++++++++-
 dev/breeze/tests/test_selective_checks.py     | 101 +++++++++++-------
 images/breeze/output-commands-hash.txt        |   4 +-
 images/breeze/output_testing_tests.svg        |  76 +++++++------
 scripts/docker/entrypoint_ci.sh               |  13 +++
 12 files changed, 213 insertions(+), 138 deletions(-)

diff --git a/BREEZE.rst b/BREEZE.rst index dc39062d8988c..647f672451cd3 100644 --- a/BREEZE.rst +++ b/BREEZE.rst @@ -788,6 +788,15 @@ For example this will only run provider tests for airbyte and http providers: breeze testing tests --test-type "Providers[airbyte,http]" +You can also exclude tests for some providers from being run when whole "Providers" test type is run. + +For example this will run tests for all providers except amazon and google provider tests: + +.. code-block:: bash + + breeze testing tests --test-type "Providers[-amazon,google]" + + You can also run parallel tests with ``--run-in-parallel`` flag - by default it will run all tests types in parallel, but you can specify the test type that you want to run with space separated list of test types passed to ``--parallel-test-types`` flag. 
diff --git a/Dockerfile.ci b/Dockerfile.ci index c3613ccc269a4..b86a12dfa7ffe 100644 --- a/Dockerfile.ci +++ b/Dockerfile.ci @@ -1049,6 +1049,19 @@ else ${TEST_TYPE} == "Postgres" || ${TEST_TYPE} == "MySQL" || \ ${TEST_TYPE} == "Long" ]]; then SELECTED_TESTS=("${ALL_TESTS[@]}") + elif [[ ${TEST_TYPE} =~ Providers\[\-(.*)\] ]]; then + # When providers start with `-` it means that we should run all provider tests except those + SELECTED_TESTS=("${PROVIDERS_TESTS[@]}") + for provider in ${BASH_REMATCH[1]//,/ } + do + providers_dir="tests/providers/${provider//./\/}" + if [[ -d ${providers_dir} ]]; then + echo "${COLOR_BLUE}Ignoring ${providers_dir} as it has been deselected.${COLOR_RESET}" + EXTRA_PYTEST_ARGS+=("--ignore=tests/providers/${provider//./\/}") + else + echo "${COLOR_YELLOW}Skipping ${providers_dir} as the directory does not exist.${COLOR_RESET}" + fi + done elif [[ ${TEST_TYPE} =~ Providers\[(.*)\] ]]; then SELECTED_TESTS=() for provider in ${BASH_REMATCH[1]//,/ } diff --git a/TESTING.rst b/TESTING.rst index a89b2d287c959..e137d1486c452 100644 --- a/TESTING.rst +++ b/TESTING.rst @@ -297,12 +297,18 @@ In case of Providers tests, you can run tests for all providers breeze testing tests --test-type Providers -You can also limit the set of providers you would like to run tests of +You can limit the set of providers you would like to run tests of .. code-block:: bash breeze testing tests --test-type "Providers[airbyte,http]" +You can also run all providers but exclude the providers you would like to skip + +.. 
code-block:: bash + + breeze testing tests --test-type "Providers[-amazon,google]" + Running full Airflow unit test suite in parallel ------------------------------------------------ diff --git a/dev/breeze/src/airflow_breeze/commands/testing_commands.py b/dev/breeze/src/airflow_breeze/commands/testing_commands.py index 752764bd9bb41..1e569460df6ca 100644 --- a/dev/breeze/src/airflow_breeze/commands/testing_commands.py +++ b/dev/breeze/src/airflow_breeze/commands/testing_commands.py @@ -17,7 +17,6 @@ from __future__ import annotations import os -import re import sys from datetime import datetime @@ -65,12 +64,11 @@ from airflow_breeze.utils.parallel import ( GenericRegexpProgressMatcher, SummarizeAfter, - bytes2human, check_async_run_results, run_with_pool, ) from airflow_breeze.utils.path_utils import FILES_DIR, cleanup_python_generated_files -from airflow_breeze.utils.run_tests import run_docker_compose_tests +from airflow_breeze.utils.run_tests import file_name_from_test_type, run_docker_compose_tests from airflow_breeze.utils.run_utils import get_filesystem_type, run_command LOW_MEMORY_CONDITION = 8 * 1024 * 1024 * 1024 @@ -143,7 +141,7 @@ def _run_test( "[error]Only 'Providers' test type can specify actual tests with \\[\\][/]" ) sys.exit(1) - project_name = _file_name_from_test_type(exec_shell_params.test_type) + project_name = file_name_from_test_type(exec_shell_params.test_type) down_cmd = [ *DOCKER_COMPOSE_COMMAND, "--project-name", @@ -209,11 +207,6 @@ def _run_test( return result.returncode, f"Test: {exec_shell_params.test_type}" -def _file_name_from_test_type(test_type: str): - test_type_no_brackets = test_type.lower().replace("[", "_").replace("]", "") - return re.sub("[,\.]", "_", test_type_no_brackets)[:30] - - def _run_tests_in_pool( tests_to_run: list[str], parallelism: int, @@ -268,43 +261,12 @@ def run_tests_in_parallel( parallel_test_types_list: list[str], extra_pytest_args: tuple, db_reset: bool, - full_tests_needed: bool, test_timeout: int, 
include_success_outputs: bool, debug_resources: bool, parallelism: int, skip_cleanup: bool, ) -> None: - import psutil - - memory_available = psutil.virtual_memory() - if memory_available.available < LOW_MEMORY_CONDITION and exec_shell_params.backend in ["mssql", "mysql"]: - # Run heavy tests sequentially - heavy_test_types_to_run = {"Core", "Providers"} & set(parallel_test_types_list) - if heavy_test_types_to_run: - # some of those are requested - get_console().print( - f"[warning]Running {heavy_test_types_to_run} tests sequentially" - f"for {exec_shell_params.backend}" - f" backend due to low memory available: {bytes2human(memory_available.available)}" - ) - tests_to_run_sequentially = [] - for heavy_test_type in heavy_test_types_to_run: - for test_type in parallel_test_types_list: - if test_type.startswith(heavy_test_type): - parallel_test_types_list.remove(test_type) - tests_to_run_sequentially.append(test_type) - _run_tests_in_pool( - tests_to_run=tests_to_run_sequentially, - parallelism=1, - exec_shell_params=exec_shell_params, - extra_pytest_args=extra_pytest_args, - test_timeout=test_timeout, - db_reset=db_reset, - include_success_outputs=include_success_outputs, - debug_resources=debug_resources, - skip_cleanup=skip_cleanup, - ) _run_tests_in_pool( tests_to_run=parallel_test_types_list, parallelism=parallelism, @@ -336,8 +298,9 @@ def run_tests_in_parallel( @option_mount_sources @click.option( "--test-type", - help="Type of test to run. Note that with Providers, you can also specify which provider " - 'tests should be run - for example --test-type "Providers[airbyte,http]"', + help="Type of test to run. 
With Providers, you can specify tests of which providers " + "should be run: `Providers[airbyte,http]` or " + "excluded from the full test suite: `Providers[-amazon,google]`", default="All", type=NotVerifiedBetterChoice(ALLOWED_TEST_TYPE_CHOICES), ) @@ -361,12 +324,6 @@ def run_tests_in_parallel( show_default=True, envvar="PARALLEL_TEST_TYPES", ) -@click.option( - "--full-tests-needed", - help="Whether full set of tests is run.", - is_flag=True, - envvar="FULL_TESTS_NEEDED", -) @click.option( "--upgrade-boto", help="Remove aiobotocore and upgrade botocore and boto to the latest version.", @@ -405,7 +362,6 @@ def command_for_tests( debug_resources: bool, include_success_outputs: bool, parallel_test_types: str, - full_tests_needed: bool, mount_sources: str, extra_pytest_args: tuple, upgrade_boto: bool, @@ -434,16 +390,11 @@ def command_for_tests( perform_environment_checks() if run_in_parallel: test_list = parallel_test_types.split(" ") - test_list.sort(key=lambda x: x in ["Providers", "WWW"], reverse=True) run_tests_in_parallel( exec_shell_params=exec_shell_params, parallel_test_types_list=test_list, extra_pytest_args=extra_pytest_args, db_reset=db_reset, - # Allow to pass information on whether to use full tests in the parallel execution mode - # or not - this will allow to skip some heavy tests on more resource-heavy configurations - # in case full tests are not required, some of those will be skipped - full_tests_needed=full_tests_needed, test_timeout=test_timeout, include_success_outputs=include_success_outputs, parallelism=parallelism, diff --git a/dev/breeze/src/airflow_breeze/commands/testing_commands_config.py b/dev/breeze/src/airflow_breeze/commands/testing_commands_config.py index 5696510939a0f..793175e5f2405 100644 --- a/dev/breeze/src/airflow_breeze/commands/testing_commands_config.py +++ b/dev/breeze/src/airflow_breeze/commands/testing_commands_config.py @@ -46,7 +46,6 @@ "--skip-cleanup", "--debug-resources", "--include-success-outputs", - 
"--full-tests-needed", ], }, { diff --git a/dev/breeze/src/airflow_breeze/params/shell_params.py b/dev/breeze/src/airflow_breeze/params/shell_params.py index e854c3db83e62..ff941b4493826 100644 --- a/dev/breeze/src/airflow_breeze/params/shell_params.py +++ b/dev/breeze/src/airflow_breeze/params/shell_params.py @@ -47,6 +47,7 @@ MSSQL_TMP_DIR_NAME, SCRIPTS_CI_DIR, ) +from airflow_breeze.utils.run_tests import file_name_from_test_type from airflow_breeze.utils.run_utils import get_filesystem_type, run_command from airflow_breeze.utils.shared_options import get_verbose @@ -267,9 +268,13 @@ def command_passed(self): @property def mssql_data_volume(self) -> str: docker_filesystem = get_filesystem_type("/var/lib/docker") - # in case of Providers[....], only leave Providers - base_test_type = self.test_type.split("[")[0] if self.test_type else None - volume_name = f"tmp-mssql-volume-{base_test_type}" if base_test_type else "tmp-mssql-volume" + # Make sure the test type is not too long to be used as a volume name in docker-compose + # The tmp directory in our self-hosted runners can be quite long, so we should limit the volume name + volume_name = ( + "tmp-mssql-volume-" + file_name_from_test_type(self.test_type)[:20] + if self.test_type + else "tmp-mssql-volume" + ) if docker_filesystem == "tmpfs": return os.fspath(Path.home() / MSSQL_TMP_DIR_NAME / f"{volume_name}-{self.mssql_version}") else: diff --git a/dev/breeze/src/airflow_breeze/utils/run_tests.py b/dev/breeze/src/airflow_breeze/utils/run_tests.py index 3687e58e93968..ac2c160fa5c18 100644 --- a/dev/breeze/src/airflow_breeze/utils/run_tests.py +++ b/dev/breeze/src/airflow_breeze/utils/run_tests.py @@ -17,6 +17,7 @@ from __future__ import annotations import os +import re import sys from subprocess import DEVNULL @@ -75,3 +76,8 @@ def run_docker_compose_tests(image_name: str, extra_pytest_args: tuple) -> tuple check=False, ) return command_result.returncode, f"Testing docker-compose python with {image_name}" + + +def 
file_name_from_test_type(test_type: str): + test_type_no_brackets = test_type.lower().replace("[", "_").replace("]", "") + return re.sub("[,.]", "_", test_type_no_brackets)[:30] diff --git a/dev/breeze/src/airflow_breeze/utils/selective_checks.py b/dev/breeze/src/airflow_breeze/utils/selective_checks.py index 36a85d56dc8bb..36b85f5f76255 100644 --- a/dev/breeze/src/airflow_breeze/utils/selective_checks.py +++ b/dev/breeze/src/airflow_breeze/utils/selective_checks.py @@ -588,6 +588,36 @@ def _get_test_types_to_run(self) -> list[str]: get_console().print(sorted_candidate_test_types) return sorted_candidate_test_types + @staticmethod + def _extract_long_provider_tests(current_test_types: set[str]): + """ + In case there are Provider tests in the list of test to run (either in the form of + Providers or Providers[...] we subtract them from the test type, + and add them to the list of tests to run individually. + + In case of Providers, we need to replace it with Providers[-], but + in case of Providers[list_of_tests] we need to remove the long tests from the list. 
+ + """ + long_tests = ["amazon", "google"] + for original_test_type in tuple(current_test_types): + if original_test_type == "Providers": + current_test_types.remove(original_test_type) + for long_test in long_tests: + current_test_types.add(f"Providers[{long_test}]") + current_test_types.add(f"Providers[-{','.join(long_tests)}]") + elif original_test_type.startswith("Providers["): + provider_tests_to_run = ( + original_test_type.replace("Providers[", "").replace("]", "").split(",") + ) + if any(long_test in provider_tests_to_run for long_test in long_tests): + current_test_types.remove(original_test_type) + for long_test in long_tests: + if long_test in provider_tests_to_run: + current_test_types.add(f"Providers[{long_test}]") + provider_tests_to_run.remove(long_test) + current_test_types.add(f"Providers[{','.join(provider_tests_to_run)}]") + @cached_property def parallel_test_types(self) -> str: if not self.run_tests: @@ -606,7 +636,25 @@ def parallel_test_types(self) -> str: ) test_types_to_remove.add(test_type) current_test_types = current_test_types - test_types_to_remove - return " ".join(sorted(current_test_types)) + + self._extract_long_provider_tests(current_test_types) + + # this should be hard-coded as we want to have very specific sequence of tests + sorting_order = ["Core", "Providers[-amazon,google]", "Other", "Providers[amazon]", "WWW"] + + def sort_key(t: str) -> str: + # Put the test types in the order we want them to run + if t in sorting_order: + return str(sorting_order.index(t)) + else: + return str(len(sorting_order)) + t + + return " ".join( + sorted( + current_test_types, + key=sort_key, + ) + ) @cached_property def basic_checks_only(self) -> bool: diff --git a/dev/breeze/tests/test_selective_checks.py b/dev/breeze/tests/test_selective_checks.py index c468e5a78abd3..ff22b2c1c5f23 100644 --- a/dev/breeze/tests/test_selective_checks.py +++ b/dev/breeze/tests/test_selective_checks.py @@ -16,15 +16,27 @@ # under the License. 
from __future__ import annotations +import re + import pytest from airflow_breeze.global_constants import GithubEvents from airflow_breeze.utils.selective_checks import SelectiveChecks +ANSI_COLORS_MATCHER = re.compile(r"(?:\x1B[@-_]|[\x80-\x9F])[0-?]*[ -/]*[@-~]") + + +def escape_ansi_colors(line): + return ANSI_COLORS_MATCHER.sub("", line) + def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): for name, value in expected_outputs.items(): - assert f"{name}={value}" in stderr + search_string = rf"^{re.escape(name)}={re.escape(value)}$" + escaped_stderr = escape_ansi_colors(stderr) + assert re.search( + search_string, escaped_stderr, re.MULTILINE + ), f"Expected {name}={value} not found in {escaped_stderr}" @pytest.mark.parametrize( @@ -44,7 +56,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "run-amazon-tests": "false", "docs-build": "false", "upgrade-to-newer-dependencies": "false", - "test-types": "", + "parallel-test-types": "", }, id="No tests on simple change", ) @@ -63,7 +75,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "run-amazon-tests": "false", "docs-build": "true", "upgrade-to-newer-dependencies": "false", - "test-types": "API Always", + "parallel-test-types": "API Always", }, id="Only API tests and DOCS should run", ) @@ -85,7 +97,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "run-amazon-tests": "true", "docs-build": "true", "upgrade-to-newer-dependencies": "false", - "test-types": "API Always Providers[amazon,common.sql,google,postgres]", + "parallel-test-types": "Providers[amazon] " + "API Always Providers[common.sql,postgres] Providers[google]", }, id="API and providers tests and docs should run", ) @@ -105,7 +118,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "docs-build": "false", "run-kubernetes-tests": "false", "upgrade-to-newer-dependencies": "false", - "test-types": 
"Always Providers[apache.beam,google]", + "parallel-test-types": "Always Providers[apache.beam] Providers[google]", }, id="Selected Providers and docs should run", ) @@ -125,7 +138,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "docs-build": "true", "run-kubernetes-tests": "false", "upgrade-to-newer-dependencies": "false", - "test-types": "", + "parallel-test-types": "", }, id="Only docs builds should run - no tests needed", ) @@ -148,7 +161,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "docs-build": "true", "run-kubernetes-tests": "true", "upgrade-to-newer-dependencies": "false", - "test-types": "Always Providers[amazon,common.sql,google,postgres]", + "parallel-test-types": "Providers[amazon] " + "Always Providers[common.sql,postgres] Providers[google]", }, id="Helm tests, providers (both upstream and downstream)," "kubernetes tests and docs should run", @@ -173,7 +187,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "docs-build": "true", "run-kubernetes-tests": "true", "upgrade-to-newer-dependencies": "false", - "test-types": "Always Providers[airbyte,apache.livy,dbt.cloud,dingding,discord,http]", + "parallel-test-types": "Always " + "Providers[airbyte,apache.livy,dbt.cloud,dingding,discord,http]", }, id="Helm tests, http and all relevant providers, kubernetes tests and " "docs should run even if unimportant files were added", @@ -198,7 +213,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "docs-build": "true", "run-kubernetes-tests": "true", "upgrade-to-newer-dependencies": "false", - "test-types": "Always Providers[airbyte,http]", + "parallel-test-types": "Always Providers[airbyte,http]", }, id="Helm tests, airbyte/http providers, kubernetes tests and " "docs should run even if unimportant files were added", @@ -223,7 +238,7 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): 
"run-amazon-tests": "false", "run-kubernetes-tests": "true", "upgrade-to-newer-dependencies": "false", - "test-types": "Always", + "parallel-test-types": "Always", }, id="Docs should run even if unimportant files were added", ) @@ -242,7 +257,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "run-amazon-tests": "true", "docs-build": "true", "upgrade-to-newer-dependencies": "true", - "test-types": "API Always CLI Core Other Providers WWW", + "parallel-test-types": "Core Providers[-amazon,google] Other Providers[amazon] WWW " + "API Always CLI Providers[google]", }, id="Everything should run - including all providers and upgrading to " "newer requirements as setup.py changed and all Python versions", @@ -262,7 +278,8 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "run-amazon-tests": "true", "docs-build": "true", "upgrade-to-newer-dependencies": "true", - "test-types": "API Always CLI Core Other Providers WWW", + "parallel-test-types": "Core Providers[-amazon,google] Other Providers[amazon] WWW " + "API Always CLI Providers[google]", }, id="Everything should run and upgrading to newer requirements as dependencies change", ) @@ -281,8 +298,9 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "run-kubernetes-tests": "false", "upgrade-to-newer-dependencies": "false", "run-amazon-tests": "true", - "test-types": "Always Providers[amazon,apache.hive,cncf.kubernetes," - "common.sql,exasol,ftp,google,imap,mongo,mysql,postgres,salesforce,ssh]", + "parallel-test-types": "Providers[amazon] Always " + "Providers[apache.hive,cncf.kubernetes,common.sql,exasol,ftp,imap," + "mongo,mysql,postgres,salesforce,ssh] Providers[google]", }, id="Providers tests run including amazon tests if amazon provider files changed", ), @@ -300,9 +318,9 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "docs-build": "false", "run-kubernetes-tests": "false", 
"upgrade-to-newer-dependencies": "false", - "test-types": "Always Providers[airbyte,http]", + "parallel-test-types": "Always Providers[airbyte,http]", }, - id="Providers tests run including amazon tests if amazon tests provider files changed", + id="Providers tests tests run without amazon tests if no amazon file changed", ), pytest.param( ("airflow/providers/amazon/file.py",), @@ -318,10 +336,11 @@ def assert_outputs_are_printed(expected_outputs: dict[str, str], stderr: str): "docs-build": "true", "run-kubernetes-tests": "false", "upgrade-to-newer-dependencies": "false", - "test-types": "Always Providers[amazon,apache.hive,cncf.kubernetes," - "common.sql,exasol,ftp,google,imap,mongo,mysql,postgres,salesforce,ssh]", + "parallel-test-types": "Providers[amazon] Always " + "Providers[apache.hive,cncf.kubernetes,common.sql,exasol,ftp," + "imap,mongo,mysql,postgres,salesforce,ssh] Providers[google]", }, - id="Providers tests run without amazon tests if no amazon file changed", + id="Providers tests run including amazon tests if amazon provider files changed", ), ], ) @@ -358,7 +377,8 @@ def test_expected_output_pull_request_main( "full-tests-needed": "true", "providers-package-format-exclude": "[]", "upgrade-to-newer-dependencies": "false", - "test-types": "API Always CLI Core Other Providers WWW", + "parallel-test-types": "Core Providers[-amazon,google] Other Providers[amazon] WWW " + "API Always CLI Providers[google]", }, id="Everything should run including all providers when full tests are needed", ) @@ -382,7 +402,8 @@ def test_expected_output_pull_request_main( "full-tests-needed": "true", "providers-package-format-exclude": "[]", "upgrade-to-newer-dependencies": "false", - "test-types": "API Always CLI Core Other Providers WWW", + "parallel-test-types": "Core Providers[-amazon,google] Other Providers[amazon] WWW " + "API Always CLI Providers[google]", }, id="Everything should run including full providers when full " "tests are needed even with different label set 
as well", @@ -404,7 +425,8 @@ def test_expected_output_pull_request_main( "full-tests-needed": "true", "upgrade-to-newer-dependencies": "false", "providers-package-format-exclude": "[]", - "test-types": "API Always CLI Core Other Providers WWW", + "parallel-test-types": "Core Providers[-amazon,google] Other Providers[amazon] WWW " + "API Always CLI Providers[google]", }, id="Everything should run including full providers when" "full tests are needed even if no files are changed", @@ -426,7 +448,7 @@ def test_expected_output_pull_request_main( "full-tests-needed": "true", "upgrade-to-newer-dependencies": "false", "providers-package-format-exclude": "[]", - "test-types": "API Always CLI Core Other WWW", + "parallel-test-types": "Core Other WWW API Always CLI", }, id="Everything should run except Providers when full tests are needed for non-main branch", ) @@ -465,7 +487,7 @@ def test_expected_output_full_tests_needed( "providers-package-format-exclude": "[{'package-format': 'sdist'}]", "upgrade-to-newer-dependencies": "false", "skip-provider-tests": "true", - "test-types": "", + "parallel-test-types": "", }, id="Nothing should run if only non-important files changed", ), @@ -486,7 +508,7 @@ def test_expected_output_full_tests_needed( "run-kubernetes-tests": "true", "upgrade-to-newer-dependencies": "false", "skip-provider-tests": "true", - "test-types": "Always", + "parallel-test-types": "Always", }, id="No Helm tests, No providers should run if only chart/providers changed in non-main", ), @@ -508,7 +530,7 @@ def test_expected_output_full_tests_needed( "run-kubernetes-tests": "true", "upgrade-to-newer-dependencies": "false", "skip-provider-tests": "true", - "test-types": "Always CLI", + "parallel-test-types": "Always CLI", }, id="Only CLI tests and Kubernetes tests should run if cli/chart files changed in non-main branch", ), @@ -529,7 +551,7 @@ def test_expected_output_full_tests_needed( "run-kubernetes-tests": "false", "upgrade-to-newer-dependencies": "false", 
"skip-provider-tests": "true", - "test-types": "API Always CLI Core Other WWW", + "parallel-test-types": "Core Other WWW API Always CLI", }, id="All tests except Providers should run if core file changed in non-main branch", ), @@ -563,7 +585,7 @@ def test_expected_output_pull_request_v2_3( "docs-build": "false", "upgrade-to-newer-dependencies": "false", "skip-provider-tests": "false", - "test-types": "", + "parallel-test-types": "", }, id="Nothing should run if only non-important files changed", ), @@ -578,7 +600,7 @@ def test_expected_output_pull_request_v2_3( "docs-build": "true", "upgrade-to-newer-dependencies": "false", "skip-provider-tests": "false", - "test-types": "Always", + "parallel-test-types": "Always", }, id="Only Always and docs build should run if only system tests changed", ), @@ -598,9 +620,12 @@ def test_expected_output_pull_request_v2_3( "run-kubernetes-tests": "true", "upgrade-to-newer-dependencies": "false", "skip-provider-tests": "false", - "test-types": "Always CLI", + "parallel-test-types": "Providers[amazon] Always CLI " + "Providers[apache.beam,apache.cassandra,cncf.kubernetes,common.sql,facebook," + "hashicorp,microsoft.azure,microsoft.mssql,mysql,oracle,postgres,presto," + "salesforce,sftp,ssh,trino] Providers[google]", }, - id="CLI tests and Kubernetes tests should run if cli/chart files changed", + id="CLI tests and Google-related provider tests should run if cli/chart files changed", ), pytest.param( ( @@ -617,7 +642,8 @@ def test_expected_output_pull_request_v2_3( "run-kubernetes-tests": "false", "upgrade-to-newer-dependencies": "false", "skip-provider-tests": "false", - "test-types": "API Always CLI Core Other Providers WWW", + "parallel-test-types": "Core Providers[-amazon,google] Other Providers[amazon] WWW " + "API Always CLI Providers[google]", }, id="All tests should run if core file changed", ), @@ -652,7 +678,8 @@ def test_expected_output_pull_request_target( "run-tests": "true", "docs-build": "true", 
"upgrade-to-newer-dependencies": "true", - "test-types": "API Always CLI Core Other Providers WWW", + "parallel-test-types": "Core Providers[-amazon,google] Other Providers[amazon] WWW " + "API Always CLI Providers[google]", }, id="All tests run on push even if unimportant file changed", ), @@ -668,7 +695,7 @@ def test_expected_output_pull_request_target( "run-tests": "true", "docs-build": "true", "upgrade-to-newer-dependencies": "true", - "test-types": "API Always CLI Core Other WWW", + "parallel-test-types": "Core Other WWW API Always CLI", }, id="All tests except Providers and Helm run on push" " even if unimportant file changed in non-main branch", @@ -685,7 +712,8 @@ def test_expected_output_pull_request_target( "run-tests": "true", "docs-build": "true", "upgrade-to-newer-dependencies": "true", - "test-types": "API Always CLI Core Other Providers WWW", + "parallel-test-types": "Core Providers[-amazon,google] Other Providers[amazon] WWW " + "API Always CLI Providers[google]", }, id="All tests run on push if core file changed", ), @@ -736,7 +764,8 @@ def test_no_commit_provided_trigger_full_build_for_any_event_type(github_event): "upgrade-to-newer-dependencies": "true" if github_event in [GithubEvents.PUSH, GithubEvents.SCHEDULE] else "false", - "test-types": "API Always CLI Core Other Providers WWW", + "parallel-test-types": "Core Providers[-amazon,google] Other Providers[amazon] WWW " + "API Always CLI Providers[google]", }, str(stderr), ) diff --git a/images/breeze/output-commands-hash.txt b/images/breeze/output-commands-hash.txt index 7a36b71736c6c..de1f4ec2a533a 100644 --- a/images/breeze/output-commands-hash.txt +++ b/images/breeze/output-commands-hash.txt @@ -60,5 +60,5 @@ stop:e5aa686b4e53707ced4039d8414d5cd6 testing:docker-compose-tests:b86c044b24138af0659a05ed6331576c testing:helm-tests:936cf28fd84ce4ff5113795fdae9624b testing:integration-tests:225ddb6243cce5fc64f4824b87adfd98 -testing:tests:86441445a2b521e8d5aee04d74978451 
-testing:2d95034763ee699f2e2fc1804f2fd7f0 +testing:tests:b96f54a7e08986e2309af33141099e8d +testing:8d1f02ebc1119bdf93e027a4f291237f diff --git a/images/breeze/output_testing_tests.svg b/images/breeze/output_testing_tests.svg index 434b6d43b6560..d303079df20f7 100644 --- a/images/breeze/output_testing_tests.svg +++ b/images/breeze/output_testing_tests.svg @@ -1,4 +1,4 @@ - + - + @@ -195,12 +195,9 @@ - - - - Command: testing tests + Command: testing tests @@ -216,52 +213,51 @@ Run the specified unit test targets. ╭─ Basic flag for tests command ───────────────────────────────────────────────────────────────────────────────────────╮ ---test-typeType of test to run. Note that with Providers, you can also specify which provider tests     -should be run - for example --test-type "Providers[airbyte,http]"                            -(All | API | Always | CLI | Core | Other | Providers | WWW | PlainAsserts | Postgres | MySQL -| Quarantine)                                                                                +--test-typeType of test to run. With Providers, you can specify tests of which providers should be run: +`Providers[airbyte,http]` or excluded from the full test suite: `Providers[-amazon,google]`  +(All | API | Always | CLI | Core | Other | Providers | WWW | PlainAsserts | Postgres | MySQL +| Quarantine)                                                                                --test-timeoutTest timeout. Set the pytest setup, execution and teardown timeouts to this value -(INTEGER RANGE)                                                                   +(INTEGER RANGE)                                                                   [default: 60; x>=0]                                                               --collect-onlyCollect tests only, do not run them. ---db-reset-dReset DB when entering the container. 
---backend-bDatabase backend to use.(>sqlite< | mysql | postgres | mssql)[default: sqlite] ---python-pPython major/minor version used in Airflow image for images.(>3.7< | 3.8 | 3.9 | 3.10) +--db-reset-dReset DB when entering the container. +--backend-bDatabase backend to use.(>sqlite< | mysql | postgres | mssql)[default: sqlite] +--python-pPython major/minor version used in Airflow image for images.(>3.7< | 3.8 | 3.9 | 3.10) [default: 3.7]                                               ---postgres-version-PVersion of Postgres used.(>11< | 12 | 13 | 14 | 15)[default: 11] ---mysql-version-MVersion of MySQL used.(>5.7< | 8)[default: 5.7] ---mssql-version-SVersion of MsSQL used.(>2017-latest< | 2019-latest)[default: 2017-latest] +--postgres-version-PVersion of Postgres used.(>11< | 12 | 13 | 14 | 15)[default: 11] +--mysql-version-MVersion of MySQL used.(>5.7< | 8)[default: 5.7] +--mssql-version-SVersion of MsSQL used.(>2017-latest< | 2019-latest)[default: 2017-latest] --integrationIntegration(s) to enable when running (can be more than one).                                -(all | all-testable | cassandra | celery | kerberos | mongo | otel | pinot | statsd | statsd -| trino)                                                                                     +(all | all-testable | cassandra | celery | kerberos | mongo | otel | pinot | statsd | statsd +| trino)                                                                                     ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ╭─ Options for parallel test commands ─────────────────────────────────────────────────────────────────────────────────╮ --run-in-parallelRun the operation in parallel on all or selected subset of Python versions. --parallelismMaximum number of processes to use while running the operation in parallel. 
-(INTEGER RANGE)
+(INTEGER RANGE)
 [default: 4; 1<=x<=8]
---parallel-test-types  Space separated list of test types used for testing in parallel. (TEXT)
+--parallel-test-types  Space separated list of test types used for testing in parallel. (TEXT)
 [default: API Always CLI Core Other Providers WWW PlainAsserts]
 --skip-cleanup  Skip cleanup of temporary files created during parallel run.
 --debug-resources  Whether to show resource information while running in parallel.
 --include-success-outputs  Whether to include outputs of successful parallel runs (skipped by default).
---full-tests-needed  Whether full set of tests is run.
-╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-╭─ Advanced flag for tests command ────────────────────────────────────────────────────────────────────────────────────╮
---image-tag  -t  Tag of the image which is used to run the image (implies --mount-sources=skip). (TEXT)
-[default: latest]
---mount-sources  Choose scope of local sources that should be mounted, skipped, or removed (default =
-selected).
-(selected | all | skip | remove)
-[default: selected]
---upgrade-boto  Remove aiobotocore and upgrade botocore and boto to the latest version.
---remove-arm-packages  Removes arm packages from the image to test if ARM collection works
-╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-╭─ Common options ─────────────────────────────────────────────────────────────────────────────────────────────────────╮
---verbose  -v  Print verbose information about performed steps.
---dry-run  -D  If dry-run is set, commands are only printed, not executed.
---help  -h  Show this message and exit.
-╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
+╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Advanced flag for tests command ────────────────────────────────────────────────────────────────────────────────────╮
+--image-tag  -t  Tag of the image which is used to run the image (implies --mount-sources=skip). (TEXT)
+[default: latest]
+--mount-sources  Choose scope of local sources that should be mounted, skipped, or removed (default =
+selected).
+(selected | all | skip | remove)
+[default: selected]
+--upgrade-boto  Remove aiobotocore and upgrade botocore and boto to the latest version.
+--remove-arm-packages  Removes arm packages from the image to test if ARM collection works
+╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Common options ─────────────────────────────────────────────────────────────────────────────────────────────────────╮
+--verbose  -v  Print verbose information about performed steps.
+--dry-run  -D  If dry-run is set, commands are only printed, not executed.
+--help  -h  Show this message and exit.
+╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
diff --git a/scripts/docker/entrypoint_ci.sh b/scripts/docker/entrypoint_ci.sh
index 642b6f002c7ce..9acd01754dfae 100755
--- a/scripts/docker/entrypoint_ci.sh
+++ b/scripts/docker/entrypoint_ci.sh
@@ -478,6 +478,19 @@ else
             ${TEST_TYPE} == "Postgres" || ${TEST_TYPE} == "MySQL" || \
             ${TEST_TYPE} == "Long" ]]; then
         SELECTED_TESTS=("${ALL_TESTS[@]}")
+    elif [[ ${TEST_TYPE} =~ Providers\[\-(.*)\] ]]; then
+        # When providers start with `-` it means that we should run all provider tests except those
+        SELECTED_TESTS=("${PROVIDERS_TESTS[@]}")
+        for provider in ${BASH_REMATCH[1]//,/ }
+        do
+            providers_dir="tests/providers/${provider//./\/}"
+            if [[ -d ${providers_dir} ]]; then
+                echo "${COLOR_BLUE}Ignoring ${providers_dir} as it has been deselected.${COLOR_RESET}"
+                EXTRA_PYTEST_ARGS+=("--ignore=tests/providers/${provider//./\/}")
+            else
+                echo "${COLOR_YELLOW}Skipping ${providers_dir} as the directory does not exist.${COLOR_RESET}"
+            fi
+        done
     elif [[ ${TEST_TYPE} =~ Providers\[(.*)\] ]]; then
         SELECTED_TESTS=()
         for provider in ${BASH_REMATCH[1]//,/ }