Merged
2 changes: 1 addition & 1 deletion airflow/config_templates/config.yml
@@ -310,7 +310,7 @@ core:
version_added: 2.3.0
type: string
example: ~
-default: "regexp"
+default: "glob"
default_task_retries:
description: |
The number of retries each task is going to have by default. Can be overridden at dag or task level.
6 changes: 3 additions & 3 deletions airflow/utils/file.py
@@ -221,7 +221,7 @@ def _find_path_from_directory(
def find_path_from_directory(
base_dir_path: str | os.PathLike[str],
ignore_file_name: str,
-ignore_file_syntax: str = conf.get_mandatory_value("core", "DAG_IGNORE_FILE_SYNTAX", fallback="regexp"),
+ignore_file_syntax: str = conf.get_mandatory_value("core", "DAG_IGNORE_FILE_SYNTAX", fallback="glob"),
) -> Generator[str, None, None]:
"""
Recursively search the base path for a list of file paths that should not be ignored.
@@ -232,9 +232,9 @@ def find_path_from_directory(

:return: a generator of file paths.
"""
-if ignore_file_syntax == "glob":
+if ignore_file_syntax == "glob" or not ignore_file_syntax:
return _find_path_from_directory(base_dir_path, ignore_file_name, _GlobIgnoreRule)
-elif ignore_file_syntax == "regexp" or not ignore_file_syntax:
+elif ignore_file_syntax == "regexp":
return _find_path_from_directory(base_dir_path, ignore_file_name, _RegexpIgnoreRule)
else:
raise ValueError(f"Unsupported ignore_file_syntax: {ignore_file_syntax}")
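With this change an empty `ignore_file_syntax` falls through to the glob branch instead of the regexp branch. The branch order can be checked in isolation with the sketch below; `select_ignore_rule` is a hypothetical helper that returns the rule class name instead of calling `_find_path_from_directory`, so it runs without Airflow installed.

```python
from typing import Optional


def select_ignore_rule(ignore_file_syntax: Optional[str]) -> str:
    """Hypothetical mirror of the branch order in find_path_from_directory.

    Returns the name of the rule class that would be passed to
    _find_path_from_directory, without walking any directory.
    """
    # After this change, an empty or missing value selects the glob rule.
    if ignore_file_syntax == "glob" or not ignore_file_syntax:
        return "_GlobIgnoreRule"
    if ignore_file_syntax == "regexp":
        return "_RegexpIgnoreRule"
    raise ValueError(f"Unsupported ignore_file_syntax: {ignore_file_syntax}")


print(select_ignore_rule(""))        # _GlobIgnoreRule
print(select_ignore_rule("regexp"))  # _RegexpIgnoreRule
```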
@@ -125,14 +125,7 @@ for the paths that should be ignored. You do not need to have that file in any o
In the example above the DAGs are only in ``my_custom_dags`` folder, the ``common_package`` should not be
scanned by scheduler when searching for DAGS, so we should ignore ``common_package`` folder. You also
want to ignore the ``base_dag.py`` if you keep a base DAG there that ``my_dag1.py`` and ``my_dag2.py`` derives
-from. Your ``.airflowignore`` should look then like this:
-
-.. code-block:: none
-
-my_company/common_package/.*
-my_company/my_custom_dags/base_dag\.py
-
-If ``DAG_IGNORE_FILE_SYNTAX`` is set to ``glob``, the equivalent ``.airflowignore`` file would be:
+from. Your ``.airflowignore`` should look then like this (using the default ``glob`` syntax):

.. code-block:: none

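This hunk drops the regexp-syntax version of the example. To illustrate how those two regexp lines behaved, here is a minimal sketch of regexp-syntax matching. It uses Python's `re` as a stand-in for `re2` and assumes each pattern is searched against the path relative to the folder holding `.airflowignore`; `is_ignored_regexp` is a hypothetical helper, not Airflow code.

```python
import re

# The regexp-syntax lines removed in this hunk.
REGEXP_PATTERNS = [
    r"my_company/common_package/.*",
    r"my_company/my_custom_dags/base_dag\.py",
]


def is_ignored_regexp(rel_path: str, patterns: list[str]) -> bool:
    """Return True if any pattern is found anywhere in rel_path.

    Hypothetical helper: Python's ``re`` stands in for ``re2``, and
    ``search`` (not ``match``) lets a pattern hit at any position.
    """
    return any(re.search(pattern, rel_path) for pattern in patterns)


for path in [
    "my_company/common_package/utils.py",
    "my_company/my_custom_dags/base_dag.py",
    "my_company/my_custom_dags/my_dag1.py",
]:
    print(path, is_ignored_regexp(path, REGEXP_PATTERNS))
# my_company/common_package/utils.py True
# my_company/my_custom_dags/base_dag.py True
# my_company/my_custom_dags/my_dag1.py False
```

Because `search` is unanchored, the first pattern would also match a path such as `vendor/my_company/common_package/x.py`.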
31 changes: 12 additions & 19 deletions docs/apache-airflow/core-concepts/dags.rst
@@ -712,19 +712,9 @@ configuration parameter (*added in Airflow 2.3*): ``regexp`` and ``glob``.

.. note::

-The default ``DAG_IGNORE_FILE_SYNTAX`` is ``regexp`` to ensure backwards compatibility.
+The default ``DAG_IGNORE_FILE_SYNTAX`` is ``glob`` in Airflow 3 or later (in previous versions it was ``regexp``).

-For the ``regexp`` pattern syntax (the default), each line in ``.airflowignore``
-specifies a regular expression pattern, and directories or files whose names (not DAG id)
-match any of the patterns would be ignored (under the hood, ``Pattern.search()`` is used
-to match the pattern). Use the ``#`` character to indicate a comment; all characters
-on lines starting with ``#`` will be ignored.
-
-As with most regexp matching in Airflow, the regexp engine is ``re2``, which explicitly
-doesn't support many advanced features, please check its
-`documentation <https://github.com/google/re2/wiki/Syntax>`_ for more information.
-
-With the ``glob`` syntax, the patterns work just like those in a ``.gitignore`` file:
+With the ``glob`` syntax (the default), the patterns work just like those in a ``.gitignore`` file:

* The ``*`` character will match any number of characters, except ``/``
* The ``?`` character will match any single character, except ``/``
@@ -738,15 +728,18 @@ With the ``glob`` syntax, the patterns work just like those in a ``.gitignore``
is relative to the directory level of the particular .airflowignore file itself. Otherwise the
pattern may also match at any level below the .airflowignore level.

-The ``.airflowignore`` file should be put in your ``DAG_FOLDER``. For example, you can prepare
-a ``.airflowignore`` file using the ``regexp`` syntax with content
-
-.. code-block::
+For the ``regexp`` pattern syntax, each line in ``.airflowignore``
+specifies a regular expression pattern, and directories or files whose names (not DAG id)
+match any of the patterns would be ignored (under the hood, ``Pattern.search()`` is used
+to match the pattern). Use the ``#`` character to indicate a comment; all characters
+on lines starting with ``#`` will be ignored.

-project_a
-tenant_[\d]
+As with most regexp matching in Airflow, the regexp engine is ``re2``, which explicitly
+doesn't support many advanced features, please check its
+`documentation <https://github.com/google/re2/wiki/Syntax>`_ for more information.

-Or, equivalently, in the ``glob`` syntax
+The ``.airflowignore`` file should be put in your ``DAG_FOLDER``. For example, you can prepare
+a ``.airflowignore`` file with the ``glob`` syntax

.. code-block::

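The glob rules described in this hunk (`*` and `?` never match `/`, and a leading or middle `/` anchors the pattern to the `.airflowignore` directory) can be illustrated with a small translator to Python regexes. This is a simplified sketch written for illustration, not Airflow's implementation: `glob_to_regex` and `is_ignored_glob` are hypothetical names, only `*`, `?`, `**` (which `.gitignore` also supports) and anchoring are handled, and it assumes that a matched directory ignores everything beneath it, as in `.gitignore`.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified .gitignore-style glob into a compiled regex.

    Only '*', '?', '**' and leading/middle '/' anchoring are handled; a
    trailing '/' is stripped and otherwise ignored in this sketch.
    """
    # A '/' at the start or in the middle anchors the pattern to the
    # directory that holds the .airflowignore file; a trailing '/' does not.
    anchored = "/" in pattern.rstrip("/")
    body = pattern.strip("/")
    parts = []
    i = 0
    while i < len(body):
        if body.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif body.startswith("**", i):
            parts.append(".*")  # anything, including '/'
            i += 2
        elif body[i] == "*":
            parts.append("[^/]*")  # any run of characters except '/'
            i += 1
        elif body[i] == "?":
            parts.append("[^/]")  # exactly one character except '/'
            i += 1
        else:
            parts.append(re.escape(body[i]))
            i += 1
    # Unanchored patterns may match at any directory level.
    prefix = "" if anchored else "(?:.*/)?"
    # Matching a directory also covers everything beneath it.
    return re.compile(prefix + "".join(parts) + "(?:/.*)?")


def is_ignored_glob(rel_path: str, patterns: list[str]) -> bool:
    return any(glob_to_regex(p).fullmatch(rel_path) for p in patterns)


print(is_ignored_glob("dags/etl_old.py", ["*_old.py"]))    # True: any level
print(is_ignored_glob("vendor/lib.py", ["/vendor"]))       # True: anchored
print(is_ignored_glob("dags/vendor/lib.py", ["/vendor"]))  # False: anchored
print(is_ignored_glob("a/b/c/tests/x.py", ["a/**/tests"]))  # True: '**'
print(is_ignored_glob("xy_draft.py", ["?_draft.py"]))      # False: one char
```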
2 changes: 1 addition & 1 deletion docs/apache-airflow/howto/dynamic-dag-generation.rst
@@ -91,7 +91,7 @@ Then you can import and use the ``ALL_TASKS`` constant in all your DAGs like tha
...

Don't forget that in this case you need to add empty ``__init__.py`` file in the ``my_company_utils`` folder
-and you should add the ``my_company_utils/.*`` line to ``.airflowignore`` file (if using the regexp ignore
+and you should add the ``my_company_utils/*`` line to ``.airflowignore`` file (using the default glob
syntax), so that the whole folder is ignored by the scheduler when it looks for DAGs.


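When an existing `.airflowignore` moves to the new default, a leftover regexp line such as `my_company_utils/.*` changes meaning: in glob syntax `.` is a literal character, so `.*` only matches names that start with a dot. The sketch below compares the two using a hypothetical segment-by-segment helper, `glob_ignores`, built on `fnmatch.fnmatchcase`; it handles only anchored patterns and is not Airflow's matcher.

```python
import re
from fnmatch import fnmatchcase


def glob_ignores(rel_path: str, pattern: str) -> bool:
    """Simplified anchored glob check, applied one path segment at a time.

    Comparing segment by segment keeps '*' from crossing '/'. A pattern that
    matches the leading segments of the path ignores the whole subtree.
    Hypothetical helper, not Airflow's implementation.
    """
    pat_parts = pattern.strip("/").split("/")
    path_parts = rel_path.split("/")
    if len(path_parts) < len(pat_parts):
        return False
    return all(fnmatchcase(seg, pat) for seg, pat in zip(path_parts, pat_parts))


path = "my_company_utils/common.py"
print(bool(re.search(r"my_company_utils/.*", path)))  # True: regexp syntax
print(glob_ignores(path, "my_company_utils/*"))       # True: glob syntax
# Left unchanged under glob syntax, '.*' means "names starting with a dot".
print(glob_ignores(path, "my_company_utils/.*"))      # False
```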
7 changes: 7 additions & 0 deletions newsfragments/42436.significant.rst
@@ -0,0 +1,7 @@
+Default ``.airflowignore`` syntax changed to ``glob``
+
+The default value to the configuration ``[core] dag_ignore_file_syntax`` has
+been changed to ``glob``, which better matches the ignore file behavior of many
+popular tools.
+
+To revert to the previous behavior, set the configuration to ``regexp``.
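Reverting through the environment relies on Airflow's `AIRFLOW__{SECTION}__{KEY}` override convention, so the variable is `AIRFLOW__CORE__DAG_IGNORE_FILE_SYNTAX=regexp` (equivalently, `dag_ignore_file_syntax = regexp` under `[core]` in `airflow.cfg`). The helper below is a hypothetical, simplified lookup that checks only that one variable before falling back to the new `glob` default; real Airflow configuration resolution consults other sources as well.

```python
import os
from typing import Mapping

ENV_VAR = "AIRFLOW__CORE__DAG_IGNORE_FILE_SYNTAX"


def effective_ignore_syntax(environ: Mapping[str, str] = os.environ) -> str:
    """Hypothetical, simplified lookup of [core] dag_ignore_file_syntax.

    Only the environment override is consulted; otherwise the new default
    ("glob") is returned. Real Airflow also reads airflow.cfg and more.
    """
    return environ.get(ENV_VAR, "glob")


print(effective_ignore_syntax({}))                   # glob
print(effective_ignore_syntax({ENV_VAR: "regexp"}))  # regexp
```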
5 changes: 2 additions & 3 deletions tests/dags/.airflowignore
@@ -1,3 +1,2 @@
-.*_invalid.* # Skip invalid files
-subdir3 # Skip the nested subdir3 directory
-# *badrule # This rule is an invalid regex. It would be warned about and skipped.
+*_invalid_* # Skip invalid files
+subdir3 # Skip the nested subdir3 directory
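The commented-out `*badrule` line documented a pattern that fails to compile as a regular expression; as a glob it is an ordinary pattern, so the note no longer applies. A quick check, using Python's `re` as a stand-in for `re2` and `fnmatch.fnmatchcase` as a rough stand-in for gitignore-style matching on a single file name (both are assumptions of this sketch, not Airflow's matchers):

```python
import re
from fnmatch import fnmatchcase

# The commented-out rule from the old file is not a valid regular expression.
try:
    re.compile("*badrule")
    badrule_is_valid_regex = True
except re.error:
    badrule_is_valid_regex = False
print(badrule_is_valid_regex)  # False

# As a glob, '*' simply matches any prefix of the file name.
print(fnmatchcase("my_badrule", "*badrule"))  # True

# Old regexp rule vs. new glob rule on the same file names.
for name in ["my_invalid_dag.py", "dag_invalid.py"]:
    print(name, bool(re.search(r".*_invalid.*", name)), fnmatchcase(name, "*_invalid_*"))
# my_invalid_dag.py True True
# dag_invalid.py True False
```

The new `*_invalid_*` glob requires an underscore after `invalid`, so it is narrower than the old `.*_invalid.*` regexp: a name like `dag_invalid.py` matches the regexp but not the glob.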
2 changes: 1 addition & 1 deletion tests/dags/subdir1/.airflowignore
@@ -1 +1 @@
-.*_ignore_this.py # Ignore files ending with "_ignore_this.py"
+*_ignore_this.py # Ignore files ending with "_ignore_this.py"
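In the old regexp `.*_ignore_this.py`, the `.` before `py` was an unescaped wildcard; in glob syntax it is a literal dot, so the new rule is slightly stricter. Again using `re` and `fnmatch.fnmatchcase` as stand-ins for Airflow's matchers:

```python
import re
from fnmatch import fnmatchcase

OLD_REGEXP = r".*_ignore_this.py"  # '.' before 'py' matches any character
NEW_GLOB = "*_ignore_this.py"      # '.' is a literal dot in glob syntax

for name in ["test_ignore_this.py", "test_ignore_this_py"]:
    print(name, bool(re.search(OLD_REGEXP, name)), fnmatchcase(name, NEW_GLOB))
# test_ignore_this.py True True
# test_ignore_this_py True False
```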
2 changes: 1 addition & 1 deletion tests/plugins/test_plugin_ignore.py
@@ -77,7 +77,7 @@ def test_find_not_should_ignore_path_regexp(self, tmp_path):
"test_load_sub1.py",
}
ignore_list_file = ".airflowignore"
-for file_path in find_path_from_directory(plugin_folder_path, ignore_list_file):
+for file_path in find_path_from_directory(plugin_folder_path, ignore_list_file, "regexp"):
file_path = Path(file_path)
if file_path.is_file() and file_path.suffix == ".py":
detected_files.add(file_path.name)