@SamWheating
Contributor

In rare cases a DAG file can be deleted between the time that the DAGs directory is scanned and the time that the file itself is processed.

This causes unhandled FileNotFoundErrors when the DagFileProcessorManager tries to get the mtime on a nonexistent file, which then crashes the DagFileProcessorManager.
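The race can be reproduced outside Airflow with a minimal sketch. The function name `scan_then_stat` and its `delete_between` parameter are hypothetical and exist only for this illustration; they are not part of Airflow. The sketch lists a directory, deletes one file to simulate the unlucky timing, and then calls `os.path.getmtime` on every path from the earlier scan.

```python
import os
import tempfile


def scan_then_stat(directory, delete_between=None):
    """Hypothetical helper: scan a directory, then stat each file found.

    If `delete_between` is given, that file is removed after the scan but
    before the stat step, which simulates the race described above.
    """
    # Step 1: scan the directory and remember every path that was seen.
    file_paths = sorted(
        os.path.join(directory, name) for name in os.listdir(directory)
    )

    # Simulated race: the file disappears after it was listed.
    if delete_between is not None:
        os.remove(delete_between)

    # Step 2: read the mtime of each listed path with no error handling.
    # os.path.getmtime raises FileNotFoundError for the deleted file.
    return {path: os.path.getmtime(path) for path in file_paths}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dag_dir:
        dag_file = os.path.join(dag_dir, "dag.py")
        open(dag_file, "w").close()
        try:
            scan_then_stat(dag_dir, delete_between=dag_file)
        except FileNotFoundError as err:
            print(f"unhandled in the real loop: {err!r}")
```

In this sketch the exception is caught only so the demo can print it; in the scenario described above nothing catches it, so it propagates out of the parsing loop.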

Full stack trace of the issue (from Airflow 2.1.2):

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 370, in _run_processor_manager
    processor_manager.start()
  File "/usr/local/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 610, in start
    return self._run_parsing_loop()
  File "/usr/local/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 684, in _run_parsing_loop
    self.prepare_file_path_queue()
  File "/usr/local/lib/python3.8/site-packages/airflow/utils/dag_processing.py", line 1055, in prepare_file_path_queue
    files_with_mtime[file_path] = os.path.getmtime(file_path)
  File "/usr/local/lib/python3.8/genericpath.py", line 55, in getmtime
    return os.stat(filename).st_mtime
FileNotFoundError: [Errno 2] No such file or directory: '/path/to/dag.py'

Followed shortly by:

2021-10-08 00:53:04,896 {dag_processing.py:401} WARNING - DagFileProcessorManager (PID=1362135) exited with exit code 1 - re-launching

This PR adds handling for this case: a FileNotFoundError raised while reading a file's mtime is now caught instead of crashing the DagFileProcessorManager, and the affected file path is removed from the finalized processing queue.
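The general shape of such handling can be sketched as follows. This is an illustration under stated assumptions, not the code merged in this PR: the function name `collect_mtimes`, its return values, and the log message are all hypothetical. Only the `os.path.getmtime` call and the `files_with_mtime` name come from the stack trace above.

```python
import logging
import os
import tempfile

log = logging.getLogger(__name__)


def collect_mtimes(file_paths):
    """Hypothetical helper: read mtimes, skipping files that have vanished.

    Returns (files_with_mtime, queue, missing), where `queue` keeps the
    original order of `file_paths` minus any path that no longer exists.
    """
    files_with_mtime = {}
    missing = []
    for file_path in file_paths:
        try:
            files_with_mtime[file_path] = os.path.getmtime(file_path)
        except FileNotFoundError:
            # The file was deleted after the directory scan; note it and
            # move on instead of letting the exception escape.
            log.warning("Skipping missing file: %s", file_path)
            missing.append(file_path)

    # Drop the missing paths from the queue that will be processed.
    queue = [path for path in file_paths if path in files_with_mtime]
    return files_with_mtime, queue, missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dag_dir:
        present = os.path.join(dag_dir, "present.py")
        open(present, "w").close()
        deleted = os.path.join(dag_dir, "deleted.py")  # never created
        _, queue, missing = collect_mtimes([present, deleted])
        print("queue:", queue)
        print("missing:", missing)
```

Catching only `FileNotFoundError`, rather than a broad `OSError`, keeps other filesystem problems such as permission errors visible; whether the merged change makes the same choice is not stated in this conversation.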

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Oct 14, 2021
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Oct 15, 2021
@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@uranusjr uranusjr merged commit 3f6d9b6 into apache:main Oct 15, 2021
@jedcunningham jedcunningham added this to the Airflow 2.2.3 milestone Dec 7, 2021
jedcunningham pushed a commit that referenced this pull request Dec 7, 2021
@jedcunningham jedcunningham added the type:bug-fix Changelog: Bug Fixes label Dec 8, 2021