Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
If a task attempt gets scheduled on a different worker host than a previous attempt, the logs from that previous attempt become unavailable from the webserver.
```
nkcairflow02.tradebot.com
*** Log file does not exist: /opt/tradebot/logs/airflow/dag_id=cluster_wiretap_download_mnj/run_id=scheduled__2023-04-18T05:00:00+00:00/task_id=chunk_1200_1230.download_chunk_1200_1230/attempt=2.log
*** Fetching from: http://nkcairflow02.tradebot.com:8793/log/dag_id=cluster_wiretap_download_mnj/run_id=scheduled__2023-04-18T05:00:00+00:00/task_id=chunk_1200_1230.download_chunk_1200_1230/attempt=2.log
*** Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url: http://nkcairflow02.tradebot.com:8793/log/dag_id=cluster_wiretap_download_mnj/run_id=scheduled__2023-04-18T05:00:00+00:00/task_id=chunk_1200_1230.download_chunk_1200_1230/attempt=2.log
For more information check: https://httpstatuses.com/404
```
From my observation, this is happening because the webserver looks for logs at the hostname of the most recent attempt, not at the hostname that ran the attempt being viewed. In the example above, attempt 2 actually ran on nkcairflow06, but the webserver tries to fetch its logs from nkcairflow02 because that is where attempt 3 ran. You can watch this happen "live" with the example DAG below: each time the task retries on a different worker, the URL used to fetch earlier attempts' logs switches to the most recent hostname.
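As far as I can tell, the lookup reduces to something like the sketch below. `TI` and `worker_log_url` are illustrative stand-ins, not Airflow's actual classes or functions; the point is that a single `hostname` column is overwritten on every attempt, so every attempt's fetch URL is built from the latest host.

```python
from dataclasses import dataclass


@dataclass
class TI:  # stand-in for the TaskInstance row; hostname is a single column
    dag_id: str
    run_id: str
    task_id: str
    hostname: str  # overwritten on every attempt


def worker_log_url(ti: TI, try_number: int, port: int = 8793) -> str:
    # Every attempt's URL is built from the same, latest hostname.
    return (
        f"http://{ti.hostname}:{port}/log/"
        f"dag_id={ti.dag_id}/run_id={ti.run_id}/"
        f"task_id={ti.task_id}/attempt={try_number}.log"
    )


ti = TI(
    "cluster_wiretap_download_mnj",
    "scheduled__2023-04-18T05:00:00+00:00",
    "chunk_1200_1230.download_chunk_1200_1230",
    "nkcairflow02.tradebot.com",  # host of attempt 3, the latest
)
print(worker_log_url(ti, try_number=2))  # wrong host for attempt 2
```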
What you think should happen instead
All previous logs should be available from the webserver (as long as the log file still exists on disk).
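For illustration, a sketch of the resolution I'd expect (a hypothetical `host_for_attempt` helper, not an existing Airflow API): the fetch host should come from a per-attempt record, falling back to the latest hostname only when no such record exists.

```python
def host_for_attempt(attempt_hosts: dict[int, str], try_number: int,
                     latest_host: str) -> str:
    # Use the host that actually ran this attempt; fall back to the
    # latest host only when no per-attempt record exists.
    return attempt_hosts.get(try_number, latest_host)


attempt_hosts = {2: "nkcairflow06.tradebot.com", 3: "nkcairflow02.tradebot.com"}
print(host_for_attempt(attempt_hosts, 2, "nkcairflow02.tradebot.com"))
# -> nkcairflow06.tradebot.com
```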
How to reproduce
```python
#!/usr/bin/env python3
import datetime
import logging
import socket

from airflow.decorators import dag, task

logger = logging.getLogger(__name__)


@dag(
    schedule_interval='@daily',
    start_date=datetime.datetime(2023, 4, 19),
    default_args={
        'retries': 9,
        'retry_delay': 10.0,
    },
)
def test_logs():
    @task
    def log():
        logger.info(f'Running from {socket.gethostname()}')
        raise RuntimeError('force fail')

    log()


test_logs()
```
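To confirm where the file actually lives, I ran a check like this on each worker host (the path and run_id below match my deployment and the DAG above; adjust them for yours):

```python
# Run on each worker host: report whether this host has the attempt-2 log.
import pathlib
import socket

log_file = pathlib.Path(
    "/opt/tradebot/logs/airflow/dag_id=test_logs/"
    "run_id=scheduled__2023-04-19T00:00:00+00:00/task_id=log/attempt=2.log"
)
print(socket.gethostname(), log_file.exists())
```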
Operating System
CentOS Stream 8
Versions of Apache Airflow Providers
N/A
Deployment
Other
Deployment details
- Airflow 2.5.1
- Self-hosted
- Postgres DB backend
- CeleryExecutor
Anything else
Related issues:
- Webserver changes the base path for the log on rerun of task 404 #15767
- Failed to fetch log file from worker because webserver did not search the right worker. #16472
- Failed to fetch log file from worker. 404 Client Error: NOT FOUND #26069
- Failed to fetch log file from worker because webserver did not search the right worker. #26255
Looks like #23178 attempted to fix the issue, but it never got merged.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct