Description
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow==2.8.1
apache-airflow-providers-airbyte==3.5.1
apache-airflow-providers-amazon==8.16.0
apache-airflow-providers-databricks==6.0.0
apache-airflow-providers-docker==3.9.1
apache-airflow-providers-slack==8.5.1
apache-airflow-providers-snowflake==5.2.1
Apache Airflow version
2.8.1
Operating System
Amazon Linux
Deployment
Amazon (AWS) MWAA
Deployment details
Our MWAA environments are managed through Terraform. We install 3 providers outside of those available in the public providers.
What happened
The logs tab in the UI for EcsRunTaskOperator tasks sometimes misses events. We noted this behavior once we upgraded from version 2.5 to version 2.7 (and it has remained in version 2.8). We did not upgrade to 2.6 at any point, and newer releases are not yet available through MWAA.
We use the awslogs_group argument of the EcsRunTaskOperator to specify the log group that the UI should pull from. The log streams in that ECS log group contain all events, but not all of those events appear in the MWAA Task log group.
The following is an example from log streams that should share all logs (i.e. the MWAA Task logs should include all ECS logs). In the MWAA Task logs toward the bottom, events containing remaining DBT output logs are missing. The first several lines are fetched, and the rest of the logs that stream in later to the log stream are missed.
ECS logs:
MWAA Task logs:
Our organization reached out to AWS MWAA for support relating to this issue and this is what we were told by their support regarding why this may be happening.
As per my colleague on the CloudWatch case, when you are using CloudWatch logs, other AWS services (E.g. MWAA/ECS) are actually using the PutLogEvents API to send their log events to the corresponding log group. Now when it comes to the Airflow service, the frequency as to when PutLogEvents is run for the logs to be pushed through to the specific log groups has likely been altered within the newer versions of Airflow.
This was not reproducible in a local Airflow setup.
This is the call made to the EcsRunTaskOperator:
super().__init__(
    task_definition=dbt_ecs_name,
    cluster=ECS_CLUSTER,
    overrides={
        "containerOverrides": [
            {
                "name": "dbt-om1-task",
                "command": command_list,
            },
        ],
    },
    launch_type="FARGATE",
    network_configuration={
        "awsvpcConfiguration": {
            "subnets": [
                os.environ.get("AIRFLOW__VAR__PRIMARY_SUBNET_ID"),
                os.environ.get("AIRFLOW__VAR__SECONDARY_SUBNET_ID"),
            ],
            "securityGroups": [
                os.environ.get("AIRFLOW__VAR__SECURITY_GROUP_ID")
            ],
            "assignPublicIp": "DISABLED",
        },
    },
    awslogs_group="/ecs/airflow2/airflow2",
    awslogs_region="us-east-1",
    awslogs_stream_prefix="airflow2/dbt-om1-task",
    awslogs_fetch_interval=timedelta(seconds=30),
    propagate_tags="TASK_DEFINITION",
    **kwargs,
)
What you think should happen instead
All log events in the ECS log group should be pulled forward by the MWAA Task log group using the task_log_fetcher.
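To make the expectation concrete, here is a minimal sketch of what "pulling every event" from a log stream looks like against the CloudWatch Logs GetLogEvents API. This is not the provider's task_log_fetcher implementation. The function name drain_log_stream is hypothetical, and logs_client is assumed to be a boto3 CloudWatch Logs client (for example boto3.client("logs")). It relies on the documented behavior that GetLogEvents returns the same nextForwardToken you passed in once the end of the stream is reached.

```python
def drain_log_stream(logs_client, log_group, log_stream):
    """Yield every event currently in a CloudWatch log stream, oldest first.

    `logs_client` is assumed to be a boto3 CloudWatch Logs client.
    This is an illustrative sketch, not the Airflow provider's code.
    """
    kwargs = {
        "logGroupName": log_group,
        "logStreamName": log_stream,
        "startFromHead": True,
    }
    while True:
        response = logs_client.get_log_events(**kwargs)
        yield from response["events"]

        next_token = response["nextForwardToken"]
        # The end of the stream is signalled by getting back the same
        # token that was sent with the request.
        if next_token == kwargs.get("nextToken"):
            break
        kwargs["nextToken"] = next_token
```

Running a loop like this once more after the ECS task has stopped would pick up events that arrived after the last periodic fetch, which is the portion we see missing in the MWAA Task logs.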
How to reproduce
We are using a large (~2 GB) custom DBT image inside our ECS task definition. This has happened with every execution of the EcsRunTaskOperator against our custom DBT image.
After triggering a task using the EcsRunTaskOperator, the DBT output logs should stream directly from the ECS log stream to the MWAA Task log stream.
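To quantify the gap after a run, one option is to export the messages from the ECS log stream and the lines from the MWAA Task log, then list the ECS messages that never show up. The sketch below assumes both have already been collected into lists of strings. The helper name find_missing_events is hypothetical, and it assumes Airflow writes each ECS message verbatim inside a longer task log line, so a substring match is enough.

```python
def find_missing_events(ecs_messages, task_log_lines):
    """Return ECS log messages that appear in no Airflow task log line.

    Assumes each forwarded ECS message is embedded verbatim in a task
    log line (behind Airflow's own timestamp/level prefix), so a
    substring check is used. Blank messages are ignored.
    """
    missing = []
    for message in ecs_messages:
        text = message.strip()
        if not text:
            continue
        if not any(text in line for line in task_log_lines):
            missing.append(message)
    return missing


# Placeholder strings for illustration only, not real log output.
ecs = ["step one", "step two", "step three"]
task_log = ["INFO - step one", "INFO - step two"]
print(find_missing_events(ecs, task_log))
```

A non-empty result for a finished task would confirm that events present in the ECS log stream were never forwarded to the MWAA Task log stream.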
Anything else
This has been observed in our MWAA environment against both apache-airflow==2.7.2 and apache-airflow==2.8.1. When upgrading, we did not upgrade to a 2.6 version. This issue may be related to or caused by the same root cause as #39323.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct