Misleading Exception message when missing endpoint_url in Extra field of AWS Connection of Airflow 2.10.5 #52697

@vmtuan12

Description


Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==9.2.0

Apache Airflow version

2.10.5

Operating System

Debian GNU/Linux 12 (bookworm)

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

When using S3KeySensor from airflow.providers.amazon.aws.sensors.s3:

from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

S3KeySensor(
    task_id="task1",
    bucket_key="s3://...",
    wildcard_match=True,
    aws_conn_id=conn_id,
    mode="reschedule",
    timeout=3 * 60 * 60,  # 3 hours
    dag=dag,
)

After upgrading from Airflow 2.10.2 to 2.10.5, we unexpectedly ran into an incident with the message

...
File "/home/airflow/.local/lib/python3.11/site-packages/botocore/paginate.py", line 357, in _make_request
  return self._method(**current_kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/botocore/client.py", line 569, in _api_call
  return self._make_api_call(operation_name, kwargs)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/botocore/client.py", line 1023, in _make_api_call
  raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the ListObjectsV2 operation: The AWS Access Key Id you provided does not exist in our records.

This message makes the root cause very hard to find.

What you think should happen instead

After hours of investigating, we finally found that the Extra field of the connection was

{
  "host": "http://s3.sample.com:1234"
}

This is because we used the host field with apache-airflow-providers-amazon==8.28.0 on Airflow 2.10.2. After upgrading to 2.10.5, this caused the failure. After adding "endpoint_url": "http://s3.sample.com:1234" to the JSON, the issue was fixed.
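For anyone hitting the same migration problem, here is a hedged sketch (the helper name is illustrative, not part of the provider) of copying a legacy host value into endpoint_url in the Extra JSON:

```python
import json


def migrate_extra(extra_json: str) -> str:
    """Copy a legacy 'host' entry to 'endpoint_url' if the latter is missing.

    Illustrative helper only; adapt it to however you manage connections.
    """
    extra = json.loads(extra_json or "{}")
    if "host" in extra and "endpoint_url" not in extra:
        extra["endpoint_url"] = extra["host"]
    return json.dumps(extra)


# Example: the Extra value from this report gains an endpoint_url entry
print(migrate_extra('{"host": "http://s3.sample.com:1234"}'))
```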

In providers/amazon/aws/utils/connection_wrapper.py, in the __post_init__ method of class AwsConnectionWrapper, the line

self.endpoint_url = extra.get("endpoint_url")

should not silently return None. It should check whether endpoint_url exists in the extra and, if it does not, log a message such as "Missing endpoint_url in extra of connection."

How to reproduce

Use S3KeySensor and, in the Extra field of the connection, do not set endpoint_url:

from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

S3KeySensor(
    task_id="task1",
    bucket_key="s3://...",
    wildcard_match=True,
    aws_conn_id=conn_id,
    mode="reschedule",
    timeout=3 * 60 * 60,  # 3 hours
    dag=dag,
)

Anything else

In my view, the release notes should also be updated with this information.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
