Description
Apache Airflow Provider(s)
amazon
Versions of Apache Airflow Providers
apache-airflow-providers-amazon==9.2.0
Apache Airflow version
2.10.5
Operating System
Debian GNU/Linux 12 (bookworm)
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
What happened
When using S3KeySensor from airflow.providers.amazon.aws.sensors.s3:

from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

S3KeySensor(
    task_id="task1",
    bucket_key=f"s3://...",
    wildcard_match=True,
    aws_conn_id=conn_id,
    mode="reschedule",
    timeout=3 * 60 * 60,
    dag=dag,
)

After upgrading from Airflow 2.10.2 to 2.10.5, we unexpectedly hit an incident with this message:
...
File "/home/airflow/.local/lib/python3.11/site-packages/botocore/paginate.py", line 357, in _make_request
return self._method(**current_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/botocore/client.py", line 569, in _api_call
return self._make_api_call(operation_name, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/airflow/.local/lib/python3.11/site-packages/botocore/client.py", line 1023, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the ListObjectsV2 operation: The AWS Access Key Id you provided does not exist in our records.

This made it very hard to find the root cause.
What you think should happen instead
After hours of investigating, we finally found out that the connection's Extra field was:
{
"host": "http://s3.sample.com:1234"
}
We had used the host field with apache-airflow-providers-amazon==8.28.0 on Airflow 2.10.2. After upgrading to 2.10.5, the 9.2.0 provider no longer reads that field, so the hook presumably fell back to the default AWS endpoint, where our access key does not exist, hence the InvalidAccessKeyId error. After adding "endpoint_url": "http://s3.sample.com:1234" to the JSON, the issue was fixed.
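For reference, a minimal sketch of a connection that works with provider 9.2.0; the conn_id is hypothetical, and the URL is the example host from above:

import json
from airflow.models.connection import Connection

# Provider 9.x reads endpoint_url (not host) from Extra, so a custom
# S3 endpoint must be declared like this.
conn = Connection(
    conn_id="my_s3",  # hypothetical conn_id
    conn_type="aws",
    extra=json.dumps({"endpoint_url": "http://s3.sample.com:1234"}),
)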
In providers/amazon/aws/utils/connection_wrapper.py, in the __post_init__ method of the AwsConnectionWrapper class, the line

self.endpoint_url = extra.get("endpoint_url")

should not silently fall back to None. It must check whether endpoint_url exists in the extra and, if it does not, log some message to stdout, like "Missing endpoint_url in extra of connection."
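A minimal sketch of the proposed behavior, written as a hypothetical standalone helper (resolve_endpoint_url is not an existing function; the warning text and log level are only suggestions):

import logging

log = logging.getLogger(__name__)

def resolve_endpoint_url(extra: dict, conn_id: str) -> str | None:
    # Hypothetical helper mirroring the change proposed for
    # AwsConnectionWrapper.__post_init__: warn instead of silently
    # returning None when endpoint_url is absent from the Extra.
    if "endpoint_url" not in extra:
        log.warning("Missing endpoint_url in extra of connection %s.", conn_id)
    return extra.get("endpoint_url")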
How to reproduce
Use S3KeySensor with a connection whose Extra field does not set endpoint_url; see the example Extra and sensor below.
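The old-style Extra from the incident above (carried over from provider 8.x) reproduces the failure:

{
    "host": "http://s3.sample.com:1234"
}

Then run the sensor: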
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
S3KeySensor(
    task_id="task1",
    bucket_key=f"s3://...",
    wildcard_match=True,
    aws_conn_id=conn_id,
    mode="reschedule",
    timeout=3 * 60 * 60,
    dag=dag,
)

Anything else
In my opinion, the release notes should also be updated with this information.
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct