-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Fix missing aws default #31276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix missing aws default #31276
Conversation
I scanned through other AWS Operators, but Batch is the only Operator that uses None for default aws_conn_id. This commit is to fix it to make it consistent.
hussein-awala
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when we provide None as aws_conn_id?
hussein-awala
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when we provide None as aws_conn_id?
|
In AWS Batch Operator, it falls back to the default behavior of boto3, which normally uses This is specifically mentioned in the doc: I am not sure this fallback behavior is the same across other AWS Operators though. Many of them doesn't allow None type for |
| max_retries: int | None = None, | ||
| status_retries: int | None = None, | ||
| aws_conn_id: str | None = None, | ||
| aws_conn_id: str | None = 'aws_default', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going through the operators we have in amazon/aws provider. In some places we just have a type annotation saying it is a str value. While other places have str | None with default aws_default. Do we want to continue accepting None values and fallback to boto3 looking for the credentials in the ~/aws/credentials file in general? In that case, should we update the type annotation to str | None in all the operators? If we do not want to fallback and want to mandate an Airflow connection to be used, should we raise an exception when the aws_conn_id value is not set?
Additionally, I also see that some operators have aws_conn_id in their template_fields but not all. Do we need to unify this experience across all operators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, users still can use an AWS Connection with no credential specified in Connection, then the connection uses boto3 credentials.
This is what I am using for AWS Provider. I am using KubernetesExecutor and I gave S3 permission to worker pods using k8s service account (AWS IRSA) and I make aws_default with blank fields.
So if we decide not allowing None value for aws_default then the mitigation is easy. This is kinda similar to what happened to KPO #28848 #31187
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behaviour is complicated for sure, there are three paths:
- You provide no value at all for the conn id to the base aws hook: Then inside the base aws hook the default conn id is used, which will then cause the standard boto3 creds lookup to be used if the conn is empty or it will use whatever credentials someone has added to the default aws conn.
- You provide None as the conn id: in this case the base aws hook will always use the boto3 lookup no matter what.
- You provide some specific conn id that you have created with specific credentials in it which will be used.
So the behaviours of all three are distinctly different and you cannot simply change it from None to "aws_default" with no effect. HOWEVER, it is a very small chance this would affect users and there are lots of diverse perspectives for what is considered a breaking change. You could argue this is a bug (it's unexpectedly not following the standard of all other aws operators) and so should be fixed without having to concern ourselves with deprecation warnings.
In this case this change is a breaking change (cc: @eladkal) |
|
Looks like this one died out. Closing for now, feel free to re-open the discussion if anyone would like. |
I scanned through other AWS Operators, but Batch is the only Operator that uses None for default aws_conn_id.
This commit is to fix it to make it consistent.