Conversation

@kbumsik commented May 14, 2023

I scanned through the other AWS operators, and Batch is the only one that uses None as the default aws_conn_id.

This commit changes that to make it consistent.

kbumsik added 2 commits May 14, 2023 10:32
@kbumsik requested review from eladkal and o-nikolas as code owners May 14, 2023 10:38
@boring-cyborg bot added the area:providers and provider:amazon (AWS/Amazon - related issues) labels May 14, 2023
Member

@hussein-awala left a comment

What happens when we provide None as aws_conn_id?

@kbumsik
Author

kbumsik commented May 14, 2023

In the AWS Batch operator, it falls back to boto3's default credential lookup, which normally uses AWS_-prefixed environment variables or the AWS config files in ~/.aws.

This is explicitly mentioned in the operator's docstring:

https://github.com/apache/airflow/blob/3193857376bc2c8cd2eb133017be1e8cbcaa8405/airflow/providers/amazon/aws/operators/batch.py#LL76C3-L77C19

I am not sure this fallback behavior is the same across the other AWS operators, though. Many of them don't accept None for aws_conn_id, and, unlike AWS Batch, their docstrings don't describe the None case.
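For readers unfamiliar with that fallback, here is a rough sketch of the lookup order described above. It is a hypothetical, simplified model (the function name `lookup_boto3_style` is made up): the real boto3 chain consults more sources, such as web identity tokens (which IRSA relies on) and container or instance metadata, so treat this only as an illustration of the "environment variables first, then ~/.aws" order.

```python
import configparser
from typing import Mapping, Optional, Tuple

# Hypothetical, simplified model of boto3's fallback. Only the two sources named
# in the comment above are modelled; the real chain checks several more.


def lookup_boto3_style(
    env: Mapping[str, str],
    credentials_ini: str,
    profile: str = "default",
) -> Optional[Tuple[str, str, str]]:
    """Return (source, access_key_id, secret_access_key), or None if nothing is found."""
    # 1. AWS_-prefixed environment variables take precedence.
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return ("environment", key, secret)

    # 2. Otherwise read the shared credentials file (normally ~/.aws/credentials).
    parser = configparser.ConfigParser()
    parser.read_string(credentials_ini)
    # AWS_PROFILE selects a named profile; "default" is used otherwise.
    section = env.get("AWS_PROFILE", profile)
    if parser.has_section(section):
        key = parser.get(section, "aws_access_key_id", fallback=None)
        secret = parser.get(section, "aws_secret_access_key", fallback=None)
        if key and secret:
            return ("shared-credentials-file", key, secret)
    return None


creds_file = """
[default]
aws_access_key_id = AKIAEXAMPLE
aws_secret_access_key = example-secret
"""
print(lookup_boto3_style({}, creds_file))
# ('shared-credentials-file', 'AKIAEXAMPLE', 'example-secret')
```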

```diff
 max_retries: int | None = None,
 status_retries: int | None = None,
-aws_conn_id: str | None = None,
+aws_conn_id: str | None = 'aws_default',
```
Member

I am going through the operators we have in the amazon/aws provider. In some places we just have a type annotation saying it is a str value, while other places have str | None with the default aws_default. Do we want to keep accepting None values and, in general, fall back to boto3 looking for the credentials in the ~/.aws/credentials file? In that case, should we update the type annotation to str | None in all the operators? If we do not want to fall back and instead want to mandate that an Airflow connection be used, should we raise an exception when the aws_conn_id value is not set?

Additionally, some operators have aws_conn_id in their template_fields, but not all of them. Do we need to unify this across all operators?
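To make the two options in this question concrete, the following is a hypothetical validation helper that an operator could call from `__init__`. Both `normalize_aws_conn_id` and its `allow_none` flag are illustrative names, not part of the Amazon provider.

```python
from typing import Optional


# Hypothetical helper; neither the name nor the flag exists in the Amazon provider.
def normalize_aws_conn_id(aws_conn_id: Optional[str], allow_none: bool) -> Optional[str]:
    """Validate aws_conn_id under one of the two policies discussed above.

    allow_none=True keeps the current Batch behaviour: None means "skip the
    Airflow connection and let boto3 find credentials on its own".
    allow_none=False mandates an Airflow connection and fails fast on None.
    """
    if aws_conn_id is None and not allow_none:
        raise ValueError(
            "aws_conn_id is None; pass an Airflow connection id such as 'aws_default'"
        )
    return aws_conn_id


print(normalize_aws_conn_id(None, allow_none=True))  # None
print(normalize_aws_conn_id("aws_default", allow_none=False))  # aws_default
```

Under the first policy the annotation would consistently be `str | None`; under the second it could be narrowed to `str`.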

Member

What are your thoughts on this?
cc: @eladkal @ferruzzi @vincbeck @syedahsn @potiuk

Author

FYI, users can still use an AWS connection with no credentials specified in it; in that case the connection falls back to boto3's credential lookup.

This is what I do with the AWS provider: I use the KubernetesExecutor, grant S3 permissions to the worker pods through a Kubernetes service account (AWS IRSA), and define aws_default with all fields left blank.

So if we decide not to allow None for aws_conn_id, the mitigation is easy. This is somewhat similar to what happened with KPO (KubernetesPodOperator) in #28848 and #31187.
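A minimal sketch of that mitigation, assuming connections are supplied through Airflow's `AIRFLOW_CONN_<CONN_ID>` environment variables: an `aws://` URI with no login, password or extras should leave the hook nothing to use, so it defers to boto3 (and, on EKS, to IRSA). The helper name `blank_aws_connection_env` is hypothetical; in a real deployment you would set the variable in the worker pod spec rather than from Python.

```python
import os
from typing import Tuple


def blank_aws_connection_env(conn_id: str = "aws_default") -> Tuple[str, str]:
    """Return (env var name, URI) for an AWS connection with every field blank.

    Hypothetical helper. Airflow looks up AIRFLOW_CONN_<CONN_ID> with the conn id
    upper-cased; a bare "aws://" URI carries no login, password or extras, so the
    AWS hook is left to boto3's default credential lookup.
    """
    return f"AIRFLOW_CONN_{conn_id.upper()}", "aws://"


name, uri = blank_aws_connection_env()
os.environ[name] = uri  # illustration only; set this in the worker pod spec in practice
print(name, uri)  # AIRFLOW_CONN_AWS_DEFAULT aws://
```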

Contributor

This behaviour is definitely complicated; there are three paths:

  1. You provide no value at all for the conn id to the base AWS hook: the hook then uses its default conn id (aws_default). If that connection is empty, the standard boto3 credential lookup is used; otherwise, whatever credentials someone has added to the default AWS connection are used.
  2. You provide None as the conn id: the base AWS hook always uses the boto3 lookup, no matter what.
  3. You provide a specific conn id that you have created with specific credentials in it: those credentials are used.

So the three behaviours are distinctly different, and you cannot simply change the default from None to "aws_default" with no effect. HOWEVER, there is only a very small chance that this would affect users, and there are many different perspectives on what counts as a breaking change. You could argue that this is a bug (it unexpectedly doesn't follow the convention of all the other AWS operators) and should therefore be fixed without concerning ourselves with deprecation warnings.
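The three paths can be summarised with a simplified, hypothetical model. This is not the real AwsBaseHook code: `credential_source` and the `connections` dict are stand-ins for the hook and Airflow's connection store. It also shows why the default matters: when aws_default holds credentials, passing nothing and passing None lead to different credential sources.

```python
from typing import Dict

# Simplified, hypothetical model of the three paths above (not the real
# AwsBaseHook implementation). `connections` stands in for Airflow's connection
# store: conn id -> credentials dict, where {} means "connection with blank fields".
# Missing connections are out of scope and simply raise KeyError here.

_UNSET = object()  # sentinel meaning "the caller passed no conn id at all"


def credential_source(connections: Dict[str, dict], aws_conn_id: object = _UNSET) -> str:
    if aws_conn_id is _UNSET:
        # Path 1: no value given, so the hook's default conn id is used.
        aws_conn_id = "aws_default"
    if aws_conn_id is None:
        # Path 2: None always means boto3's own credential lookup.
        return "boto3 default lookup"
    # Paths 1 and 3: use the named connection's credentials if it has any,
    # otherwise fall back to boto3's lookup (the blank-connection case).
    creds = connections[aws_conn_id]
    return f"credentials from {aws_conn_id!r}" if creds else "boto3 default lookup"


conns = {"aws_default": {"aws_access_key_id": "AKIAEXAMPLE"}, "blank_conn": {}}
print(credential_source(conns))                # credentials from 'aws_default'
print(credential_source(conns, None))          # boto3 default lookup
print(credential_source(conns, "blank_conn"))  # boto3 default lookup
```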

@hussein-awala
Member

> In AWS Batch Operator, it falls back to the default behavior of boto3, which normally uses AWS_ prefixed environment variables or AWS configs in ~/.aws.

In that case, this change is a breaking change (cc: @eladkal).

@o-nikolas
Contributor

Looks like this one died out. Closing for now; feel free to reopen the discussion if anyone would like to.

@o-nikolas closed this Jun 26, 2023