Amazon Provider Package user agent #27823

ferruzzi · 2022-11-21T19:47:13Z

Adding a couple of fields to the boto3 user agent will allow the AWS team to better understand which services and operators to focus improvements on in the future. This is similar to the user agent fields added by Databricks, Google, Yandex, and others.

ferruzzi · 2022-11-21T22:12:52Z

Two failing tests are also failing in main.

airflow/providers/amazon/aws/hooks/base_aws.py

Taragolis · 2022-11-22T16:15:51Z

These are my personal thoughts.

Some end users might be unhappy that we have started collecting additional telemetry such as dag_id and ClassName, even if they are only hashes. So it might be better to collect only the Airflow and Amazon Provider versions by default and, if additional metadata is required, call an optional callable controlled by config, e.g. [aws] extra_botocore_user_agent_callable with the fully qualified path to a callable, e.g. some_vendor.safe.get_user_agent
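
A minimal sketch of that opt-in idea, assuming the config value holds a dotted path to a zero-argument callable that returns a string. The function names, the `Airflow/` and `AmazonProvider/` field labels, and the version numbers are illustrative assumptions, not what this PR implements. The resulting string could then be handed to botocore through `Config(user_agent_extra=...)`.

```python
from importlib import import_module
from typing import Callable, Optional


def load_callable(dotted_path: str) -> Callable[[], str]:
    """Resolve 'package.module.func' to the callable it names (hypothetical helper)."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Not a fully qualified path: {dotted_path!r}")
    return getattr(import_module(module_name), attr)


def build_user_agent_extra(
    airflow_version: str,
    provider_version: str,
    extra_callable_path: Optional[str] = None,
) -> str:
    """Always report the two versions; add more only if the user opted in."""
    # Field labels below are placeholders, not the names used by the provider.
    parts = [f"Airflow/{airflow_version}", f"AmazonProvider/{provider_version}"]
    if extra_callable_path:
        try:
            extra = load_callable(extra_callable_path)()
        except Exception:
            # Telemetry must never break the user's task, so swallow everything.
            extra = ""
        if extra:
            parts.append(str(extra))
    return " ".join(parts)


print(build_user_agent_extra("2.5.0", "6.2.0"))
```

With no callable configured, only the two version fields are sent; a broken or missing callable degrades to the same default instead of raising.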

I thought we also might add/merge config in AwsCredentialHelper rather than in hook

airflow/airflow/providers/amazon/aws/utils/connection_wrapper.py

Lines 229 to 235 in 093345c

    
    config_kwargs = extra.get("config_kwargs")
    if not self.botocore_config and config_kwargs:
        # https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
        self.log.debug("Retrieving botocore config=%s from %s extra.", config_kwargs, self.conn_repr)
        self.botocore_config = Config(**config_kwargs)

ferruzzi · 2022-11-22T17:50:28Z

> Some end users might be unhappy that we have started collecting additional telemetry such as dag_id and ClassName, even if they are only hashes. So it might be better to collect only the Airflow and Amazon Provider versions by default and, if additional metadata is required, call an optional callable controlled by config, e.g. [aws] extra_botocore_user_agent_callable with the fully qualified path to a callable, e.g. some_vendor.safe.get_user_agent

Moved to a code comment so it can be a threaded conversation; see below

airflow/providers/amazon/aws/hooks/base_aws.py

ferruzzi · 2022-11-29T18:18:37Z

Any other thoughts on this?

ferruzzi · 2022-11-30T18:56:47Z

@uranusjr @eladkal Any other thoughts or suggestions on this one? I'm going to be away on vacation for the holidays in a couple of weeks and would love to get this sorted out before I go.

uranusjr

Pending the discussion on telemetry

potiuk

Some static checks need fixing, but otherwise it looks good.

ferruzzi · 2022-12-03T07:03:17Z

CI build timed out. Going to bump it.

ferruzzi · 2022-12-03T07:40:35Z

Two tests are failing in providers/amazon/aws/hooks/test_s3.py but they are also failing in main on my side and shouldn't be related to any changes I've made.

potiuk · 2022-12-03T09:28:04Z

> Two tests are failing in providers/amazon/aws/hooks/test_s3.py but they are also failing in main on my side and shouldn't be related to any changes I've made.

They do not seem to fail in the "canary builds" in CI for main, so there must be something in your changes that causes it: https://github.com/apache/airflow/actions/runs/3607606417

ferruzzi · 2022-12-03T18:49:15Z

> so there must be something in your changes that cause it

I don't see how that's possible. If I check out main, git pull --rebase, and run the tests, they fail. My code is all in a different branch.

(airflow-env) ferruzzi:~/workplace/airflow (ferruzzi/boto-user-agent)
$ git checkout main
Switched to branch 'main'
Your branch is up to date with 'apache/main'.
(airflow-env) ferruzzi:~/workplace/airflow (main)
$ git pull --rebase
Already up to date.
Current branch main is up to date.
(airflow-env) ferruzzi:~/workplace/airflow (main)
$ pytest tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id -W ignore::DeprecationWarning -W  ignore::FutureWarning
=========================================================================== test session starts ===========================================================================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/ANT.AMAZON.COM/ferruzzi/.pyenv/versions/airflow-env/bin/python
cachedir: .pytest_cache
rootdir: /home/ANT.AMAZON.COM/ferruzzi/workplace/airflow, configfile: pytest.ini
plugins: anyio-3.6.2, xdist-2.5.0, forked-1.4.0, requests-mock-1.9.3, flaky-3.7.0, instafail-0.4.2, rerunfailures-9.1.1, cov-3.0.0, timeouts-1.2.1, httpx-0.15.0, asyncio-0.16.0
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 1 item                                                                                                                                                          

tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id FAILED                                    [100%]

================================================================================ FAILURES =================================================================================
_______________________________________________ TestAwsS3HookNoMock.test_check_for_bucket_raises_error_with_invalid_conn_id _______________________________________________

self = <tests.providers.amazon.aws.hooks.test_s3.TestAwsS3HookNoMock object at 0x7f7d4a39c610>, monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f7d4a39ca30>

    def test_check_for_bucket_raises_error_with_invalid_conn_id(self, monkeypatch):
        monkeypatch.delenv("AWS_PROFILE", raising=False)
        monkeypatch.delenv("AWS_ACCESS_KEY_ID", raising=False)
        monkeypatch.delenv("AWS_SECRET_ACCESS_KEY", raising=False)
        hook = S3Hook(aws_conn_id="does_not_exist")
        # We're mocking all actual AWS calls and don't need a connection. This
        # avoids an Airflow warning about connection cannot be found.
        hook.get_connection = lambda _: None
        with pytest.raises(NoCredentialsError):
>           hook.check_for_bucket("test-non-existing-bucket")
E           Failed: DID NOT RAISE <class 'botocore.exceptions.NoCredentialsError'>

tests/providers/amazon/aws/hooks/test_s3.py:63: Failed
-------------------------------------------------------------------------- Captured stdout setup --------------------------------------------------------------------------
========================= AIRFLOW ==========================
Home of the user: /home/ANT.AMAZON.COM/ferruzzi
Airflow home /home/ANT.AMAZON.COM/ferruzzi/airflow
Skipping initializing of the DB as it was initialized already.
You can re-initialize the database by adding --with-db-init flag when running tests.
-------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
[2022-12-03 10:47:34,076] {base_aws.py:121} INFO - No connection ID provided. Fallback on boto3 credential strategy (region_name=None). See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[2022-12-03 10:47:34,087] {credentials.py:1251} INFO - Found credentials in shared credentials file: ~/.aws/credentials
[2022-12-03 10:47:34,593] {s3.py:221} INFO - Bucket "test-non-existing-bucket" does not exist
---------------------------------------------------------------------------- Captured log call ----------------------------------------------------------------------------
INFO     airflow.providers.amazon.aws.hooks.base_aws.BaseSessionFactory:base_aws.py:121 No connection ID provided. Fallback on boto3 credential strategy (region_name=None). See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
INFO     botocore.credentials:credentials.py:1251 Found credentials in shared credentials file: ~/.aws/credentials
INFO     airflow.providers.amazon.aws.hooks.s3.S3Hook:s3.py:221 Bucket "test-non-existing-bucket" does not exist
============================================================================ warnings summary =============================================================================
../../.pyenv/versions/airflow-env/lib/python3.8/site-packages/_pytest/config/__init__.py:1233
  /home/ANT.AMAZON.COM/ferruzzi/.pyenv/versions/airflow-env/lib/python3.8/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: asyncio_mode
  
    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/warnings.html
========================================================================= short test summary info =========================================================================
FAILED tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id - Failed: DID NOT RAISE <class 'botocor...
====================================================================== 1 failed, 1 warning in 1.73s =======================================================================
(airflow-env) ferruzzi:~/workplace/airflow (main)
$

potiuk · 2022-12-03T19:28:00Z

I am afraid you are one of the few unlucky ones who will have to deal with side-effects.

It is very likely that the failing tests rely on side effects from other tests. It's also possible that your tests are clearing or removing those side effects, which is why those tests start to fail in your PR.

Did you run all the tests from providers?

breeze testing tests --test-type providers[amazon]

Those should run full "amazon providers" suite in the same sequence they are run in CI - run them on main and see they don't fail.

If you run those tests individually and they fail - it means they are relying on side effects from other tests. This is because most of our unit tests rely on dags/dagruns/connections etc. created in the unit test DB and the DB is re-used across all the tests.

The only way to solve it for now is to investigate and fix those tests that indeed rely on side effects.

That might involve writing a fixture or setUp method that restores the DB records the test expects. Many tests have those, but some do not and rely on the DB being populated by previous tests, and this is the problem.
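
A minimal sketch of such a setUp, using a module-level dict as a stand-in for the shared unit test DB. `SHARED_DB`, the test class, and `run_suite` are hypothetical names for illustration only; a real Airflow test would restore connections or DAG runs through the ORM session instead.

```python
import io
import unittest

# Stand-in for the unit test database that is reused across the whole suite.
SHARED_DB = {"connections": {}}


class TestUsesConnection(unittest.TestCase):
    def setUp(self):
        # Restore exactly the state this test expects instead of relying on
        # whatever earlier tests happened to leave behind.
        SHARED_DB["connections"].clear()
        SHARED_DB["connections"]["aws_default"] = {"conn_type": "aws"}

    def test_only_expected_connection_exists(self):
        self.assertEqual(list(SHARED_DB["connections"]), ["aws_default"])


def run_suite() -> unittest.TestResult:
    """Run the test case in isolation and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestUsesConnection)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)


# Simulate a leftover record from some earlier, unrelated test.
SHARED_DB["connections"]["leftover"] = {"conn_type": "http"}
print(run_suite().wasSuccessful())
```

Because setUp rebuilds the records itself, the test passes whether it runs alone or after other tests, whatever order they ran in.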

Yes. It's not your fault. But also yes - it, unfortunately, falls on your shoulders essentially if your newly added tests or the modified ones interfere with it.

Ideally by finding and resolving the side-effects in the other tests so that the situation gets improved for the future.

It happens rarely enough that it is not a "common" problem, but unfortunately you have likely fallen victim to it.

And yes, it is NOT how it should be. But unfortunately we are not living in a perfect world, and it is what it is. Hopefully one day we will be able to solve it better and rule out even the possibility of side effects, but that would likely require a lot of investment in completely refactoring how 100s if not 1000s of tests are implemented, so I am afraid it's totally not feasible to solve it now once and for all.

We can complain about it (I do), but this is, unfortunately, quite an effort to fix. If anyone has an idea how to solve it quickly, that would be fantastic. We have tried in the past to just clean the DB for every unit test, but with many thousands of them this slows the whole suite down to a crawl.

Hopefully some day we will figure out how to fix it better. But I am afraid, quoting ~~King Theoden~~ Aragorn: "this is not that day".

BTW, this is actually one of the good ideas on how to improve our test suite overall and fix this problem permanently.

Taragolis · 2022-12-03T20:08:09Z

@ferruzzi tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id fails locally because you have credentials in the default profile of your shared credentials file.
Actually, we need to move this test into AwsBaseHook, if we need it at all. If I understand correctly, it tries to test that boto3/botocore raises an error if no credentials are provided.

TestAwsS3Hook.test_create_bucket_no_region_regional_endpoint: this test checks that if a user wants to use the regional S3 endpoint for us-east-1 but does not specify any region, then bucket creation fails (Airflow prevents this), because in this case the region would be set to aws-global.

The failure might have one of two causes:

- A change in a recent version of botocore means this setting is ignored, or aws-global is no longer returned as the region.
- The setting {"config_kwargs": {"s3": {"us_east_1_regional_endpoint": "regional"}}} is somehow overwritten after the changes in this PR. In that case other config_kwargs, e.g. the retry strategy, might be overwritten too.

potiuk · 2022-12-03T20:40:15Z

> @ferruzzi tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id fails locally because you have credentials in the default profile of your shared credentials file. Actually, we need to move this test into AwsBaseHook, if we need it at all. If I understand correctly, it tries to test that boto3/botocore raises an error if no credentials are provided.
>
> TestAwsS3Hook.test_create_bucket_no_region_regional_endpoint: this test checks that if a user wants to use the regional S3 endpoint for us-east-1 but does not specify any region, then bucket creation fails (Airflow prevents this), because in this case the region would be set to aws-global.
>
> The failure might have one of two causes:
>
> - A change in a recent version of botocore means this setting is ignored, or aws-global is no longer returned as the region.
> - The setting {"config_kwargs": {"s3": {"us_east_1_regional_endpoint": "regional"}}} is somehow overwritten after the changes in this PR. In that case other config_kwargs, e.g. the retry strategy, might be overwritten too.

I think in this case the side-effect is different than DB:

INFO     botocore.credentials:credentials.py:1251 Found credentials in shared credentials file: ~/.aws/credentials

It looks like the tests you added are creating (and not deleting) ~/.aws/credentials, while the failing test expects a "No Credentials" error to be raised.

If my hypothesis is right, the fix is twofold:

- make sure your new tests clean up ~/.aws/credentials in tearDown or a fixture
- make sure ~/.aws/credentials is cleaned up in setup before the test_check_for_bucket_raises_error_with_invalid_conn_id test

On one hand, the test did not have a proper "setup" that resets the state to what the test expects; on the other hand, it would have been difficult to foresee, when the test was written, that someone else would create and leave such a credentials file. It's hard to blame anyone in particular, other than to say it's totally not feasible to run thousands of tests each in a completely isolated and pristine environment without side effects like that.

This is another example of a side effect that you might stumble upon, and unfortunately it will happen occasionally. I can't imagine how to prevent this kind of problem.

ferruzzi · 2022-12-05T22:43:05Z

> Did you run all tests from providers?
> breeze testing tests --test-type providers[amazon]
> Those should run full "amazon providers" suite in the same sequence they are run in CI - run them on main and see they don't fail.

But they do, that's what I was saying. There are two tests in hooks/s3.py that fail when I run them locally even when I run them using breeze from main without my code. I can work on figuring out why and fixing them, but if they fail in breeze in main then I don't see how the code in this PR is the issue. That should be a separate PR to fix them.

FAILED tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id

(Note, this one is no longer failing in the CI and I have not caught up on my messages from yesterday or this morning, this may already be fixed)

> /test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id test fail locally because you have credentials in shared credentials file in default profile.

Yup, that looks right. I propose that if that test expects to not find one, then it should mock the check to see if there is one and return false. Does that sound like the right fix for that one to you two? If we like that solution I can make the fix, but I think it belongs in a different PR.
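
One possible shape for that isolation, sketched with only the standard library: rather than mocking botocore internals, point the documented `AWS_SHARED_CREDENTIALS_FILE` and `AWS_CONFIG_FILE` environment variables at files that do not exist and drop the credential variables for the duration of the test. The helper name `isolated_aws_environment` and its argument are hypothetical, and this does not cover other credential sources such as instance metadata.

```python
import os
from contextlib import contextmanager
from unittest import mock


@contextmanager
def isolated_aws_environment(missing_dir: str):
    """Hide the developer's local AWS credentials while the block runs."""
    overrides = {
        # Paths that are never created, so no shared credentials are found.
        "AWS_SHARED_CREDENTIALS_FILE": os.path.join(missing_dir, "no-credentials"),
        "AWS_CONFIG_FILE": os.path.join(missing_dir, "no-config"),
    }
    removed = (
        "AWS_PROFILE",
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_SESSION_TOKEN",
    )
    # patch.dict restores os.environ to its original content on exit,
    # including the variables popped below.
    with mock.patch.dict(os.environ, overrides):
        for name in removed:
            os.environ.pop(name, None)
        yield


with isolated_aws_environment("/nonexistent-dir-for-tests"):
    print("AWS_PROFILE" in os.environ)
```

A test wrapped this way should see the same empty credential chain on a laptop with ~/.aws/credentials as it does in CI, and the real file is never touched.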

FAILED tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3Hook::test_create_bucket_no_region_regional_endpoint

When run directly (not in breeze) locally, it raises an exception that the bucket already exists; when run in breeze, it fails with the message botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the CreateBucket operation: The AWS Access Key Id you provided does not exist in our records. I want to say that sounds like it is not mocking the S3 connection and is actually hitting the live S3 API.

potiuk · 2022-12-05T22:55:13Z

Hey @ferruzzi - please rebase now. #28129 that @Taragolis implemented added extra protection for your local env files and moved the test elsewhere.

The problem was real, but you likely have not seen it locally because you have AWS environment variables in your breeze configuration.

- Ensure all helper methods have a full `except Exception` block with reasonable default values
- Moved possible circular import into a helper within a try/except block

ferruzzi · 2022-12-05T23:20:18Z

> Hey @ferruzzi - please rebase now

Pushing now.

ferruzzi · 2022-12-06T00:21:45Z

Looks like that commit from @Taragolis fixed one of them. Looking at the other one, it's throwing botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the CreateBucket operation: The AWS Access Key Id you provided does not exist in our records. That sounds like it's not mocking the connection to S3. I'll poke at it and see what I can figure out.

When the test calls hook.create_bucket("unable-to-create") on L173, hook.conn.aws_access_key_id is None. @Taragolis, you've been playing in the dark inner workings of the aws connection a lot lately; perhaps I misunderstood how Config.merge() works and I'm breaking it at L560 here?

State of the config object at the API call:

With all those "None" values, I rather think it looks like a mocking issue to me, and we need to set some return value that's been missed, but maybe I'm wrong.

[EDIT] I dropped some debugger traces in there and the config merge looks like it is working as expected; pretty sure it isn't that. I'll have to look more tomorrow.

[EDIT 2] Figured out the difference, hope to have a fix up shortly. When run in main, the test calls the API using conn_region_name="aws-global", but when run in my branch it's calling it with region "us-east-1".

ferruzzi · 2022-12-08T08:20:43Z

Sorry for the holdup, this should pass now. All tests are passing on my side now in Breeze.

Here's what I came up with: on L414 here I was defaulting to an empty Config() thinking that would be 'falsy' here in the Connection wrapper, but it isn't 'falsy' so it overrides the user-provided values if the user creates a Connection. So I moved the default to the BaseAws config property and removed the option for that to be None.
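
The pitfall can be reproduced without botocore. In the sketch below, `StandInConfig` is a hypothetical stand-in that, like any plain Python object without `__bool__` or `__len__`, is always truthy; `resolve_config` only mirrors the shape of the `if not self.botocore_config and config_kwargs` check quoted earlier in this thread and is not the provider's code.

```python
class StandInConfig:
    """Hypothetical config object; defines neither __bool__ nor __len__."""

    def __init__(self, **kwargs):
        self.options = dict(kwargs)


def resolve_config(hook_config, connection_kwargs):
    """Use the connection's kwargs only when the hook passed no config."""
    if not hook_config and connection_kwargs:
        return StandInConfig(**connection_kwargs)
    return hook_config


user_kwargs = {"s3": {"us_east_1_regional_endpoint": "regional"}}

# An "empty" default object is still truthy, so the user's kwargs are skipped.
ignored = resolve_config(StandInConfig(), user_kwargs)
print(ignored.options)  # {}

# With None as the default, the connection's kwargs are applied.
applied = resolve_config(None, user_kwargs)
print(applied.options)  # {'s3': {'us_east_1_regional_endpoint': 'regional'}}
```

This matches the symptom described above: with the empty default in place, the regional endpoint setting from the Connection never reached the client.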

Taragolis · 2022-12-08T09:42:08Z

airflow/providers/amazon/aws/hooks/base_aws.py

+        except Exception:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "00000000-0000-5000-0000-000000000000"


Small nitpick: it seems that only a KeyError can happen here.

Also a fix for the nil UUID.

Suggested change

-        except Exception:
-            # Under no condition should an error here ever cause an issue for the user.
-            return "00000000-0000-5000-0000-000000000000"
+        except KeyError:
+            # Under no condition should an error here ever cause an issue for the user.
+            return "00000000-0000-0000-0000-000000000000"

The 5 is a fixed bit in UUID to show the version and the unit tests check for it to confirm the format so that either has to stay or the unit tests need to be changed. I'd like to keep it, but if you have a strong opinion here or a reason it should be changed I'll update the unit tests too.

For the exception, yeah you are right but I did it for consistency since the others all go by the policy that "nothing should possibly bubble up". I can change it to IndexError if you still want, I just figured I would explain my reason.

The nil UUID is a special form of UUID that does not include any specific data, see: https://datatracker.ietf.org/doc/html/rfc4122.html#section-4-1-7

So it is a good way to signal that an exception happened: all bits are 0.

    from uuid import UUID

    nil = "00000000-0000-0000-0000-000000000000"
    assert UUID(int=0) == UUID(nil)

And actually I think we do not need any regex. We could just pass the value to a UUID object and check the version:

    from uuid import UUID

    random = "80eb85a0-5aef-45b6-abbb-f16d62d3db42"
    uuid_v5 = "bf428e1d-f221-55de-a77f-a61755a4d727"
    nil = "00000000-0000-0000-0000-000000000000"
    assert UUID(random).version == 4
    assert UUID(uuid_v5).version == 5
    assert UUID(nil).version is None

And the nil UUID cannot be produced by uuid5; however, in theory (and given infinite time) it is possible to get 00000000-0000-5000-0000-000000000000

I'm just running the static checks locally and I'll push the no-regex version. I actually didn't know about the UUID().version check, that's very handy. 👍

I didn't know that one of the bits is actually a version until you told me that ¯\_(ツ)_/¯ 👍

Alright, I made a small tweak and am rerunning the tests locally, but what I ended up with is assert UUID(dag_run_key).version in {5, None}. UUID().version also verifies that it is a valid format, so that will catch a poorly formed UUID or anything that is a valid UUID but not v5 or nil

and then realized that without mocking the environment variable that generated the UUID, it's always returning the exception case so I've parameterized the test so it's testing both cases. Tests should be done in a sec.
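
A sketch of what testing both cases might look like, assuming the key is a UUIDv5 derived from an environment variable. The variable name `EXAMPLE_DAG_RUN_KEY`, the `NAMESPACE_OID` namespace, and the helper name are assumptions for illustration and are not taken from the PR.

```python
import os
from uuid import NAMESPACE_OID, UUID, uuid5

NIL_UUID = "00000000-0000-0000-0000-000000000000"


def hashed_dag_run_key(env_var: str = "EXAMPLE_DAG_RUN_KEY") -> str:
    """Return a UUIDv5 of the variable's value, or the nil UUID if it is unset."""
    try:
        return str(uuid5(NAMESPACE_OID, os.environ[env_var]))
    except KeyError:
        # Under no condition should an error here cause an issue for the user.
        return NIL_UUID


# Case 1: variable set, so a version 5 UUID is expected.
os.environ["EXAMPLE_DAG_RUN_KEY"] = "manual__2022-12-08T00:00:00"
assert UUID(hashed_dag_run_key()).version == 5

# Case 2: variable missing, so the nil UUID (version None) is expected.
del os.environ["EXAMPLE_DAG_RUN_KEY"]
assert UUID(hashed_dag_run_key()).version is None
```

Constructing `UUID(...)` raises `ValueError` on a malformed string, so the single `version in {5, None}` assertion covers both the format check and the version check.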

ferruzzi · 2022-12-08T18:54:46Z

@Taragolis and @potiuk Thank you both for your help on this one. 👍

ferruzzi requested a review from eladkal as a code owner November 21, 2022 19:47

boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Nov 21, 2022

uranusjr reviewed Nov 21, 2022
