
@ferruzzi
Contributor

Adding a couple of fields to the boto3 user agent will allow the AWS team to better understand which services and operators to focus improvements on in the future. This is similar to the user agent fields added by Databricks, Google, Yandex, and others.
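For context, here is a minimal sketch of what adding fields to the user agent usually means. It assumes (this is not taken from the PR) that each field is appended as a space-separated `name/value` token. botocore accepts such a string through the `user_agent_extra` option of `botocore.config.Config`. The field names and version numbers below are hypothetical.

```python
from typing import Dict


def build_user_agent_extra(fields: Dict[str, str]) -> str:
    """Format extra user-agent fields as space-separated ``name/value`` tokens.

    The field names are illustrative placeholders, not necessarily the ones
    the Amazon provider sends.
    """
    tokens = []
    for name, value in fields.items():
        # A space would split one token into two, so remove spaces from both parts.
        tokens.append(f"{name.replace(' ', '')}/{str(value).replace(' ', '')}")
    return " ".join(tokens)


# Hypothetical values, for illustration only.
extra = build_user_agent_extra(
    {"Airflow": "2.5.0", "AmazonProvider": "7.0.0", "Caller": "S3CreateBucketOperator"}
)
```

The resulting string could then be passed as `botocore.config.Config(user_agent_extra=extra)`, and botocore appends it to its default User-Agent header.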

@ferruzzi ferruzzi requested a review from eladkal as a code owner November 21, 2022 19:47
@boring-cyborg boring-cyborg bot added area:providers provider:amazon AWS/Amazon - related issues labels Nov 21, 2022
@ferruzzi
Contributor Author

ferruzzi commented Nov 21, 2022

Two failing tests are also failing in main.

@Taragolis
Contributor

These are my personal thoughts:

  1. Some end users might be unhappy that we have started collecting additional telemetry, like dag_id and ClassName, even if only as hashes. So it might be better to collect only the Airflow and Amazon provider versions by default and, if additional metadata is required, call an optional callable controlled by config, e.g. [aws] extra_botocore_user_agent_callable, with the fully qualified path to the callable, e.g. some_vendor.safe.get_user_agent.

  2. I thought we might also add/merge the config in AwsCredentialHelper rather than in the hook:

    from botocore.config import Config

    config_kwargs = extra.get("config_kwargs")
    if not self.botocore_config and config_kwargs:
        # https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
        self.log.debug("Retrieving botocore config=%s from %s extra.", config_kwargs, self.conn_repr)
        self.botocore_config = Config(**config_kwargs)
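The first suggestion is an opt-in callable referenced by a fully qualified path in a config option such as the proposed `[aws] extra_botocore_user_agent_callable`. That path could be resolved with `importlib`. This is a minimal sketch under that assumption. The option name and `some_vendor.safe.get_user_agent` are only proposals from this comment, and `load_callable` and `user_agent_extra` are hypothetical helpers. Errors are swallowed on purpose, so a bad path never breaks the hook.

```python
import importlib
from typing import Callable, Optional


def load_callable(dotted_path: str) -> Callable[[], str]:
    """Resolve a fully qualified path like 'package.module.func' to a callable."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Not a fully qualified path: {dotted_path!r}")
    module = importlib.import_module(module_name)
    func = getattr(module, attr_name)
    if not callable(func):
        raise TypeError(f"{dotted_path!r} is not callable")
    return func


def user_agent_extra(base: str, callable_path: Optional[str]) -> str:
    """Always send `base`; append extra metadata only if a callable is configured."""
    if not callable_path:
        return base
    try:
        extra = load_callable(callable_path)()
    except Exception:
        # Telemetry must never break the user's task, so fall back to the base value.
        return base
    return f"{base} {extra}" if extra else base
```

With this design, the default path sends only the version information, and anything richer is strictly opt-in.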

@ferruzzi
Contributor Author

ferruzzi commented Nov 22, 2022

Some end users might be unhappy that we have started collecting additional telemetry, like dag_id and ClassName, even if only as hashes. So it might be better to collect only the Airflow and Amazon provider versions by default and, if additional metadata is required, call an optional callable controlled by config, e.g. [aws] extra_botocore_user_agent_callable, with the fully qualified path to the callable, e.g. some_vendor.safe.get_user_agent.

Moved to a code comment so it can be a threaded conversation; see below

@ferruzzi ferruzzi force-pushed the ferruzzi/boto-user-agent branch from 9ff7e6b to a5fc3bc Compare November 24, 2022 21:39
@ferruzzi
Contributor Author

Any other thoughts on this?

@ferruzzi
Contributor Author

@uranusjr @eladkal Any other thoughts or suggestions on this one? I'm going to be away on vacation for the holidays in a couple of weeks and would love to get this sorted out before I go.

Member

@uranusjr uranusjr left a comment

Pending the discussion on telemetry

Member

@potiuk potiuk left a comment

Some static checks need fixing, but otherwise it looks good.

@ferruzzi ferruzzi force-pushed the ferruzzi/boto-user-agent branch from a5fc3bc to a5f4de0 Compare December 3, 2022 03:56
@ferruzzi
Contributor Author

ferruzzi commented Dec 3, 2022

CI build timed out. Going to bump it.

@ferruzzi ferruzzi closed this Dec 3, 2022
@ferruzzi ferruzzi reopened this Dec 3, 2022
@ferruzzi
Contributor Author

ferruzzi commented Dec 3, 2022

Two tests are failing in providers/amazon/aws/hooks/test_s3.py but they are also failing in main on my side and shouldn't be related to any changes I've made.

@potiuk
Member

potiuk commented Dec 3, 2022

Two tests are failing in providers/amazon/aws/hooks/test_s3.py but they are also failing in main on my side and shouldn't be related to any changes I've made.

They do not seem to fail in the "canary builds" in CI for main, so there must be something in your changes that causes it: https://github.com/apache/airflow/actions/runs/3607606417

@ferruzzi
Contributor Author

ferruzzi commented Dec 3, 2022

so there must be something in your changes that causes it

I don't see how that's possible. If I check out main, git pull --rebase, and run the tests, they fail. My code is all in a different branch.

(airflow-env) ferruzzi:~/workplace/airflow (ferruzzi/boto-user-agent)
$ git checkout main
Switched to branch 'main'
Your branch is up to date with 'apache/main'.
(airflow-env) ferruzzi:~/workplace/airflow (main)
$ git pull --rebase
Already up to date.
Current branch main is up to date.
(airflow-env) ferruzzi:~/workplace/airflow (main)
$ pytest tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id -W ignore::DeprecationWarning -W  ignore::FutureWarning
=========================================================================== test session starts ===========================================================================
platform linux -- Python 3.8.10, pytest-6.2.5, py-1.11.0, pluggy-1.0.0 -- /home/ANT.AMAZON.COM/ferruzzi/.pyenv/versions/airflow-env/bin/python
cachedir: .pytest_cache
rootdir: /home/ANT.AMAZON.COM/ferruzzi/workplace/airflow, configfile: pytest.ini
plugins: anyio-3.6.2, xdist-2.5.0, forked-1.4.0, requests-mock-1.9.3, flaky-3.7.0, instafail-0.4.2, rerunfailures-9.1.1, cov-3.0.0, timeouts-1.2.1, httpx-0.15.0, asyncio-0.16.0
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 1 item                                                                                                                                                          

tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id FAILED                                    [100%]

================================================================================ FAILURES =================================================================================
_______________________________________________ TestAwsS3HookNoMock.test_check_for_bucket_raises_error_with_invalid_conn_id _______________________________________________

self = <tests.providers.amazon.aws.hooks.test_s3.TestAwsS3HookNoMock object at 0x7f7d4a39c610>, monkeypatch = <_pytest.monkeypatch.MonkeyPatch object at 0x7f7d4a39ca30>

    def test_check_for_bucket_raises_error_with_invalid_conn_id(self, monkeypatch):
        monkeypatch.delenv("AWS_PROFILE", raising=False)
        monkeypatch.delenv("AWS_ACCESS_KEY_ID", raising=False)
        monkeypatch.delenv("AWS_SECRET_ACCESS_KEY", raising=False)
        hook = S3Hook(aws_conn_id="does_not_exist")
        # We're mocking all actual AWS calls and don't need a connection. This
        # avoids an Airflow warning about connection cannot be found.
        hook.get_connection = lambda _: None
        with pytest.raises(NoCredentialsError):
>           hook.check_for_bucket("test-non-existing-bucket")
E           Failed: DID NOT RAISE <class 'botocore.exceptions.NoCredentialsError'>

tests/providers/amazon/aws/hooks/test_s3.py:63: Failed
-------------------------------------------------------------------------- Captured stdout setup --------------------------------------------------------------------------
========================= AIRFLOW ==========================
Home of the user: /home/ANT.AMAZON.COM/ferruzzi
Airflow home /home/ANT.AMAZON.COM/ferruzzi/airflow
Skipping initializing of the DB as it was initialized already.
You can re-initialize the database by adding --with-db-init flag when running tests.
-------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
[2022-12-03 10:47:34,076] {base_aws.py:121} INFO - No connection ID provided. Fallback on boto3 credential strategy (region_name=None). See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
[2022-12-03 10:47:34,087] {credentials.py:1251} INFO - Found credentials in shared credentials file: ~/.aws/credentials
[2022-12-03 10:47:34,593] {s3.py:221} INFO - Bucket "test-non-existing-bucket" does not exist
---------------------------------------------------------------------------- Captured log call ----------------------------------------------------------------------------
INFO     airflow.providers.amazon.aws.hooks.base_aws.BaseSessionFactory:base_aws.py:121 No connection ID provided. Fallback on boto3 credential strategy (region_name=None). See: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
INFO     botocore.credentials:credentials.py:1251 Found credentials in shared credentials file: ~/.aws/credentials
INFO     airflow.providers.amazon.aws.hooks.s3.S3Hook:s3.py:221 Bucket "test-non-existing-bucket" does not exist
============================================================================ warnings summary =============================================================================
../../.pyenv/versions/airflow-env/lib/python3.8/site-packages/_pytest/config/__init__.py:1233
  /home/ANT.AMAZON.COM/ferruzzi/.pyenv/versions/airflow-env/lib/python3.8/site-packages/_pytest/config/__init__.py:1233: PytestConfigWarning: Unknown config option: asyncio_mode
  
    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/warnings.html
========================================================================= short test summary info =========================================================================
FAILED tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id - Failed: DID NOT RAISE <class 'botocor...
====================================================================== 1 failed, 1 warning in 1.73s =======================================================================
(airflow-env) ferruzzi:~/workplace/airflow (main)
$ 

@potiuk
Member

potiuk commented Dec 3, 2022

I am afraid you are one of the few unlucky ones who will have to deal with side-effects.

It is very likely that the failing tests rely on side effects from other tests. It's also possible that your tests are clearing/removing those side effects; that's why those tests start to fail in your PR.

Did you run all tests from providers?

breeze testing tests --test-type providers[amazon]

That should run the full "amazon providers" suite in the same sequence as in CI. Run it on main and check that those tests don't fail.

If you run those tests individually and they fail, it means they are relying on side effects from other tests. This is because most of our unit tests rely on dags/dagruns/connections etc. created in the unit test DB, and the DB is reused across all the tests.

The only way to solve it for now is to investigate and fix those tests that indeed rely on side effects.

That might involve writing a fixture or setUp method that restores the DB records to the state the test expects. Many tests have those, but some do not and instead rely on the DB being populated by previous tests, and this is the problem.

Yes, it's not your fault. But, unfortunately, fixing it essentially falls on your shoulders if your new or modified tests interfere with it.

Ideally, fix it by finding and resolving the side effects in the other tests, so that the situation improves for the future.

It happens rarely enough that it is not a "common" problem, but unfortunately you have likely fallen victim to it.

And yes, this is NOT how it should be. But, unfortunately, we do not live in a perfect world, and it is what it is. Hopefully one day we will be able to solve it better and eliminate even the possibility of side effects, but that would likely require a big investment in completely refactoring how hundreds if not thousands of tests are implemented, so I am afraid it's not feasible to solve it once and for all right now.

We can complain about it (I do), but this is, unfortunately, quite an effort to fix. If anyone has an idea how to solve it quickly, that would be fantastic. We have tried in the past to just clean the DB for every unit test, but with many thousands of them this slows the whole suite down to a crawl.

Hopefully some day we will figure out how to fix it better. But I am afraid, to quote Aragorn, "this is not that day".

BTW, this is actually one of the good ideas for making an overall improvement to our test suite that would fix this problem permanently.

@Taragolis
Contributor

@ferruzzi The test tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id fails locally because you have credentials in the default profile of your shared credentials file.
Actually, we need to move this test into the AwsBaseHook tests, if we need this test at all. If I understand correctly, we are testing that boto3/botocore raises an error if no credentials are provided.

TestAwsS3Hook.test_create_bucket_no_region_regional_endpoint: this test checks that if the user wants to use the regional S3 endpoint for us-east-1 but does not specify any region, bucket creation fails (Airflow prevents it), because in this case the region would be set to aws-global.

This might be caused by one of two things:

  1. A change in a recent version of botocore means this setting is ignored, or no longer results in aws-global as the region.
  2. The settings {"config_kwargs": {"s3": {"us_east_1_regional_endpoint": "regional"}}} are somehow overwritten after the changes in this PR. If so, other config_kwargs, e.g. the retry strategy, might be overwritten too.
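The guard described above can be sketched roughly as follows. This is a hypothetical simplification, not the S3Hook source, and `resolve_bucket_region` and `BucketRegionError` are illustrative names. The idea is that when no region is passed and the connection region resolves to `aws-global`, bucket creation is refused rather than sent to AWS.

```python
from typing import Optional


class BucketRegionError(ValueError):
    """Hypothetical error type used only in this sketch."""


def resolve_bucket_region(region_name: Optional[str], conn_region_name: Optional[str]) -> str:
    """Pick the region to create a bucket in, refusing the ambiguous 'aws-global' case."""
    if region_name:
        # An explicitly requested region always wins.
        return region_name
    if conn_region_name == "aws-global":
        # 'aws-global' is not a real AWS region, so it cannot be used as the
        # bucket's location; fail loudly instead of guessing.
        raise BucketRegionError(
            "Unable to create a bucket when region_name is not set and boto3 "
            "is configured to use S3 regional endpoints."
        )
    if not conn_region_name:
        raise BucketRegionError("No region available for bucket creation.")
    return conn_region_name
```

Suppose the `us_east_1_regional_endpoint` setting were lost (the second possibility above). The connection region would then no longer be `aws-global`, the guard would never fire, and the test would go on to call CreateBucket for real.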

@potiuk
Member

potiuk commented Dec 3, 2022

I think in this case the side effect is not related to the DB:

INFO     botocore.credentials:credentials.py:1251 Found credentials in shared credentials file: ~/.aws/credentials

It looks like the tests you added are creating (and not deleting) ~/.aws/credentials, while the failing test expects a "No Credentials" error to be raised.

If my hypothesis is right, the fix is twofold:

  • make sure your new tests clean up ~/.aws/credentials in tearDown or a fixture
  • make sure ~/.aws/credentials is cleaned up in the setup of the test_check_for_bucket_raises_error_with_invalid_conn_id test.
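Deleting a real ~/.aws/credentials is risky because the file may belong to the developer. A safer alternative is for a test to redirect botocore to files that do not exist. This is a minimal stdlib sketch, and `isolated_aws_credentials` is a hypothetical helper. It relies on botocore honoring the `AWS_SHARED_CREDENTIALS_FILE` and `AWS_CONFIG_FILE` environment variables. It does not block other credential sources in botocore's chain, such as container or EC2 instance metadata.

```python
import contextlib
import os
import tempfile
from typing import Iterator

# Environment variables that can supply credentials or select a profile.
AWS_CREDENTIAL_ENV_VARS = (
    "AWS_PROFILE",
    "AWS_DEFAULT_PROFILE",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


@contextlib.contextmanager
def isolated_aws_credentials() -> Iterator[str]:
    """Hide local AWS credentials from code run inside the block.

    Points botocore at credentials/config files that do not exist (inside a
    temporary directory) and unsets credential environment variables. The
    original environment is restored on exit, even if the block raises.
    """
    saved_env = dict(os.environ)
    with tempfile.TemporaryDirectory() as tmp_dir:
        try:
            for name in AWS_CREDENTIAL_ENV_VARS:
                os.environ.pop(name, None)
            # botocore reads these to locate the shared credentials/config files.
            os.environ["AWS_SHARED_CREDENTIALS_FILE"] = os.path.join(tmp_dir, "credentials")
            os.environ["AWS_CONFIG_FILE"] = os.path.join(tmp_dir, "config")
            yield tmp_dir
        finally:
            os.environ.clear()
            os.environ.update(saved_env)
```

In a pytest suite, the same effect is usually achieved with `monkeypatch.delenv(...)` and `monkeypatch.setenv(...)` in a fixture, as the failing test already does for three of these variables.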

On one hand, the test did not have a proper setup that resets the state to the one the test expects; on the other hand, when the test was written it would have been difficult to foresee that someone else would create and leave such a credentials file. It's hard to blame anyone in particular; it's simply not feasible to run thousands of tests, each in a completely isolated and pristine environment free of side effects like that.

This is another example of a side effect you might stumble upon, and unfortunately it will happen occasionally. I can't imagine how to prevent these kinds of problems.

@ferruzzi
Contributor Author

ferruzzi commented Dec 5, 2022

Did you run all tests from providers?
breeze testing tests --test-type providers[amazon]
That should run the full "amazon providers" suite in the same sequence as in CI. Run it on main and check that those tests don't fail.

But they do fail; that's what I was saying. There are two tests in hooks/test_s3.py that fail when I run them locally, even when I run them using Breeze from main without my code. I can work on figuring out why and fixing them, but if they fail in Breeze on main, then I don't see how the code in this PR is the issue. Fixing them should be a separate PR.

FAILED tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id

(Note: this one is no longer failing in CI. I have not caught up on my messages from yesterday or this morning, so it may already be fixed.)

/test_s3.py::TestAwsS3HookNoMock::test_check_for_bucket_raises_error_with_invalid_conn_id test fails locally because you have credentials in shared credentials file in default profile.

Yup, that looks right. I propose that if that test expects not to find credentials, it should mock the check for them and return false. Does that sound like the right fix to you two? If we like that solution I can make the fix, but I think it belongs in a different PR.

FAILED tests/providers/amazon/aws/hooks/test_s3.py::TestAwsS3Hook::test_create_bucket_no_region_regional_endpoint

When run directly (not in Breeze) locally, it raises an exception that the bucket already exists; when run in Breeze, it fails with the message botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the CreateBucket operation: The AWS Access Key Id you provided does not exist in our records. That sounds like it is not mocking the S3 connection and is actually hitting the live S3 API.

@potiuk
Member

potiuk commented Dec 5, 2022

Hey @ferruzzi - please rebase now. #28129 that @Taragolis implemented added extra protection for your local env files and moved the test elsewhere.

The problem was real, but you likely did not see it locally because you have AWS environment variables in your Breeze configuration.

- Ensure all helper methods have a full `except Exception` block with reasonable default values
- Moved possible circular import into a helper within a try/except block
@ferruzzi
Contributor Author

ferruzzi commented Dec 5, 2022

Hey @ferruzzi - please rebase now

Pushing now.

@ferruzzi ferruzzi force-pushed the ferruzzi/boto-user-agent branch from a5f4de0 to 98ceb88 Compare December 5, 2022 23:20
@ferruzzi
Contributor Author

ferruzzi commented Dec 6, 2022

Looks like that commit from @Taragolis fixed one of them. Looking at the other one, it's throwing botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the CreateBucket operation: The AWS Access Key Id you provided does not exist in our records. That sounds like it's not mocking the connection to S3. I'll poke at it and see what I can figure out.

When the test calls hook.create_bucket("unable-to-create") on L173, hook.conn.aws_access_key_id is None. @Taragolis, you've been playing in the dark inner workings of the AWS connection a lot lately; perhaps I misunderstood how Config.merge() works and I'm breaking it at L560 here?

State of the config object at the API call: [screenshot not included]

With all those None values, this looks more like a mocking issue to me, where we need to set some return value that has been missed, but maybe I'm wrong.

[EDIT] I dropped some debugger traces in there and the config merge looks like it is working as expected; pretty sure it isn't that. I'll have to look more tomorrow.

[EDIT 2] Figured out the difference, hope to have a fix up shortly. When run in main, the test calls the API using conn_region_name="aws-global", but when run in my branch it's calling it with region "us-east-1".

@ferruzzi
Contributor Author

ferruzzi commented Dec 8, 2022

Sorry for the holdup; this should pass now. All tests are passing on my side in Breeze.

Here's what I came up with: on L414 here, I was defaulting to an empty Config(), thinking that it would be falsy here in the connection wrapper. It isn't falsy, so it overrides the user-provided values if the user creates a Connection. So I moved the default into the BaseAws config property and removed the option for it to be None.
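Here is a minimal sketch of the pitfall. To keep it self-contained, it uses a stand-in class rather than botocore. This assumes, consistent with what was observed here, that botocore's `Config` defines no custom truthiness. Any such object is truthy, so the `not self.botocore_config` check quoted earlier in the thread never falls through to the connection's `config_kwargs`. `StandInConfig` and `resolve_config` are hypothetical names.

```python
from typing import Any, Dict, Optional


class StandInConfig:
    """Stand-in for botocore.config.Config: an ordinary object that defines
    neither __bool__ nor __len__, so every instance is truthy."""

    def __init__(self, **options: Any) -> None:
        self.options = options


def resolve_config(
    botocore_config: Optional[StandInConfig], config_kwargs: Optional[Dict[str, Any]]
) -> Optional[StandInConfig]:
    # Same shape as the check quoted earlier in this thread: only build a config
    # from the connection's config_kwargs when no config was passed in.
    if not botocore_config and config_kwargs:
        botocore_config = StandInConfig(**config_kwargs)
    return botocore_config


user_kwargs = {"s3": {"us_east_1_regional_endpoint": "regional"}}

# Bug: an "empty" default object is still truthy, so the user's settings are skipped.
buggy = resolve_config(StandInConfig(), user_kwargs)
# Leaving the default as None until the connection extra is applied avoids it.
fixed = resolve_config(None, user_kwargs)
```

Keeping the default out of the wrapper and applying it later, as described above, avoids the problem. An explicit `is None` check would be another way to express the same intent.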

Comment on lines 476 to 478
except Exception:
    # Under no condition should an error here ever cause an issue for the user.
    return "00000000-0000-5000-0000-000000000000"
Contributor

Small nitpick

  1. It seems only a KeyError can happen here
  2. Some fixes to the nil UUID
Suggested change
- except Exception:
-     # Under no condition should an error here ever cause an issue for the user.
-     return "00000000-0000-5000-0000-000000000000"
+ except KeyError:
+     # Under no condition should an error here ever cause an issue for the user.
+     return "00000000-0000-0000-0000-000000000000"

Contributor Author

The 5 is a fixed digit in the UUID that indicates the version, and the unit tests check for it to confirm the format, so either it has to stay or the unit tests need to be changed. I'd like to keep it, but if you have a strong opinion here or a reason it should be changed, I'll update the unit tests too.

For the exception, yeah, you are right, but I did it for consistency, since the others all follow the policy that "nothing should possibly bubble up". I can change it to IndexError if you still want; I just figured I would explain my reasoning.

Contributor

The nil UUID is a special form of UUID that does not contain any specific data; see: https://datatracker.ietf.org/doc/html/rfc4122.html#section-4-1-7

So it makes sense that if any exception happens, all bits are 0.

from uuid import UUID

nil = "00000000-0000-0000-0000-000000000000"
assert UUID(int=0) == UUID(nil)

Contributor

And actually, I don't think we need any regex. We could just pass the value to a UUID object and compare its version.

from uuid import UUID

random = "80eb85a0-5aef-45b6-abbb-f16d62d3db42"
uuid_v5 = "bf428e1d-f221-55de-a77f-a61755a4d727"
nil = "00000000-0000-0000-0000-000000000000"

assert UUID(random).version == 4
assert UUID(uuid_v5).version == 5
assert UUID(nil).version is None

Also, the nil UUID cannot be produced by uuid5, whereas in theory (given infinite time) it is possible to get 00000000-0000-5000-0000-000000000000.

Contributor Author

I'm just running the static checks locally and I'll push the no-regex version. I actually didn't know about the UUID().version check, that's very handy. 👍

Contributor

I didn't know that one of the digits is actually the version until you told me ¯\_(ツ)_/¯ 👍

Contributor Author

Alright, I made a small tweak and am rerunning the tests locally, but what I ended up with is assert UUID(dag_run_key).version in {5, None}. UUID().version also verifies that it is a valid format, so it will catch a poorly formed UUID or anything that is a valid UUID but not v5 or nil.

Contributor Author

And then I realized that, without mocking the environment variable used to generate the UUID, it always returns the exception case, so I've parameterized the test to cover both cases. Tests should be done in a sec.
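The key generation and the check settled on in this thread can be sketched as follows. This is not the merged code. Two things are assumptions: the environment variable name `AIRFLOW_CTX_DAG_ID` (one of the context variables Airflow exports to running tasks) and the use of `uuid.NAMESPACE_OID` as the namespace. `generate_dag_key` is a hypothetical name. The loop exercises both the v5 case and the nil fallback, like the parameterized test mentioned here.

```python
import uuid
from typing import Mapping

NIL_UUID = "00000000-0000-0000-0000-000000000000"


def generate_dag_key(environ: Mapping[str, str]) -> str:
    """Return a uuid5 of the DAG id, or the nil UUID if it is unavailable.

    The environment variable name is an assumption for this sketch.
    """
    try:
        dag_id = environ["AIRFLOW_CTX_DAG_ID"]
    except KeyError:
        # Under no condition should telemetry raise into the user's task.
        return NIL_UUID
    # uuid5 is derived from a SHA-1 hash of namespace + name, so the DAG id
    # itself is never sent, only a value derived from it.
    return str(uuid.uuid5(uuid.NAMESPACE_OID, dag_id))


# Cover both branches, mirroring the parameterized test.
for env in ({"AIRFLOW_CTX_DAG_ID": "example_dag"}, {}):
    key = generate_dag_key(env)
    # The check settled on in the review: a valid UUID that is either v5 or nil.
    assert uuid.UUID(key).version in {5, None}
```

One detail worth knowing: `uuid.uuid5` always sets the RFC 4122 variant bits as well as the version nibble. As a result, `UUID("00000000-0000-5000-0000-000000000000").version` is also `None`, so the version check treats the old placeholder the same way as the nil UUID.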

@potiuk potiuk merged commit a6315c2 into apache:main Dec 8, 2022
@ferruzzi
Contributor Author

ferruzzi commented Dec 8, 2022

@Taragolis and @potiuk Thank you both for your help on this one. 👍

Labels

area:providers provider:amazon AWS/Amazon - related issues

5 participants