
Conversation

@Taragolis
Contributor

This is a proof of concept for adding the ability to provide an additional credentials method to DockerHook and related operators/decorators. It is, in a way, a continuation of #26162

ToDo:

  • Update documentation

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.
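
For illustration, here is a hedged sketch of the kind of "additional credentials method" this PR is about: a small object that can produce registry credentials on demand (in this sketch, temporary ECR credentials via boto3). The class name EcrCredentialHelper and the idea of passing it to DockerOperator are assumptions made for this sketch, not necessarily the API introduced by this PR.

from __future__ import annotations

import base64

import boto3


class EcrCredentialHelper:
    """Illustrative only: return (username, password, registry) for a private ECR registry."""

    def __init__(self, region_name: str):
        self.region_name = region_name

    def get_credentials(self) -> tuple[str, str, str]:
        # ECR authorization tokens are short-lived, which is exactly why a static
        # Docker connection is not enough and a refreshable "credentials method" helps.
        client = boto3.client("ecr", region_name=self.region_name)
        auth = client.get_authorization_token()["authorizationData"][0]
        username, password = base64.b64decode(auth["authorizationToken"]).decode().split(":", 1)
        return username, password, auth["proxyEndpoint"]


# Hypothetical DAG usage (the ``credentials`` parameter name is made up for this sketch):
# DockerOperator(
#     task_id="run_task",
#     image="123456789012.dkr.ecr.eu-west-1.amazonaws.com/app:latest",
#     credentials=EcrCredentialHelper("eu-west-1"),
# )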

@potiuk
Member

potiuk commented Aug 24, 2023

THAT one looks pretty complete, not a POC :D. I did not have time to look at the details, but it sounds cool, and likely @o-nikolas @vandonr-amz and team should take a look :)

@Taragolis Taragolis force-pushed the docker-registry-credentials branch from 5a225c9 to 2939b43 Compare August 28, 2023 16:36
@Taragolis
Contributor Author

THAT one looks pretty complete, not a POC :D

Well, for me this implementation still sits somewhere between a dirty hack and something that is missing from the Connection+Hook integration.
This PR sat in my local branch for a long time because I couldn't find a better way to implement such an integration; the other ideas were:

(Option 1) Just create subclasses of the existing DockerHook, DockerOperator, docker decorator, and DockerSwarmOperator

This would require supporting all of them independently, updating the decorator, and tracking whether a change in the parent classes breaks the child classes.

(Option 2) Create something like DbApiHook, but for Docker

I've also thought about this, but the problem is that DockerHook and the operators only use the connection for logging in to the Docker Registry, and they have a pretty stable specification of what is required.

(Option 3) Change Airflow Connection + Airflow Hooks altogether

Too much effort is required, and most of it would break something in Airflow 2.x, so it should go through discussion first.


So this implementation is basically in the middle between Option 2 and Option 3. I hope that at least it is better than the Airflow ECR Plugin from the Airflow Ecosystem page.
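
To make that "middle" shape a bit more concrete, here is a minimal sketch (illustrative names only, not the actual classes from this PR) of a small structural protocol that DockerHook could accept: the hook only needs something that returns registry credentials, while the service-specific part (for example exchanging an AWS connection for a temporary ECR token) lives entirely in the implementing provider.

from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class DockerRegistryCredentials:
    # Plain container for exactly what a docker login needs, nothing more.
    username: str
    password: str
    registry: str
    reauth: bool = True


@runtime_checkable
class RegistryCredentialsProtocol(Protocol):
    # Any object implementing this could be handed to the hook; the hook would
    # call it on login, and could call it again when the credentials expire.
    def get_credentials(self) -> list[DockerRegistryCredentials]:
        ...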

@potiuk
Member

potiuk commented Aug 29, 2023

I think trying to come up with anything "common" for all potential "docker-like" operators is unnecessary effort. Like all attempts to make things DRY and reusable, it also introduces coupling (and that's really what you refer to as "needs changes for Airflow 2 and back-compatibility").

This is a "middle" solution as you say - it does not integrate with Connection as deeply as it could but IMHO it's good. It's ok the implementaiton of protocol is tied to "docker-only", and it's ok that particular implementation (AWS in this case) maps - in the code - the generic "airlfow connection" credentials to those of your DockerAuth. This is a very elagant solution where a small piece of "Service-specific code" is used to join the two sides.

Airflow is all about "platform as code", and we should do more of it: take the "Airflow realm" (of hooks, connections, etc.) and write small-ish snippets of code that make it work with "other realms" (such as Docker authentication). Trying to merge them is not really efficient, because we have many such worlds to connect - so in a way, what we have in Connections/Hooks is the least common denominator. It is not perfect and not super flexible, but at the same time you can easily write a piece of code like yours that makes the information stored in Connections usable for building more complex authentication/refreshing schemes (like what you do).

@Taragolis
Contributor Author

Taragolis commented Aug 29, 2023

I'm still not sure that the hook should be the part that is responsible for the connection, but we have what we have. If we go through the community providers, we can easily find that many of them implement their own unique way of integrating with Connection and of resolving/merging parameters.

Let me show how I would see a simple integration, where the integration parameters do not get lost between operator and hook, are pretty well documented, can be validated before use (right now we validate only in the UI), and could also be used for building the UI and auto-generated documentation.

from functools import cached_property

from pydantic import BaseModel, Field

from airflow.models import BaseOperator


class AwesomeConnection(BaseModel):
    """Awesome Connection"""

    class Config:
        ...

    connection_id: str = Field(
        title="Name",
        description="The name value of insight",
    )

    login: str = Field(
        title="Awesome Login",
        description="Description to Awesome ID",
        airflow_connection_only=True,
    )

    password: str = Field(
        title="Awesome Password",
        description=(
            "Description to Awesome Password, e.g. "
            "where it could be obtained in case cloud services"
        ),
        airflow_sensitive=True,
        airflow_connection_only=True,
    )

    hostname: str = Field(
        title="Awesome Hostname",
        description="Foo bar spam egg",
    )

    code_only_parameter: str = Field(
        title="Awesome Parameter",
        description="Foo bar spam egg",
        airflow_user_code_only=True,
    )


class AwesomeHook:
    def __init__(self, *, connection: AwesomeConnection):
        self._connection = connection

    @cached_property
    def client(self):
        # Merge what was provided in user code with what is stored in the
        # Airflow Connection (two illustrative ways to express the same thing):
        resolved_connection_info = awesome_common_merger(self._connection)
        # or
        resolved_connection_info = self._connection.merge_with(conn_id=self._connection.connection_id)
        return awesome_client(
            login=resolved_connection_info.login,
            password=resolved_connection_info.password,
            hostname=resolved_connection_info.hostname,
            code_only_parameter=resolved_connection_info.code_only_parameter,
        )

    def do_something_awesome(self, param1, param2):
        return self.client.sudo_rm_rf_root_dir(param1, param2, preserve_root=False)

    def do_something(self, param1, param3):
        return self.client.sleep(param1, param3)


class AwesomeOperatorDoSomethingAwesome(BaseOperator):
    def __init__(self, *, connection: AwesomeConnection, param1, param2, **kwargs):
        super().__init__(**kwargs)
        self._connection = connection
        self.param1 = param1
        self.param2 = param2

    @cached_property
    def hook(self):
        return AwesomeHook(connection=self._connection)

    def execute(self, context):
        return self.hook.do_something_awesome(self.param1, self.param2)


class AwesomeOperatorDoSomething(BaseOperator):
    def __init__(self, *, connection: AwesomeConnection, param1, param3, **kwargs):
        super().__init__(**kwargs)
        self._connection = connection
        self.param1 = param1
        self.param3 = param3

    @cached_property
    def hook(self):
        return AwesomeHook(connection=self._connection)

    def execute(self, context):
        return self.hook.do_something(self.param1, self.param3)
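
For completeness, a hypothetical DAG-level usage of the sketch above (it only reuses the made-up AwesomeConnection / AwesomeOperatorDoSomething names; nothing here is an existing Airflow API):

from datetime import datetime

from airflow import DAG

with DAG(dag_id="awesome_example", schedule=None, start_date=datetime(2023, 1, 1)):
    connection = AwesomeConnection(
        connection_id="awesome_default",
        # In the intended design, login/password would be resolved from the stored
        # Airflow Connection (airflow_connection_only=True) rather than typed in code;
        # placeholders are used here only to keep the sketch self-contained.
        login="resolved-at-runtime",
        password="resolved-at-runtime",
        hostname="awesome.example.com",
        code_only_parameter="only-allowed-in-code",
    )
    AwesomeOperatorDoSomething(
        task_id="do_something",
        connection=connection,
        param1="foo",
        param3="bar",
    )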

This primitive solution wouldn't work well with something as abstract as HttpHook.

And I can't find a way to start moving from "put whatever you want into the connection and the hook, and you don't even have to extend the documentation" towards something partially strict, which doesn't give a chance to use something unsafe from the connection and restricts providing sensitive information (tokens/passwords, etc.) directly through user code.

Maybe some intermediate solution exists that doesn't break everything for everyone.

[image]

@Taragolis
Contributor Author

I still don't get why the Breeze unit tests failed; I've checked it locally on that branch and it also complains.
Maybe the test doesn't like that changes happen in two providers in one go, or it complains about the new protocols submodule in the amazon provider.

@Taragolis
Contributor Author

And just in addition to the implementation in this PR, and to my earlier point:

If we go through the community providers, we can easily find that many of them implement their own unique way of integrating with Connection and of resolving/merging parameters.
Let me show how I would see a simple integration

[image]

@potiuk
Member

potiuk commented Aug 29, 2023

I still don't get why the Breeze unit tests failed; I've checked it locally on that branch and it also complains. Maybe the test doesn't like that changes happen in two providers in one go, or it complains about the new protocols submodule in the amazon provider

docker is missing in the list of providers that should be affected when "amazon" changes:

It does not expect docker because previously the docker package did not depend on amazon. It should also fail locally for you, as long as you run pre-commit and regenerate "generated/provider_dependencies.json" - this is where breeze selective checks find out which provider tests should be run when another provider (one it depends on, or where the dependency goes the other way) changes. This is the way to determine the minimum set of provider tests that should be run for the PR (and get the maximum chance of not breaking main).
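
As an illustration, the cross-provider relationship can be checked directly against that generated file (assuming, as at the time of writing, that each provider entry carries a "cross-providers-deps" list - if the layout differs, adjust accordingly):

import json
from pathlib import Path

deps = json.loads(Path("generated/provider_dependencies.json").read_text())

# Providers whose tests should also run when the "amazon" provider changes.
affected_by_amazon = [
    name for name, info in deps.items() if "amazon" in info.get("cross-providers-deps", [])
]
print(affected_by_amazon)  # after regeneration, "docker" should show up here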

@potiuk
Member

potiuk commented Aug 29, 2023

(And yes, probably this test should use an artificial rather than the real "generated/provider_dependencies.json". I will fix it shortly; it causes some unnecessary friction when you add implicit dependencies between providers that weren't there before.)

@potiuk
Member

potiuk commented Aug 29, 2023

I will do it after we merge this PR.

@Taragolis
Contributor Author

I will do it after we merge this PR

I would rather say "if we merge this PR". I have personal concerns about:

  1. How it works exclusively for the integration with Docker. The implementation for ECR is not exclusive; it could be implemented for other container/Docker registry providers that can provide temporary access to a registry.
  2. Naming: I'm still not sure this part should be named "protocols"; to me it is too broad.

If this solution were accepted, we might finally be able to remove the exclusive implementation of AWS IAM authentication in PostgresHook and MySQLHook using the same approach.
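
As a hedged sketch of what that could look like for Postgres (illustrative names; not an existing PostgresHook API): the same kind of small credentials provider, generating a short-lived RDS IAM token to be used in place of a static password.

from __future__ import annotations

import boto3


class RdsIamCredentials:
    """Illustrative only: produce a temporary IAM auth token as the DB password."""

    def __init__(self, host: str, port: int, user: str, region_name: str):
        self.host = host
        self.port = port
        self.user = user
        self.region_name = region_name

    def get_credentials(self) -> tuple[str, str]:
        # generate_db_auth_token returns a signed token valid for a short time,
        # so a hook consuming this provider could refresh it before connecting.
        token = boto3.client("rds", region_name=self.region_name).generate_db_auth_token(
            DBHostname=self.host, Port=self.port, DBUsername=self.user
        )
        return self.user, token  # (login, short-lived password)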

@Taragolis Taragolis force-pushed the docker-registry-credentials branch from 2939b43 to a9864e8 Compare September 5, 2023 07:36
@Taragolis Taragolis force-pushed the docker-registry-credentials branch from a9864e8 to 3d8abce Compare September 5, 2023 08:06
@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label (Stale PRs per the .github/workflows/stale.yml policy file) Oct 21, 2023
@github-actions github-actions bot closed this Oct 26, 2023

Labels

area:providers, provider:docker, stale (Stale PRs per the .github/workflows/stale.yml policy file)
