@potiuk
Member

@potiuk potiuk commented Dec 3, 2023

The architecture diagram of Airflow has long been outdated.

This is an attempt to generate it with Python's diagrams library
(already used by some tools in our ecosystem).



@potiuk potiuk added this to the Airflow 2.8.0 milestone Dec 3, 2023
@potiuk potiuk requested a review from ashb as a code owner December 3, 2023 17:51
@potiuk potiuk force-pushed the generate-airflow-architecture-diagram branch from 39aa850 to 83617b0 Compare December 3, 2023 17:51
@potiuk potiuk changed the title Generate airflow architecture diagram Replace architecture diagram of Airflow with diagrams-generated one Dec 3, 2023
Contributor

@BasPH BasPH left a comment

A few minor comments. Also, for a basic diagram I'd show only one user, with an arrow to both the DAG files and the webserver.

@potiuk
Member Author

potiuk commented Dec 3, 2023

> Few minor comments + for a basic diagram I'd show only one user in the diagram, with an arrow to both the DAG files & webserver.

I thought about it, and I think it's good to show DAG Authors and Ops Users separately. They are part of our security model and I think it would be great to keep them separated here, because essentially they are different users. I think there is a common misconception that the "users" accessing the DAGs and the UI are the same, but in fact the security mechanisms are separate, and often the actual UI users are different people too.

I will have some more iterations on that. I think we should rewrite more of our graphs, and I am planning to use the tool to add diagrams for our security model (and later for multi-tenancy), so maybe it is indeed a good idea to keep one user here and link to more "complex" variants of the architecture separately? WDYT?

@potiuk potiuk force-pushed the generate-airflow-architecture-diagram branch 2 times, most recently from 9362324 to 7bb42a2 Compare December 3, 2023 21:13
@potiuk
Member Author

potiuk commented Dec 3, 2023

I have now introduced two diagrams: one basic, and one with a standalone DAG file processor. While the latter is not yet fully multi-tenant, it already gives some good properties (like the scheduler not having access to DAG files at all), and describing this picture now is a good idea because it reflects the current architecture.

I also added a "dashed" line showing the "executors" link between the scheduler and workers. It looks better in Right to Left form, and it also nicely shows the progression of what happens with the tasks, with the scheduler to the left of the workers and triggerers.
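
A minimal sketch of what a dashed "executor" edge and a layout direction look like at the Graphviz level. The diagrams library renders through Graphviz, but this sketch emits DOT text directly rather than using the diagrams API from the PR; the node names, labels, and the `render_dot` helper are illustrative assumptions, not the PR's code.

```python
def render_dot(rankdir: str = "LR") -> str:
    """Build Graphviz DOT text for a small scheduler/worker sketch.

    rankdir is the standard Graphviz layout direction ("LR", "RL", "TB", "BT").
    All node names and labels here are hypothetical.
    """
    lines = [
        "digraph airflow_sketch {",
        f"  rankdir={rankdir};",
        '  scheduler [label="Scheduler"];',
        '  worker [label="Worker"];',
        '  metadata_db [label="Metadata DB"];',
        # Dashed edge: the scheduler hands tasks to workers via an executor.
        '  scheduler -> worker [style=dashed, label="executor"];',
        # Solid edges: direct database access.
        "  scheduler -> metadata_db;",
        "  worker -> metadata_db;",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Pipe the output to `dot -Tpng` to render it, if Graphviz is installed.
    print(render_dot("RL"))
```

Switching `rankdir` is the only change needed to compare layout directions for the same set of nodes and edges.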

I converted the script to run entirely in pre-commit and added a hash check so that it does not run unnecessarily, even in CI with --all-files. This way we recreate the images only when the diagram sources change.
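
The hash check described above can be sketched roughly as follows. This is an assumption about the general technique, not the script from this PR: the function names, the choice of SHA-256, and the stored-hash file layout are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def source_hash(source: Path) -> str:
    """Return the SHA-256 hex digest of the diagram source file."""
    return hashlib.sha256(source.read_bytes()).hexdigest()


def needs_regeneration(source: Path, hash_file: Path) -> bool:
    """True when no hash is stored yet or the source changed since the last run."""
    if not hash_file.exists():
        return True
    return hash_file.read_text().strip() != source_hash(source)


def regenerate_if_changed(source: Path, hash_file: Path, generate) -> bool:
    """Call generate(source) only when the source changed; return whether it ran."""
    if not needs_regeneration(source, hash_file):
        return False
    generate(source)
    # Store the new hash only after generation succeeded.
    hash_file.write_text(source_hash(source) + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "diagram_example.py"  # hypothetical source file
        stamp = Path(tmp) / "diagram_example.sha256"  # hypothetical hash file
        src.write_text("# diagram source, version 1\n")

        def fake_generate(path: Path) -> None:
            print(f"regenerating image from {path.name}")

        print(regenerate_if_changed(src, stamp, fake_generate))
        print(regenerate_if_changed(src, stamp, fake_generate))
```

Committing the stored hash next to the generated image is what would let a hook running over all files skip the expensive rendering step when nothing changed.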

I left two types of users for now. I still think it is a good idea, even for the "basic" diagram.

I also hope we will regenerate more diagrams using the same approach (Celery, Kubernetes, logging, etc.) - they will be so much

@potiuk potiuk force-pushed the generate-airflow-architecture-diagram branch 5 times, most recently from c6a4a19 to e920247 Compare December 3, 2023 23:04
@potiuk potiuk force-pushed the generate-airflow-architecture-diagram branch 5 times, most recently from 0e62230 to 83b460f Compare December 4, 2023 10:21
@potiuk
Member Author

potiuk commented Dec 4, 2023

OK. I added a few more touches, and the "DAG file processor" case is now much nicer and shows more cleanly what I wanted to show: the separation between the part where DAG files are actually parsed and executed and the part where they are not.

In this case the separation between the users is also much more apparent: it shows that the UI user has no influence on arbitrary DAG code execution, while the DAG author does.

For me this is really the first stepping stone towards the diagram we will use in the future to explain the multi-tenancy architecture (which will mostly show how you can build Airflow from its building blocks to achieve a multi-tenant architecture, if you really want to).

So I think it's worth gradually introducing this architecture and linking to it from our Security Model, which describes the details of those different types of users and their capabilities. I've also added links between the architecture and the security model, as I think this is a great way to educate users on the security implications of the architecture they choose.

WDYT @BasPH ?

@potiuk potiuk force-pushed the generate-airflow-architecture-diagram branch 3 times, most recently from 3983187 to ca90947 Compare December 4, 2023 10:41
@potiuk
Member Author

potiuk commented Dec 4, 2023

cc: @feluelle -> this is also a result of what we talked about many months ago - the inspiration came from https://github.com/feluelle/airflow-diagrams :). I hope we can convert all the diagrams we have in Airflow to use it. We need a bit more familiarity with manipulating the attributes of nodes and edges to help graphviz come up with better layouts, so I hope we can tap into your experience there :D

@potiuk potiuk force-pushed the generate-airflow-architecture-diagram branch from ca90947 to 9ce5b4c Compare December 4, 2023 11:15
@potiuk potiuk force-pushed the generate-airflow-architecture-diagram branch from eef1991 to 77cd660 Compare December 4, 2023 11:27
@potiuk
Member Author

potiuk commented Dec 4, 2023

I came up with a much nicer layout. I think it's very close to what I had in mind.

@potiuk
Member Author

potiuk commented Dec 4, 2023

cc: @mhenc @vincbeck -> I think this very closely reflects the "trusted" / "untrusted" split we have always used when it comes to AIP-44.

The nice thing is that when the Internal API is introduced and reaches prime time, the 2nd diagram will become much simpler, because it will just have an "Internal API" shielding the left side of the graph from the Database (outside of the "DAG Execution" zone).

@potiuk potiuk force-pushed the generate-airflow-architecture-diagram branch from 77cd660 to 39c078e Compare December 4, 2023 11:35
@potiuk potiuk force-pushed the generate-airflow-architecture-diagram branch 4 times, most recently from c001bb8 to 4a1e5e5 Compare December 4, 2023 23:49
@potiuk
Member Author

potiuk commented Dec 4, 2023

@BasPH - are you OK with keeping two users? I do feel it's much clearer this way. It also sets the stage for the 2.9 changes, the "security focus" we have now, and future cases with more "isolation".

I also quite like this one:

image

Where the DAGs are "flowing" from left to right -> from the author to someone who sees the result of execution :)

@potiuk potiuk force-pushed the generate-airflow-architecture-diagram branch from 4a1e5e5 to 55e0b0b Compare December 5, 2023 17:52
@potiuk potiuk force-pushed the generate-airflow-architecture-diagram branch from 55e0b0b to 025d242 Compare December 5, 2023 19:04
@potiuk potiuk merged commit 5dfee8b into apache:main Dec 5, 2023
@potiuk potiuk deleted the generate-airflow-architecture-diagram branch December 5, 2023 20:15
@ephraimbuddy ephraimbuddy added the type:doc-only Changelog: Doc Only label Dec 6, 2023
ephraimbuddy pushed a commit that referenced this pull request Dec 6, 2023
…36035)

(cherry picked from commit 5dfee8b)