openlineage: Add AirflowDagRunFacet to dag runEvents #40854
|
@dolfinus I think this should be a different facet or added to |
DagRun information is a part of RunFacet, as it has fields like
Honestly, I'd rather split these fields into different facets, like:

    job:
      facets:
        airflow_dag: AirflowDagJobFacet(...)  # previously airflow:dag
        airflow_task: AirflowTaskJobFacet(...)  # previously airflow:task
    run:
      facets:
        airflow_dagRun: AirflowDagRunFacet(...)  # previously airflow:dagRun
        airflow_taskInstance: AirflowTaskInstanceFacet(...)  # previously airflow:taskInstance
        # throw away taskUuid, it just duplicates runId

But this is not backwards compatible. |
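To make the backwards-compatibility concern concrete, here is a minimal sketch of a consumer reading the DAG id under both layouts. It assumes "previously airflow:dag" means a `dag` field inside a single run facet keyed `airflow`. The event payloads, the `dag_id` and `run_id` values, and both helper functions are hypothetical illustrations, not the provider's API.

```python
# Hypothetical payloads: key names follow the proposal above, values are made up.
old_event = {
    "job": {"facets": {}},
    "run": {
        "facets": {
            "airflow": {
                "dag": {"dag_id": "example"},
                "dagRun": {"run_id": "manual__1"},
            }
        }
    },
}
new_event = {
    "job": {"facets": {"airflow_dag": {"dag_id": "example"}}},
    "run": {"facets": {"airflow_dagRun": {"run_id": "manual__1"}}},
}


def dag_id_old_layout(event):
    """Consumer written against the assumed current single-facet layout."""
    return event["run"]["facets"]["airflow"]["dag"]["dag_id"]


def dag_id_either_layout(event):
    """Consumer that checks the proposed job facet first, then falls back."""
    job_facets = event.get("job", {}).get("facets", {})
    if "airflow_dag" in job_facets:
        return job_facets["airflow_dag"]["dag_id"]
    return event["run"]["facets"]["airflow"]["dag"]["dag_id"]
```

A consumer like `dag_id_old_layout` raises `KeyError` on the split layout, which is the incompatibility being discussed. Only a consumer updated along the lines of `dag_id_either_layout` keeps working across both.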
|
Why not attach it to some new facet as I'd rather have the same DagRun information attached to different facets in TaskInstance level events and DagRun events. |
What should be the key of this facet? Should |
We can think of a new name, that would not be confusing. I don't think we even could go with another run facet with I think the renaming you propose would not be a big issue, as it would not change the spec, only the import, so we would have to keep old name for compatibility. Personally, I would avoid doing any changes to AirflowRunFacet if possible. I think renaming will not be necessary if we come up with a better name for the new one 😄 |
|
My current plan is:
What do you think? |
I think I would proceed with the last point for now in this PR, and create a discussion for the first two so that we can decide if / how we want to make it less confusing for users. The AirflowRunFacet is the primary facet that users rely on, so we need to handle it with care. Before proceeding with any actions, we should have an in-depth discussion, as deprecating it without such a discussion would not be advisable. 😄 We can also discuss here what would be the best key for this new facet: |
|
Updated implementation - reverted all the changes to Regarding changing |
kacpermuda
left a comment
Overall LGTM, left some comments
kacpermuda
left a comment
LGTM, left some comments
closes: #40798
Add AirflowDagRunFacet, which copies AirflowRunFacet but without the fields task, taskInstance, taskUuid.
eventType=START.
This makes it possible to collect data such as DAG tags, DAG schedule interval, and dagRun id and type on the OpenLineage consumer side. Previously, this information was available only for task events.
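The relationship between the two facets can be sketched as follows. This is an illustration using plain dataclasses, not the provider's actual classes. The `*Sketch` class names, the `dict` field types, the `to_dag_run_facet` helper, and the sample values are assumptions. Only the field names dag, dagRun, task, taskInstance and taskUuid come from the discussion.

```python
from dataclasses import asdict, dataclass


@dataclass
class AirflowRunFacetSketch:
    """Stand-in for the task-level facet: carries all five fields."""
    dag: dict
    dagRun: dict
    task: dict
    taskInstance: dict
    taskUuid: str


@dataclass
class AirflowDagRunFacetSketch:
    """Stand-in for the DAG-run-level facet: task-specific fields dropped."""
    dag: dict
    dagRun: dict


def to_dag_run_facet(run_facet: AirflowRunFacetSketch) -> AirflowDagRunFacetSketch:
    """Copy only the DAG and DagRun parts of a task-level facet."""
    return AirflowDagRunFacetSketch(dag=run_facet.dag, dagRun=run_facet.dagRun)


# Usage with made-up values.
task_level = AirflowRunFacetSketch(
    dag={"dag_id": "example", "tags": ["demo"]},
    dagRun={"run_id": "manual__1", "run_type": "manual"},
    task={"task_id": "t1"},
    taskInstance={"try_number": 1},
    taskUuid="hypothetical-uuid",
)
dag_level = to_dag_run_facet(task_level)
print(sorted(asdict(dag_level)))
```

The DAG-level sketch keeps only the dag and dagRun fields, so a consumer can read tags, schedule, and run id and type from a DAG run event without needing any task event.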