@ephraimbuddy
Contributor

@ephraimbuddy ephraimbuddy commented Oct 2, 2024

Depends on #42547
This helps to track the serialized DAG version the task instance ran
with by establishing a relationship between the entities instead of relying
on the dag_hash.

Only the last commit is relevant

@ephraimbuddy ephraimbuddy changed the title AIP-65: Add dag_hash to TaskInstanceHistory AIP-65: Track the serialized DAG across DagRun & TaskInstance Oct 3, 2024
@ephraimbuddy ephraimbuddy force-pushed the track-serdag-version branch 7 times, most recently from 38ef224 to e57ea73 Compare October 4, 2024 20:23
This commit adds versioning to the SerializedDagModel (SDM).

Changes:
Added two new columns, id and version_number, to the SDM and made id the
primary key.

Updated the write_dag method of the SDM to add the SDs correctly.

Updated the queries so the scheduler/webserver run with the latest SDM.

The version_number was added to help us track the evolution of a DAG.
Suppose a DAG with dag_hash AB is changed, and the dag_hash becomes CD.
If the change is reverted, we will have a dag_hash of AB again. In this
case, the version_number would still increment, showing that the DAG has gone
through three versions. I feel it's a meaningful way to track the changes,
independent of the id column, which is a database internal.
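
To illustrate the AB, CD, AB scenario above, here is a minimal in-memory sketch. It is not the actual write_dag implementation: the `VersionStore` class, its fields, and the use of SHA-256 over a source string as the dag_hash are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SerializedDagVersion:
    id: int              # surrogate primary key (database internals)
    dag_id: str
    dag_hash: str
    version_number: int  # per-DAG counter, starting at 1


@dataclass
class VersionStore:
    """Hypothetical stand-in for the serialized DAG table."""

    rows: List[SerializedDagVersion] = field(default_factory=list)

    def latest(self, dag_id: str) -> Optional[SerializedDagVersion]:
        versions = [r for r in self.rows if r.dag_id == dag_id]
        return max(versions, key=lambda r: r.version_number, default=None)

    def write_dag(self, dag_id: str, source: str) -> SerializedDagVersion:
        # Assumption: the hash is derived from the serialized DAG content.
        dag_hash = hashlib.sha256(source.encode()).hexdigest()
        latest = self.latest(dag_id)
        # Unchanged since the latest version: do not add a new row.
        if latest is not None and latest.dag_hash == dag_hash:
            return latest
        # Changed (even back to an older hash): add a row, bump the counter.
        row = SerializedDagVersion(
            id=len(self.rows) + 1,
            dag_id=dag_id,
            dag_hash=dag_hash,
            version_number=latest.version_number + 1 if latest else 1,
        )
        self.rows.append(row)
        return row


store = VersionStore()
v1 = store.write_dag("example", "AB")
v2 = store.write_dag("example", "CD")
v3 = store.write_dag("example", "AB")  # revert to the original content
```

In this sketch `v1` and `v3` share a dag_hash but have different version numbers, which is the distinction the hash alone cannot express.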
This helps to track the serialized DAG version the task instance ran
with by establishing a relationship between the entities instead of using
the dag_hash.
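
As a rough sketch of why a relationship is more informative than the dag_hash, the example below uses the standard-library sqlite3 module. The table and column names (`serialized_dag`, `task_instance`, `serialized_dag_id`) are hypothetical and do not reflect the schema in this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE serialized_dag (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL,
        dag_hash TEXT NOT NULL,
        version_number INTEGER NOT NULL,
        UNIQUE (dag_id, version_number)
    );
    CREATE TABLE task_instance (
        id INTEGER PRIMARY KEY,
        task_id TEXT NOT NULL,
        -- Hypothetical foreign key: the exact version this task ran with.
        serialized_dag_id INTEGER NOT NULL REFERENCES serialized_dag (id)
    );
    """
)

# Three versions of one DAG; the third reverts to the first hash.
conn.executemany(
    "INSERT INTO serialized_dag (dag_id, dag_hash, version_number) VALUES (?, ?, ?)",
    [("example", "AB", 1), ("example", "CD", 2), ("example", "AB", 3)],
)

# One task instance ran with version 1, a later one with version 3.
conn.executemany(
    "INSERT INTO task_instance (task_id, serialized_dag_id) VALUES (?, ?)",
    [("extract", 1), ("extract", 3)],
)

rows = conn.execute(
    """
    SELECT ti.id, sd.dag_hash, sd.version_number
    FROM task_instance AS ti
    JOIN serialized_dag AS sd ON sd.id = ti.serialized_dag_id
    ORDER BY ti.id
    """
).fetchall()
print(rows)
```

Both task instances report the same dag_hash, but the join through the foreign key still distinguishes version 1 from version 3.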
@ephraimbuddy ephraimbuddy marked this pull request as ready for review October 4, 2024 20:27
@ephraimbuddy
Contributor Author

Closing in preference of #42913
