Serialize Dags before making TI.dag_version_id non-nullable #53820
ephraimbuddy wants to merge 1 commit into apache:main
Conversation
It seemed simpler to reserialize the DAGs and update the task instances directly, so that's what I did here. I'm slightly unsure whether this could lead to failures during reserialization or cause performance issues. An alternative would have been to manually create entries for serialized_dag, dag_version, and dag_code before updating the TIs, but that felt more complex. The issue here is that upgrades from Airflow 2 fail because the TIs are not associated with dag_versions. It mainly affects users upgrading from Airflow 2, since in Airflow 3 the dag_version table is already populated for all DAGs.
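To illustrate the backfill this PR is describing, here is a minimal, self-contained sketch of the idea: create a version row per DAG (standing in for reserialization) and then point orphaned TIs at it. The table shapes are heavily simplified stand-ins for Airflow's real `dag_version` and `task_instance` tables, not the actual migration code.

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")
meta = sa.MetaData()

# Simplified stand-ins for Airflow's dag_version and task_instance tables.
dag_version = sa.Table(
    "dag_version", meta,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("dag_id", sa.String, nullable=False),
)
task_instance = sa.Table(
    "task_instance", meta,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("dag_id", sa.String, nullable=False),
    sa.Column("dag_version_id", sa.Integer, nullable=True),
)
meta.create_all(engine)

with engine.begin() as conn:
    # TIs carried over from Airflow 2 have no dag_version yet.
    conn.execute(task_instance.insert(), [
        {"dag_id": "example_dag", "dag_version_id": None},
        {"dag_id": "example_dag", "dag_version_id": None},
    ])
    # Step 1: create one dag_version row per distinct dag_id
    # (a stand-in for reserializing each DAG).
    for (dag_id,) in conn.execute(sa.select(task_instance.c.dag_id).distinct()):
        conn.execute(dag_version.insert().values(dag_id=dag_id))
    # Step 2: backfill orphaned TIs with their DAG's new version id
    # via a correlated subquery.
    subq = (
        sa.select(dag_version.c.id)
        .where(dag_version.c.dag_id == task_instance.c.dag_id)
        .scalar_subquery()
    )
    conn.execute(
        task_instance.update()
        .where(task_instance.c.dag_version_id.is_(None))
        .values(dag_version_id=subq)
    )
    remaining = conn.execute(
        sa.select(sa.func.count()).where(task_instance.c.dag_version_id.is_(None))
    ).scalar_one()
    print(remaining)  # 0 orphaned TIs remain
```

The performance concern raised above would show up in step 2 on large `task_instance` tables, since the correlated subquery update touches every orphaned row.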
hmm -- we merged a change to just blow away old serialization -- shouldn't that have made this a non-issue?
This one is about
Do we need to reserialise during migration? I feel it might be enough to just delete the serialised data. I believe Airflow should automatically reserialise missing dags once the migration finishes and the scheduler is restarted?
We deleted it initially here https://github.com/apache/airflow/pull/43700/files when migrating from AF2, but realized we could lose true histories and reverted it. Deleting the serdag could have been better then, but I think we'd be doing it too late at this point, as we could lose AF3+ histories.
What do you mean we could lose true histories? In Airflow 2, we don't have serdag history.
This is an alternative to apache#53820. Here we make TI.dag_version_id nullable at the database level; it's still enforced in code.
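The "nullable in the database, enforced in code" alternative can be sketched with a SQLAlchemy validator: the column stays nullable so pre-existing Airflow 2 rows migrate without a backfill, while new writes are rejected if they omit the version. This is an illustrative model, not Airflow's actual `TaskInstance` class.

```python
import sqlalchemy as sa
from sqlalchemy import orm

Base = orm.declarative_base()


class TaskInstance(Base):
    """Illustrative stand-in for Airflow's task_instance model."""
    __tablename__ = "task_instance"

    id = sa.Column(sa.Integer, primary_key=True)
    # Nullable at the DB level so rows carried over from Airflow 2
    # (which never had a dag_version) remain valid after migration.
    dag_version_id = sa.Column(sa.Integer, nullable=True)

    @orm.validates("dag_version_id")
    def _require_dag_version(self, key, value):
        # Application-level enforcement: new assignments must be non-null.
        if value is None:
            raise ValueError("dag_version_id must be set for new task instances")
        return value
```

Under this design the old rows are simply left alone, which avoids the reserialization pass from #53820 entirely, at the cost of code paths having to tolerate historical NULLs when reading.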
Closing in preference to #54366