@ephraimbuddy
Contributor

@ephraimbuddy ephraimbuddy commented Nov 13, 2024

Now that we have versioning, users must explicitly state that they want to delete history before the airflow dags reserialize command does it.

Serialized DAGs are no longer deleted as part of this command. Users should use the airflow db clean command instead.

Also updated the _reserialize function used at DB upgrade so that it doesn't delete the serializedDag, since that is no longer necessary.
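The opt-in behaviour described above can be sketched roughly as follows. This is only an illustration of gating a destructive step behind an explicit "yes"; the function names, the prompt text, and the `delete_history` parameter are hypothetical and are not taken from the Airflow code base or its CLI.

```python
from typing import Callable


def confirm_or_abort(message: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only when the user explicitly answers yes."""
    answer = ask(f"{message} Continue? [y/N] ").strip().lower()
    return answer in {"y", "yes"}


def reserialize(delete_history: bool, ask: Callable[[str], str] = input) -> str:
    """Hypothetical command body: history is dropped only after explicit opt-in."""
    if delete_history and not confirm_or_abort(
        "This will delete DAG version history.", ask
    ):
        return "aborted"
    if delete_history:
        # Assumption: the real command would delete version rows here.
        return "reserialized, history deleted"
    # Default path: nothing is deleted, so no prompt is needed.
    return "reserialized, history kept"
```

Passing the prompt function as a parameter keeps the default interactive behaviour while making the confirmation path easy to exercise in tests.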

@potiuk
Member

potiuk commented Nov 13, 2024

Approved. We need it now, though I think it's quite a big limitation. Possibly (@jedcunningham) with DAG bundles we could, in the future, reserialize history as well? It sounds feasible: we would just have to go through the version history, check out the bundle at each version, and reserialize it. It might (or might not) work, depending for example on the third-party package versions installed now vs. then, but we could at least try.

But yeah, that's likely Airflow 3.5+ or smth :)

@potiuk
Member

potiuk commented Nov 13, 2024

Maybe we should also try to keep old serialized DAGs and attempt to load them (assuming that our serialization is forward-compatible). That could also be done on a "best effort" basis.

@ephraimbuddy
Contributor Author

Maybe we should also try to keep old serialized DAGs and attempt to load them (assuming that our serialization is forward-compatible). That could also be done on a "best effort" basis.

Yeah, old serialized DAGs won't be deleted unless users choose to delete them. We would keep a serialization version so that, when deserializing, we can check which version a DAG was serialized with and use that to deserialize it. It should be forward-compatible.
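A minimal sketch of that idea, assuming each stored payload carries a version marker and that one loader exists per known version. The `__version` key, the loader functions, and both payload layouts are made up for illustration and do not describe Airflow's actual serialization format.

```python
import json
from typing import Any, Callable, Dict

CURRENT_VERSION = 2  # hypothetical version written by the current code


def _load_v1(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical v1 layout: tasks stored as a plain list of task ids.
    return {
        "dag_id": payload["dag_id"],
        "tasks": [{"task_id": task_id} for task_id in payload["tasks"]],
    }


def _load_v2(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical v2 layout: tasks already stored as dictionaries.
    return {"dag_id": payload["dag_id"], "tasks": payload["tasks"]}


LOADERS: Dict[int, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    1: _load_v1,
    2: _load_v2,
}


def serialize(dag: Dict[str, Any]) -> str:
    """Store the version next to the payload so old rows stay readable."""
    return json.dumps({"__version": CURRENT_VERSION, "dag": dag})


def deserialize(blob: str) -> Dict[str, Any]:
    """Pick the loader that matches the version the blob was written with."""
    doc = json.loads(blob)
    loader = LOADERS.get(doc.get("__version"))
    if loader is None:
        raise ValueError(f"unsupported serialization version: {doc.get('__version')!r}")
    return loader(doc["dag"])
```

With this shape, supporting an old row means keeping its loader around rather than rewriting the row, which is what makes a "best effort" load of history possible.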

@jedcunningham
Member

with DAG bundles we could, in the future, reserialize history as well? It sounds feasible: we would just have to go through the version history, check out the bundle at each version, and reserialize it.

This can already be pretty slow. Doing it for every historical version isn't an option imo.

Maybe we should also try to keep old serialized DAGs and attempt to load them (assuming that our serialization is forward-compatible). That could also be done on a "best effort" basis.

I think something like this is the right way to handle this. I think Kaxil is going to look into it as part of AIP-72 stuff: #43648

@ephraimbuddy ephraimbuddy force-pushed the update-reserialize branch 2 times, most recently from 3061cdf to 67db5b2 Compare November 14, 2024 13:00
@ashb
Member

ashb commented Nov 14, 2024

Given that users often run this as part of upgrades (or db upgrade does it itself), we'll need to stop deleting things very soon.

@ephraimbuddy
Contributor Author

I'm thinking of renaming this command to airflow dags re-version, or any better name other than reserialize. Thoughts?

@ephraimbuddy
Contributor Author

Will appreciate another review @jedcunningham @pierrejeambrun

Member

@pierrejeambrun pierrejeambrun left a comment

I am not yet super familiar with the Versioning AIP but this makes sense to me.

Thanks.

@dstandish
Contributor

I'm thinking of renaming this command to airflow dags re-version, or any better name other than reserialize. Thoughts?

I think re-version is probably not a good name. Shall we chat about it?

Contributor

@dstandish dstandish left a comment

I think we may want to revisit some of the language changes. Blocking merge until @ephraimbuddy has a chance to respond.

@ephraimbuddy ephraimbuddy force-pushed the update-reserialize branch 2 times, most recently from abb2b72 to f80eb47 Compare November 20, 2024 11:28
Contributor

@dstandish dstandish left a comment

couple suggestions

Now that we have versioning, users must be sure they want to use the
dags reserialize command, as it deletes DAG history.

I updated the command to ensure users know it will delete DAG history
and must answer yes for it to continue. The DagVersion is deleted, and
since it has a foreign key relationship to SerializedDagModel and
DagCode, those will also get deleted.

Also updated the _reserialize function at DB upgrade so that it doesn't
delete the serializedDag, since that won't be necessary.

Updated the test to use the session fixture instead of create_session.
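
The cascading delete mentioned in this commit message can be illustrated with a small standalone sketch using Python's built-in sqlite3 module. The table and column names are simplified stand-ins rather than Airflow's real schema, and the sketch assumes the child rows reference the version row with ON DELETE CASCADE, which is the arrangement under which deleting a version also removes its dependents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE dag_version (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL
    );
    CREATE TABLE serialized_dag (
        id INTEGER PRIMARY KEY,
        dag_version_id INTEGER NOT NULL
            REFERENCES dag_version(id) ON DELETE CASCADE
    );
    CREATE TABLE dag_code (
        id INTEGER PRIMARY KEY,
        dag_version_id INTEGER NOT NULL
            REFERENCES dag_version(id) ON DELETE CASCADE
    );
    INSERT INTO dag_version VALUES (1, 'example');
    INSERT INTO serialized_dag VALUES (1, 1);
    INSERT INTO dag_code VALUES (1, 1);
    """
)

# Deleting the version row removes the dependent rows through the cascade.
conn.execute("DELETE FROM dag_version WHERE id = 1")

remaining = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("dag_version", "serialized_dag", "dag_code")
}
print(remaining)
```

This is why a single delete on the version table is enough to clear history in such a schema, and also why it needs a confirmation step.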
@ephraimbuddy ephraimbuddy merged commit dc86801 into apache:main Nov 26, 2024
@ephraimbuddy ephraimbuddy deleted the update-reserialize branch November 26, 2024 11:30
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025
* Update dag reserialize command

* add --clear-history to dag reserialize command

* Add news fragment item

* Remove clear-history

* fix test

* fixup! fix test

* Update newsfragments/43949.significant.rst