@ephraimbuddy ephraimbuddy commented Sep 26, 2024

The serialized DAG dictionary is not ordered consistently when its hash is created, which produces inconsistent hashes and leads to frequent updates of the serialized DAG table.

Changes:

Implemented sorting for serialized DAG dictionaries and their nested structures so that the serialization order used for hashing is consistent and predictable. Using `sort_keys` in `json.dumps` is not enough to sort the nested structures in the serialized DAG.

Added `serialize` and `deserialize` methods to `DagParam` and `ArgNotSet` to allow for more structured serialization.

Updated `serialize_template_field` to handle objects that implement a `serialize` method. This was needed because `DagParam` and `ArgNotSet` can appear in the template fields. Previously, such a value produced an object; with this change, it serializes to a consistent object.
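
To illustrate the first change, here is a minimal sketch of canonicalizing a nested structure before hashing. It is not the code from this PR: the names `canonicalize` and `stable_hash`, the choice of SHA-256, and the sample dictionaries are all assumptions made for the example. It also assumes that list order carries no meaning, which may not hold for every list in a real serialized DAG.

```python
import hashlib
import json
from typing import Any


def canonicalize(value: Any) -> Any:
    """Return a copy of a JSON-like value with dicts and lists in a stable order.

    Hypothetical helper: assumes dict keys are strings and list order is irrelevant.
    """
    if isinstance(value, dict):
        return {key: canonicalize(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        items = [canonicalize(item) for item in value]
        # Sort by JSON text so nested or mixed-type elements can be compared.
        return sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
    return value


def stable_hash(data: Any) -> str:
    """Hash the canonical JSON form of data (SHA-256 is an arbitrary choice here)."""
    payload = json.dumps(canonicalize(data), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Two dictionaries with the same content but different key and list order.
first = {"dag_id": "example", "tags": ["b", "a"], "tasks": [{"task_id": "t2"}, {"task_id": "t1"}]}
second = {"tasks": [{"task_id": "t1"}, {"task_id": "t2"}], "tags": ["a", "b"], "dag_id": "example"}

# sort_keys alone orders dict keys but leaves list element order untouched.
plain_first = json.dumps(first, sort_keys=True)
plain_second = json.dumps(second, sort_keys=True)

print(plain_first == plain_second)                 # the lists still differ in order
print(stable_hash(first) == stable_hash(second))   # canonical forms match
```

In this sketch, `sort_keys=True` by itself yields different JSON text for the two inputs because the list elements appear in a different order, while the canonicalized form hashes identically.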

@ephraimbuddy ephraimbuddy changed the title from "Ensure consistent Seriailized DAG hashing with deterministic serialization" to "Ensure consistent Seriailized DAG hashing" Sep 26, 2024
@ephraimbuddy ephraimbuddy force-pushed the consistent-hashes branch 3 times, most recently from 75a4ab1 to 5899bb3 September 26, 2024 19:00

@uranusjr uranusjr left a comment

Looks reasonable enough to me.

@ephraimbuddy ephraimbuddy merged commit 8f7616c into apache:main Sep 30, 2024
@ephraimbuddy ephraimbuddy deleted the consistent-hashes branch September 30, 2024 11:10
joaopamaral pushed a commit to joaopamaral/airflow that referenced this pull request Oct 21, 2024
* Ensure consistent Seriailized DAG hashing

* Move hashing to a method

* fixup! Move hashing to a method

* Add test
ellisms pushed a commit to ellisms/airflow that referenced this pull request Nov 13, 2024
* Ensure consistent Seriailized DAG hashing

* Move hashing to a method

* fixup! Move hashing to a method

* Add test