Support for set in XCom serialization (fix #8703) #9847
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
|
|
Hey, @turbaszek @potiuk thank you very much for the Workshop yesterday. It was great! |
|
@tatiana There is a failing test which is likely related to the XCom serialization change. |
|
Thanks, @potiuk ! I'll solve those later today, sorry - I'll make sure the CI is happy. |
|
I have not yet had time to take a deeper look, and I am a little busy today :(. I think I will take a look tomorrow! Nothing to be sorry about :). Tests are there to catch all those things that are not obvious :) |
|
Hm, let me take a look at the failing lineage tests... |
airflow/serialization/json.py
Outdated
We should support None as possible return value for XCom
Thanks, @turbaszek, for pointing this out - I just added a test & support for None.
I think it's a good idea. @kaxil you are our serialization expert, what do you think? |
Relates to issue: #8703
Based on airflow/serialization/serialized_objects.py:BaseSerialization. The purpose was to create a lightweight module which XCom could use to serialise sets and other structures which contain nested sets. Relates to issue: #8703
when using JSON serialization. Resolves: #8703
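The idea described in these commit messages (a lightweight JSON layer that round-trips sets, nested structures containing sets, and `None`) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR or from `BaseSerialization`: the marker keys `__type` and `__var` and the helper names `encode`, `decode`, `serialize` and `deserialize` are assumptions made for the example, and it assumes dictionaries have string keys and sets contain only hashable primitives.

```python
import json

# Hypothetical marker keys chosen for this sketch only.
TYPE_KEY = "__type"
VALUE_KEY = "__var"


def encode(value):
    """Turn a Python value into a JSON-compatible structure."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value  # primitives (including None) pass through unchanged
    if isinstance(value, list):
        return [encode(item) for item in value]
    if isinstance(value, dict):
        # Dicts are wrapped too, so user keys can never collide with markers.
        payload = {key: encode(item) for key, item in value.items()}
        return {TYPE_KEY: "dict", VALUE_KEY: payload}
    if isinstance(value, set):
        return {TYPE_KEY: "set", VALUE_KEY: [encode(item) for item in value]}
    raise TypeError(f"Unsupported type for this sketch: {type(value).__name__}")


def decode(value):
    """Inverse of encode()."""
    if isinstance(value, list):
        return [decode(item) for item in value]
    if isinstance(value, dict):
        kind, payload = value[TYPE_KEY], value[VALUE_KEY]
        if kind == "set":
            return {decode(item) for item in payload}
        if kind == "dict":
            return {key: decode(item) for key, item in payload.items()}
        raise ValueError(f"Unknown type marker: {kind!r}")
    return value


def serialize(value) -> str:
    return json.dumps(encode(value))


def deserialize(text: str):
    return decode(json.loads(text))


original = {"ids": {1, 2, 3}, "nested": [None, {"a", "b"}], "flag": True}
assert deserialize(serialize(original)) == original
assert deserialize(serialize(None)) is None
```

Wrapping every dictionary, not only sets, is a design choice of this sketch: it keeps decoding unambiguous at the cost of a more verbose payload.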
Thanks, @turbaszek, I'm struggling to reproduce the issues from the CI locally. I tried to run Do you have any ideas on why the quarantine tests might currently be killed? Usually, exit code 137 relates to high memory consumption, but it isn't clear to me why the changes I introduced had this side effect 🤔 Do you have any advice?
Previously there were a few Core tests failing, but it seems they are now passing. Perhaps they were related to the |
That's usually the problem. The quarantined tests often use multiprocessing, and mixing this with flaky tests results in unstable builds. I usually try to re-run quarantined tests locally if I suspect that my change may impact those tests. |
kaxil
left a comment
I am taking a look at this now. Preferably I would use the serialized_objects module and avoid the duplication.
Let me test something and will post my findings soon.
What do you guys think about the following patch:
curl https://pastebin.com/raw/p6Dq8aiK | git am
|
@kaxil I'm OK with the proposed approach as long as pylint is happy. Generally speaking, we will introduce a cyclic dependency, but a working one, by adding serialize and deserialize methods in XCom. |
|
Thank you both for taking the time to review and give feedback. @kaxil as mentioned by @turbaszek, it would be great if we could avoid introducing this cyclic dependency. I fully agree with avoiding duplicating the code as well. How would you both feel if I try to extract the |
We previously had base_serialization, serialized_operator and serialized_dag in 3 separate files, but that suffered from some cyclic dependencies. We logically grouped them to avoid it, but since serialization touches almost all of the models it gets tricky. Check #6718 where that was fixed. If just moving BaseSerialization to a separate module helps fix the cyclic dependencies I would be happy, but please do check that removing one cyclic dependency doesn't introduce another one. |
|
@tatiana let's check if Kaxil's proposal is accepted by pylint. If yes, let's use it to solve this issue. After that, we can think about extracting this logic. |
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
|
any updates? |
|
It's been a while, any updates on this one @tatiana ? |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
Closes #8703. Any feedback is very welcome!
There are some standing questions:
serialized_objects.py module, without duplicating any code, following the suggestion from Support for set in XCom serialization #8703 (comment): However, this seems to have resulted in a circular dependency, which could be observed when running
pytest tests/models/test_xcom.py: serialization/serialized_objects.py::BaseSerialization to use the module so we avoid duplicating the logic.
Make sure to mark the boxes below before creating PR: [x]
In the case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In the case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards-incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.