Skip to content

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Jan 14, 2020

Adding new field in BaseOperator requires some manual updates
in serialization code. This test detects new fields and informs
what should be done in case new field is added.


Issue link: AIRFLOW-6557

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said that I'm now not sure this last part is true.

So since DAG serialization schema version 1.0 is now released with 1.10.7 we might have to start thinking about versioning of this schema and migrate/update. But the important bit is that a row that exists in the DB right now must continue to work and that is what the "ground truth" (not the best name) is meant to represent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think then we need to decide what to do in this case and tell @lokeshlal what to do in this case: #7162 (and update this description).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the goal of the ground truth test is to ensure that DAGs that are currently in people's databases get correctly handled as the model changes. The important bit I think is that this data can't change sa it's fixed in a DB.

But some changes such as adding an optional field to Operators/Tasks is allowed, as the JSON Schema allows unknown fields at the task level.

But by default BaseOperator.get_serialized_fields will include extra fields. So I guess the only check I would like here is that new fields from base operator get specified with a type in the JSON schema, and that it sohuld be optional (if it's not optional then we would have to bump the version field in our schema. But we haven't worked out how to handle versioning of schemas and blobs yet!) -- which is sadly harder to test for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar rules as for protobuf. I will modify the message. We cannot test a lot of that fully automatically yet but we can at least provide the right message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should good enough for the time being

Copy link
Contributor

@bolkedebruin bolkedebruin Jan 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, and switch away from JSON Schema. Geez its 2020 :-P

JSON you only use to interface externally.

@codecov-io
Copy link

codecov-io commented Jan 14, 2020

Codecov Report

Merging #7162 into master will decrease coverage by 0.31%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7162      +/-   ##
==========================================
- Coverage    85.4%   85.09%   -0.32%     
==========================================
  Files         723      723              
  Lines       39537    39546       +9     
==========================================
- Hits        33768    33651     -117     
- Misses       5769     5895     +126
Impacted Files Coverage Δ
airflow/kubernetes/volume_mount.py 44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py 52.94% <0%> (-47.06%) ⬇️
airflow/kubernetes/pod_launcher.py 45.25% <0%> (-46.72%) ⬇️
airflow/kubernetes/refresh_config.py 50.98% <0%> (-23.53%) ⬇️
...rflow/contrib/operators/kubernetes_pod_operator.py 78.31% <0%> (-20.49%) ⬇️
airflow/configuration.py 88.01% <0%> (-3.43%) ⬇️
airflow/cli/commands/db_command.py 95.12% <0%> (-2.32%) ⬇️
airflow/utils/db.py 97.93% <0%> (-2.07%) ⬇️
airflow/utils/dag_processing.py 87.8% <0%> (-0.39%) ⬇️
airflow/bin/cli.py 94.73% <0%> (ø) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b38766e...778173f. Read the comment docs.

@ashb ashb requested a review from kaxil January 15, 2020 19:02
@potiuk potiuk force-pushed the AIRFLOW-6557-add-tests-for-fields-not-included-in-serialization branch from 8ad2704 to 2fc738f Compare January 15, 2020 20:53
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to merge it once we have GREEN CI

@kaxil
Copy link
Member

kaxil commented Jan 15, 2020

image

:) The bot caught you @potiuk

Adding new field in BaseOperator requires some manual updates
in serialization code. This test detects new fields and informs
what should be done in case new field is added.
@potiuk potiuk force-pushed the AIRFLOW-6557-add-tests-for-fields-not-included-in-serialization branch from 2fc738f to 2acd697 Compare January 15, 2020 21:00
@potiuk
Copy link
Member Author

potiuk commented Jan 15, 2020

image

:) The bot caught you @potiuk

Good bot! Nice bot!

`airflow/serialization/schema.json` - they should have correct type defined there.

Note that we do not support versioning yet so you should only add optional fields. We do not support
versioning yet so you should make sure all fields added to the BaseOperator should be optional.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've duplicated(ish) the message here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah :)

@potiuk potiuk merged commit 5abce47 into apache:master Jan 17, 2020
Copy link
Contributor

@bolkedebruin bolkedebruin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just dropped a thought. I am not happy with the choice for JSON Schema.

@potiuk
Copy link
Member Author

potiuk commented Jan 17, 2020

Just dropped a thought. I am not happy with the choice for JSON Schema.

Not that I am for JSON schema (I was barely involved in DAG serialisation). For me it's simply one of the options one could consider (alongside Protocol Buffers for example). But I am really curious what would be your arguments against it @bolkedebruin - It seems you have pretty strong opinion.

@kaxil
Copy link
Member

kaxil commented Jan 17, 2020

@bolkedebruin

Would be curious to know why you feel so? And what would you recommend as an alternative?

potiuk added a commit that referenced this pull request Jan 21, 2020
Adding new field in BaseOperator requires some manual updates
in serialization code. This test detects new fields and informs
what should be done in case new field is added.

(cherry picked from commit 5abce47)
kaxil pushed a commit that referenced this pull request Jan 22, 2020
Adding new field in BaseOperator requires some manual updates
in serialization code. This test detects new fields and informs
what should be done in case new field is added.

(cherry picked from commit 5abce47)
kaxil pushed a commit that referenced this pull request Jan 23, 2020
Adding new field in BaseOperator requires some manual updates
in serialization code. This test detects new fields and informs
what should be done in case new field is added.

(cherry picked from commit 5abce47)
potiuk added a commit that referenced this pull request Jan 26, 2020
Adding new field in BaseOperator requires some manual updates
in serialization code. This test detects new fields and informs
what should be done in case new field is added.

(cherry picked from commit 5abce47)
kaxil pushed a commit that referenced this pull request Jan 26, 2020
Adding new field in BaseOperator requires some manual updates
in serialization code. This test detects new fields and informs
what should be done in case new field is added.

(cherry picked from commit 5abce47)
galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
…e#7162)

Adding new field in BaseOperator requires some manual updates
in serialization code. This test detects new fields and informs
what should be done in case new field is added.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants