Revert "enhance spark_k8s_operator (#29977)" #31716

jamescnowell · 2023-06-05T16:29:24Z

This reverts commit 9a4f674.

Based on the discussion found
here, previous changes
to the Spark K8s Operator broke existing functionality and did not update the documentation for the newly enabled functionality.

The Spark Sensor no longer works, XCOM no longer works on the Operator itself, and the Operator does not fail when the Spark job fails.

Rather than attempt to fix or resolve the current implementation, I am reverting to the existing, documented implementation.

I would propose creating a new Operator with alternative functionality (one which does not need a Sensor, copies logs, etc.) if that is desired.

closes: #31183

This reverts commit 9a4f674. Based on the discussion found [here](apache#31183), previous changes to the Spark K8s Operator broke existing functionality and did not update the documentation for the newly enabled functionality. The Spark Sensor no longer works, XCOM no longer works on the Operator itself, and the Operator does not fail when the Spark job fails. Rather than attempt to fix or resolve the current implementation, I am reverting to the existing, documented implementation. I would propose creating a _new_ Operator with alternative functionality (one which does not need a Sensor, copies logs, etc.) if that is desired.

boring-cyborg · 2023-06-05T16:29:28Z

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
Be sure to read the Airflow Coding style.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

eladkal · 2023-06-05T18:36:24Z

I don't think revert is required?
This sounds more like a case where additional functionality + docs are needed and once we have it and release new version we can (maybe) yank the old one.

The PR in question was released in April. There may be users out there who relay on it and reverting it would cause their code to break.

jamescnowell · 2023-06-05T19:38:43Z

additional functionality
As it is currently, it is a non-backwards compatible change in the functionality of the Operator, which previously relied entirely on the Sensor for monitoring the job status (as far as I know). Currently, the Operator maintains a Running status for the duration of the Spark job, which changes the possible DAG configurations.

I have an immediate need to have a working DAG, meaning I need to revert to an older version of the Operator (as this version does not fail when a job fails). On top of that, if the change moves forwards as-is, I will not only need to update my DAGs in the future, but will also lose features I may be relying on (the ability to have a fork in a DAG after launching a Spark job, but before it's finished).

I'm definitely for an Operator that behaves in this way, but either by adding flags to the existing operator, or creating an Operator with a different name, and a deprecation schedule for the existing Operator.

potiuk · 2023-06-05T21:24:50Z

I'm definitely for an Operator that behaves in this way, but either by adding flags to the existing operator, or creating an Operator with a different name, and a deprecation schedule for the existing Operator.

Can you do it this way rather than reverting the change?

While I understand your case is impacted, it can be easily fixed by downgrading to earlier version of provider, so "properly" fixing it seems to be a better way than reverting. Reverting a change that has already been released only adds confusion, because if somone relies on what has been already released then you break the other's workflows while you are fixing yours. Can you add a flag to control the behaviour instead?

eladkal · 2023-06-05T21:33:55Z

I agree with @potiuk .
Your immidiate issue can be solved by downgrading the version of the provider till the underlying issue is resolved.

The fact that you agree and support the code modifications made by previous PR makes even stronger case not to revert.
Can you please modify the PR to adress the concerns you raised (docs etc..)

jamescnowell · 2023-06-06T14:20:19Z

I'm not able to spend that much time on changes like that right now. You can close this PR if it's not something you plan on accepting.

I find it frustrating that non-backwards compatible changes are allowed into to "stable" API versions, as will many others.

For the "Some people may be relying on it" statement... while that is possible, the current version is fundamentally broken, so those people will have jobs that fail with no notification. I believe most of those people will have ended up finding #31183 as using the code provided in the examples errors out very quickly.

I understand this is already a sticky situation and there are no good solutions, this is simply the solution that I had time for, that appeared to be in line with having stable API version numbers.

potiuk · 2023-06-06T14:32:53Z

then maybe @blcksrx will have time to fix it since that was the #31183 change that broke it. Do you agree @blcksrx that this is "fundamentally broken" ?

Can you please fix it @blcksrx ? I think if not then indeed we will have to rever it, unless someone provides a fix.

I find it frustrating that non-backwards compatible changes are allowed into to "stable" API versions, as will many others.

Surely @jamescnowell . It is frustrating for everyone. But maybe you do not understand the perspective and OSS spirit - it might happen. Maybe you could better do it with 80 providers, 5000 or so operators and 30 commits merged a day and multiple thousands of tests to prevent such errors from happening. Airflow has 2500 contributors and you get it for absolutely free. There are many ways you and others can give back - spending time (as the 2500 contributors) is one of the ways - and I understand you do not have more of the time (but then maybe github Sponsorship or finding someone in your company - that uses the software for free) would be a good way to give back for the countless hours people spend on contributing to Airflow so that your company can use it for free.

blcksrx · 2023-06-07T14:33:49Z

@potiuk
I'm not completely agree this version is broken. It's could be fixed in no time!

eladkal · 2023-06-07T15:03:16Z

I find it frustrating that non-backwards compatible changes are allowed into to "stable" API versions, as will many others

This was not intentional.
We have several check points to minimize such issues but its not bullet proof. I welcome you to participate in testing releases to minimize such future cases.

The issue here is that segnificant time has passed since the release and reverting / yanking the version might cause more harm than good as there may be users that already used the modified version.

Your issue is solved by simply not updating the provider till a fix is merged.

potiuk · 2023-06-07T15:46:13Z

@potiuk I'm not completely agree this version is broken. It's could be fixed in no time!

Can you make a fix then, please ?

eladkal · 2023-06-08T16:38:46Z

Closing in favor of #31798
@jamescnowell I welcome you to assist with the review

jamescnowell requested a review from jedcunningham as a code owner June 5, 2023 16:29

boring-cyborg bot added provider:cncf-kubernetes Kubernetes (k8s) provider related issues area:providers provider:Apache labels Jun 5, 2023

jamescnowell mentioned this pull request Jun 5, 2023

SparkKubernetesSensor: 'None' has no attribute 'metadata' #31183

Closed

2 tasks

eladkal closed this Jun 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert "enhance spark_k8s_operator (#29977)" #31716

Revert "enhance spark_k8s_operator (#29977)" #31716

Uh oh!

jamescnowell commented Jun 5, 2023

Uh oh!

boring-cyborg bot commented Jun 5, 2023

Uh oh!

eladkal commented Jun 5, 2023

Uh oh!

jamescnowell commented Jun 5, 2023

Uh oh!

potiuk commented Jun 5, 2023 •

edited

Loading

Uh oh!

eladkal commented Jun 5, 2023 •

edited

Loading

Uh oh!

jamescnowell commented Jun 6, 2023

Uh oh!

potiuk commented Jun 6, 2023

Uh oh!

blcksrx commented Jun 7, 2023

Uh oh!

eladkal commented Jun 7, 2023 •

edited

Loading

Uh oh!

potiuk commented Jun 7, 2023 •

edited

Loading

Uh oh!

eladkal commented Jun 8, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Revert "enhance spark_k8s_operator (#29977)" #31716

Revert "enhance spark_k8s_operator (#29977)" #31716

Uh oh!

Conversation

jamescnowell commented Jun 5, 2023

Uh oh!

boring-cyborg bot commented Jun 5, 2023

Uh oh!

eladkal commented Jun 5, 2023

Uh oh!

jamescnowell commented Jun 5, 2023

Uh oh!

potiuk commented Jun 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eladkal commented Jun 5, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jamescnowell commented Jun 6, 2023

Uh oh!

potiuk commented Jun 6, 2023

Uh oh!

blcksrx commented Jun 7, 2023

Uh oh!

eladkal commented Jun 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

potiuk commented Jun 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eladkal commented Jun 8, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

potiuk commented Jun 5, 2023 •

edited

Loading

eladkal commented Jun 5, 2023 •

edited

Loading

eladkal commented Jun 7, 2023 •

edited

Loading

potiuk commented Jun 7, 2023 •

edited

Loading