Skip to content

Conversation

@jamescnowell
Copy link

This reverts commit 9a4f674.

Based on the discussion found
here, previous changes
to the Spark K8s Operator broke existing functionality and did not update the documentation for the newly enabled functionality.

The Spark Sensor no longer works, XCOM no longer works on the Operator itself, and the Operator does not fail when the Spark job fails.

Rather than attempt to fix or resolve the current implementation, I am reverting to the existing, documented implementation.

I would propose creating a new Operator with alternative functionality (one which does not need a Sensor, copies logs, etc.) if that is desired.

closes: #31183

This reverts commit 9a4f674.

Based on the discussion found
[here](apache#31183), previous
changes
to the Spark K8s Operator broke existing functionality and did not
update the documentation for the newly enabled functionality.

The Spark Sensor no longer works, XCOM no longer works on the Operator
itself, and the Operator does not fail when the Spark job fails.

Rather than attempt to fix or resolve the current implementation,
I am reverting to the existing, documented implementation.

I would propose creating a _new_ Operator with alternative
functionality (one which does not need a Sensor, copies logs, etc.)
if that is desired.
@boring-cyborg boring-cyborg bot added provider:cncf-kubernetes Kubernetes (k8s) provider related issues area:providers provider:Apache labels Jun 5, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 5, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@eladkal
Copy link
Contributor

eladkal commented Jun 5, 2023

I don't think revert is required?
This sounds more like a case where additional functionality + docs are needed and once we have it and release new version we can (maybe) yank the old one.

The PR in question was released in April. There may be users out there who relay on it and reverting it would cause their code to break.

@jamescnowell
Copy link
Author

additional functionality
As it is currently, it is a non-backwards compatible change in the functionality of the Operator, which previously relied entirely on the Sensor for monitoring the job status (as far as I know). Currently, the Operator maintains a Running status for the duration of the Spark job, which changes the possible DAG configurations.

I have an immediate need to have a working DAG, meaning I need to revert to an older version of the Operator (as this version does not fail when a job fails). On top of that, if the change moves forwards as-is, I will not only need to update my DAGs in the future, but will also lose features I may be relying on (the ability to have a fork in a DAG after launching a Spark job, but before it's finished).

I'm definitely for an Operator that behaves in this way, but either by adding flags to the existing operator, or creating an Operator with a different name, and a deprecation schedule for the existing Operator.

@potiuk
Copy link
Member

potiuk commented Jun 5, 2023

I'm definitely for an Operator that behaves in this way, but either by adding flags to the existing operator, or creating an Operator with a different name, and a deprecation schedule for the existing Operator.

Can you do it this way rather than reverting the change?

While I understand your case is impacted, it can be easily fixed by downgrading to earlier version of provider, so "properly" fixing it seems to be a better way than reverting. Reverting a change that has already been released only adds confusion, because if somone relies on what has been already released then you break the other's workflows while you are fixing yours. Can you add a flag to control the behaviour instead?

@eladkal
Copy link
Contributor

eladkal commented Jun 5, 2023

I agree with @potiuk .
Your immidiate issue can be solved by downgrading the version of the provider till the underlying issue is resolved.

The fact that you agree and support the code modifications made by previous PR makes even stronger case not to revert.
Can you please modify the PR to adress the concerns you raised (docs etc..)

@jamescnowell
Copy link
Author

I'm not able to spend that much time on changes like that right now. You can close this PR if it's not something you plan on accepting.

I find it frustrating that non-backwards compatible changes are allowed into to "stable" API versions, as will many others.

For the "Some people may be relying on it" statement... while that is possible, the current version is fundamentally broken, so those people will have jobs that fail with no notification. I believe most of those people will have ended up finding #31183 as using the code provided in the examples errors out very quickly.

I understand this is already a sticky situation and there are no good solutions, this is simply the solution that I had time for, that appeared to be in line with having stable API version numbers.

@potiuk
Copy link
Member

potiuk commented Jun 6, 2023

then maybe @blcksrx will have time to fix it since that was the #31183 change that broke it. Do you agree @blcksrx that this is "fundamentally broken" ?

Can you please fix it @blcksrx ? I think if not then indeed we will have to rever it, unless someone provides a fix.

I find it frustrating that non-backwards compatible changes are allowed into to "stable" API versions, as will many others.

Surely @jamescnowell . It is frustrating for everyone. But maybe you do not understand the perspective and OSS spirit - it might happen. Maybe you could better do it with 80 providers, 5000 or so operators and 30 commits merged a day and multiple thousands of tests to prevent such errors from happening. Airflow has 2500 contributors and you get it for absolutely free. There are many ways you and others can give back - spending time (as the 2500 contributors) is one of the ways - and I understand you do not have more of the time (but then maybe github Sponsorship or finding someone in your company - that uses the software for free) would be a good way to give back for the countless hours people spend on contributing to Airflow so that your company can use it for free.

@blcksrx
Copy link
Contributor

blcksrx commented Jun 7, 2023

@potiuk
I'm not completely agree this version is broken. It's could be fixed in no time!

@eladkal
Copy link
Contributor

eladkal commented Jun 7, 2023

I find it frustrating that non-backwards compatible changes are allowed into to "stable" API versions, as will many others

This was not intentional.
We have several check points to minimize such issues but its not bullet proof. I welcome you to participate in testing releases to minimize such future cases.

The issue here is that segnificant time has passed since the release and reverting / yanking the version might cause more harm than good as there may be users that already used the modified version.

Your issue is solved by simply not updating the provider till a fix is merged.

@potiuk
Copy link
Member

potiuk commented Jun 7, 2023

@potiuk I'm not completely agree this version is broken. It's could be fixed in no time!

Can you make a fix then, please ?

@eladkal
Copy link
Contributor

eladkal commented Jun 8, 2023

Closing in favor of #31798
@jamescnowell I welcome you to assist with the review

@eladkal eladkal closed this Jun 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SparkKubernetesSensor: 'None' has no attribute 'metadata'

4 participants