Delete old Spark Application in SparkKubernetesOperator #21092
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Hi @theesen, thank you for this PR. Independent of the implementation, I would offer a flag so that users can choose which behavior they want.
Could you please add tests for the change, @theesen?
Tests are failing.
Fixed and ran the unit tests locally. Should be fine now. Fingers crossed.
Needs a rebase, unfortunately.
+ KubernetesHook: add delete_custom_object
+ SparkKubernetesOperator: extract the name from the K8s YAML and delete the application if it exists
+ Update the SparkKubernetesOperator docstring
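For context, here is a minimal sketch of the behavior these commits describe, written directly against the kubernetes Python client rather than the KubernetesHook wrapper this PR adds; the function name and the CRD coordinates (group/version/plural) are illustrative assumptions, not the PR's actual diff:

```python
# Sketch only: approximates the "extract name and delete if exists" idea.
import yaml
from kubernetes import client, config
from kubernetes.client.rest import ApiException


def delete_spark_app_if_exists(application_file: str, namespace: str = "default") -> None:
    """Extract the name from a SparkApplication manifest and delete any
    existing object with that name so a re-submit cannot collide."""
    name = yaml.safe_load(application_file)["metadata"]["name"]
    config.load_kube_config()  # use load_incluster_config() inside a pod
    try:
        client.CustomObjectsApi().delete_namespaced_custom_object(
            group="sparkoperator.k8s.io",
            version="v1beta2",
            namespace=namespace,
            plural="sparkapplications",
            name=name,
        )
    except ApiException as e:
        if e.status != 404:  # a missing object is fine: nothing to delete
            raise
```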
@potiuk, can you approve the workflow?
Awesome work, congrats on your first merged pull request!
@theesen, @potiuk, were the changes in this PR reverted in 9a4f674? I can see that with the latest Airflow version (2.6.2), an existing SparkApplication with the same name won't be deleted by the SparkKubernetesOperator logic, and the new application will fail to launch until the old one is deleted manually. Is there any flag to bring back the auto-deletion logic?
No idea. But you can check the code, or check with the author of the change you mention - the author might be the best person to discuss it with.
Hey,
SparkKubernetesOperator deletes any previous Spark Application with the same name
Current issue:
When a Spark App is launched, you either have to template the name inside the YAML file (making it unique with a timestamp, as shown below) or delete the previous run of the Spark App to prevent failure. Especially for newcomers to the topic, this operator design quickly leads to errors.
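To make that workaround concrete, here is an illustrative DAG snippet that templates a unique suffix into the application name; the manifest fields are examples, not taken from this PR:

```python
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import (
    SparkKubernetesOperator,
)

# application_file is a templated field, so Jinja macros such as
# ts_nodash can make the name unique per run (lowercased for K8s).
spark_app = """
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-spark-app-{{ ts_nodash | lower }}
  namespace: default
spec: {}  # spec omitted for brevity
"""

submit = SparkKubernetesOperator(
    task_id="submit_spark_app",
    namespace="default",
    application_file=spark_app,
    kubernetes_conn_id="kubernetes_default",
)
```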
Open Questions:
I am wondering whether we should also delete the K8s Spark Application after a successful run from inside the operator, or at least provide a flag to enable this (a rough sketch of such a flag follows below).
Downsides: harder to debug, since the container is gone in case of errors (unless we keep the container when the run failed).
Upsides: a less polluted K8s cluster with fewer old Spark Apps.
I would love to get some input on these thoughts.
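For illustration, a hypothetical version of that flag could look roughly like this; `delete_on_termination` is a made-up parameter name, not an existing operator argument, and `delete_spark_app_if_exists` is the helper sketched earlier in this thread:

```python
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import (
    SparkKubernetesOperator,
)


class SparkKubernetesOperatorSketch(SparkKubernetesOperator):
    """Illustration only: `delete_on_termination` is not a real argument."""

    def __init__(self, *, delete_on_termination: bool = False, **kwargs):
        super().__init__(**kwargs)
        self.delete_on_termination = delete_on_termination

    def execute(self, context):
        response = super().execute(context)
        if self.delete_on_termination:
            # Clean up the submitted SparkApplication so old objects do not
            # pile up. A real implementation would likely check the final
            # application state first, keeping failed runs for debugging.
            delete_spark_app_if_exists(
                self.application_file, self.namespace or "default"
            )
        return response
```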
Minor Updates:
As far as I could tell, the docstring was not correct: you can only pass a string or a dict as application_file. In my view, the name of the parameter should be changed as well, but since this is my first PR here, I did not want to introduce a breaking change to an operator right away. If someone confirms I am doing this the right way, I am happy to adjust (an illustration of both accepted forms follows below).

closes: #16290
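Going by the description above (string or dict are both accepted), a hedged illustration of the two forms; all values are made up:

```python
import yaml
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import (
    SparkKubernetesOperator,
)

manifest = """
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: my-spark-app
spec: {}  # spec omitted for brevity
"""

as_string = SparkKubernetesOperator(
    task_id="submit_from_string",
    application_file=manifest,                   # YAML string
)
as_dict = SparkKubernetesOperator(
    task_id="submit_from_dict",
    application_file=yaml.safe_load(manifest),   # parsed dict
)
```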