SparkKubernetesOperator fails after upgrade from 2.8.1 to 2.8.2 #38017

@gbloisi-openaire

Description

Apache Airflow version

2.8.2

If "Other Airflow 2 version" selected, which one?

2.8.3rc1

What happened?

I'm running a spark-pi example using the SparkKubernetesOperator:

SparkKubernetesOperator(
    task_id='spark_pi_submit',
    namespace='lot1-spark-jobs',
    application_file="/example_spark_kubernetes_operator_pi.yaml",
    kubernetes_conn_id="kubernetes_default",
    do_xcom_push=True,
    in_cluster=True,
    delete_on_termination=True,
    dag=dag
)

It was running fine on 2.8.1. After upgrading to Airflow 2.8.2, I got the following error:

    kube_client=self.client,
                ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py", line 250, in client
    return self.hook.core_v1_client
           ^^^^^^^^^
  File "/usr/local/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py", line 242, in hook
    or self.template_body.get("kubernetes", {}).get("kube_config_file", None),
       ^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py", line 198, in template_body
    return self.manage_template_specs()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/spark_kubernetes.py", line 127, in manage_template_specs
    template_body = _load_body_to_dict(open(self.application_file))
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'apiVersion: "sparkoperator.k8s.io/v1beta2"\nkind: SparkApplication\nmetadata:\n  name: spark-pi\n  namespace: lot1-spark-jobs\ns
[2024-03-10T10:29:15.613+0000] {taskinstance.py:1149} INFO - Marking task as UP_FOR_RETRY. dag_id=spark_pi, task_id=spark_pi_submit, execution_date=20240310T102910, start_date=20240310T

It looks like self.application_file ends up containing the content of the file it points to, rather than the file path.

I suspect this was caused by changes introduced in PR-22253. I'm quite new to Airflow and Python, but my guess is that application_file should no longer be managed as a templated field, since rendered template representations were moved to template_body.
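
To illustrate my guess, here is a simplified stand-in for the templating behavior (my assumption and simplification, not Airflow's actual code): because application_file is a templated field and its value ends with a templated extension, the value gets replaced by the rendered content of the file before execute() runs, so the later open() call fails with that content as the "filename":

import os
import tempfile

# Simplified stand-in for the templating behavior (my assumption, not
# Airflow's actual implementation): fields listed in template_fields whose
# value ends with a template extension are replaced by the rendered
# *content* of the file they point to.
class FakeOperator:
    template_fields = ("application_file",)
    template_ext = (".yaml", ".yml")

    def __init__(self, application_file):
        self.application_file = application_file

    def render_templates(self):
        for field in self.template_fields:
            value = getattr(self, field)
            if isinstance(value, str) and value.endswith(self.template_ext):
                with open(value) as f:
                    setattr(self, field, f.read())

# Stand-in application file.
with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
    f.write('apiVersion: "sparkoperator.k8s.io/v1beta2"\nkind: SparkApplication\n')
    path = f.name

op = FakeOperator(path)
op.render_templates()

# application_file now holds YAML content instead of a path, so re-opening
# it fails exactly like manage_template_specs() does in the traceback above.
try:
    open(op.application_file)
except FileNotFoundError as e:
    print(e)
finally:
    os.remove(path)

If that reading is right, a temporary workaround might be to give the application file a name that does not end in a templated extension so it is not rendered, though I have not verified this.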

What you think should happen instead?

No response

How to reproduce

Given my understanding of the issue, a very simple example of the SparkKubernetesOperator using the application_file property should reproduce it.
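
For concreteness, a minimal DAG sketch along these lines (the task_id, namespace, path, and connection id are the ones from my snippet above; the surrounding DAG boilerplate is illustrative):

from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import (
    SparkKubernetesOperator,
)

with DAG(
    dag_id="spark_pi",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Submitting any SparkApplication manifest via application_file
    # should trigger the FileNotFoundError above on 2.8.2.
    SparkKubernetesOperator(
        task_id="spark_pi_submit",
        namespace="lot1-spark-jobs",
        application_file="/example_spark_kubernetes_operator_pi.yaml",
        kubernetes_conn_id="kubernetes_default",
    )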

Operating System

kind (Kubernetes in Docker)

Versions of Apache Airflow Providers

No response

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
