
Conversation

@snjypl (Contributor) commented Jan 18, 2023

We see the following error in the webserver log when trying to run a task manually from the UI with the Kubernetes executor.

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/executors/kubernetes_executor.py", line 264, in run_pod_async
    body=sanitized_pod, namespace=pod.metadata.namespace, **kwargs
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 7356, in create_namespaced_pod
    return self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)  # noqa: E501
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 7469, in create_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 397, in request
    body=body)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 281, in POST
    body=body)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 234, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '5a631062-cbc1-49f6-ae20-3581626a6249', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'e6bfeec0-ef68-4420-88d8-b7024eed9fea', 'X-Kubernetes-Pf-Prioritylevel-Uid': '0e598fd5-a817-46f7-be85-6ac94aeb7872', 'Date': 'Wed, 18 Jan 2023 10:52:05 GMT', 'Content-Length': '381'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:airflow:airflow-webserver\" cannot create resource \"pods\" in API group \"\" in the namespace \"airflow\": RBAC: clusterrole.rbac.authorization.k8s.io \"system:openshift:scc:anyuid\" not found","reason":"Forbidden","details":{"kind":"pods"},"code":403}

With the Kubernetes executor, the webserver should be able to launch the pod for a manual task trigger, so we need to add the airflow-webserver service account to the pod-launcher role.
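
To illustrate the intended chart change, here is a rough sketch of the subjects list in templates/rbac/pod-launcher-rolebinding.yaml with the webserver service account included. The template condition and names shown here are simplified placeholders, not the chart's exact code:

subjects:
  - kind: ServiceAccount
    name: "{{ .Release.Name }}-scheduler"
    namespace: "{{ .Release.Namespace }}"
  - kind: ServiceAccount
    name: "{{ .Release.Name }}-worker"
    namespace: "{{ .Release.Namespace }}"
  {{- if eq .Values.executor "KubernetesExecutor" }}
  # Manual task triggers launch the worker pod from the webserver,
  # so the webserver service account also needs pod-launch permissions.
  - kind: ServiceAccount
    name: "{{ .Release.Name }}-webserver"
    namespace: "{{ .Release.Namespace }}"
  {{- end }}

The real template presumably uses the chart's own service-account name helpers and executor checks; the snippet above only illustrates the shape of the change.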



boring-cyborg bot added the area:helm-chart label on Jan 18, 2023
snjypl force-pushed the bugfix/k8s-executor-manual-trigger-helm-chart-rbac-error branch 3 times, most recently from be3cf47 to 4cef0f7 on January 18, 2023 at 14:25
@arjunanan6 (Contributor) commented

Alright, so I attempted this locally where I have no restrictions. As discussed on #28394, scheduled tasks run just fine, but there is an exception when running a task manually:

[2023-01-20T11:57:44.263+0000] {kubernetes_executor.py:527} INFO - Start Kubernetes executor
[2023-01-20T11:57:44.306+0000] {kubernetes_executor.py:476} INFO - Found 0 queued task instances
[2023-01-20T11:57:44.309+0000] {base_executor.py:95} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'HELLO_WORLD', 'hello', 'scheduled__2023-01-20T11:40:00+00:00', '--ignore-all-dependencies', '--ignore-dependencies', '--force', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/hello.py']
[2023-01-20T11:57:44.310+0000] {kubernetes_executor.py:559} INFO - Add task TaskInstanceKey(dag_id='HELLO_WORLD', task_id='hello', run_id='scheduled__2023-01-20T11:40:00+00:00', try_number=4, map_index=-1) with command ['airflow', 'tasks', 'run', 'HELLO_WORLD', 'hello', 'scheduled__2023-01-20T11:40:00+00:00', '--ignore-all-dependencies', '--ignore-dependencies', '--force', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/hello.py']
[2023-01-20T11:57:44.310+0000] {kubernetes_executor.py:130} INFO - Event: and now my watch begins starting at resource_version: 0
[2023-01-20T11:57:44.383+0000] {kubernetes_executor.py:339} INFO - Creating kubernetes pod for job is TaskInstanceKey(dag_id='HELLO_WORLD', task_id='hello', run_id='scheduled__2023-01-20T11:40:00+00:00', try_number=2, map_index=-1), with pod name hello-world-hello-be2bad2bd8dc4568bd1ba73082ecef4a
[2023-01-20T11:57:44.392+0000] {kubernetes_executor.py:274} ERROR - Exception when attempting to create Namespaced Pod: {
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "annotations": {
      "dag_id": "HELLO_WORLD",
      "task_id": "hello",
      "try_number": "2",
      "run_id": "scheduled__2023-01-20T11:40:00+00:00"
    },
    "labels": {
      "tier": "airflow",
      "component": "worker",
      "release": "airflowlocal",
      "airflow-worker": "None",
      "dag_id": "HELLO_WORLD",
      "task_id": "hello",
      "try_number": "2",
      "airflow_version": "2.5.0",
      "kubernetes_executor": "True",
      "run_id": "scheduled__2023-01-20T1140000000-c15690dab"
    },
    "name": "hello-world-hello-be2bad2bd8dc4568bd1ba73082ecef4a",
    "namespace": "airflow"
  },
  "spec": {
    "affinity": {},
    "containers": [
      {
        "args": [
          "airflow",
          "tasks",
          "run",
          "HELLO_WORLD",
          "hello",
          "scheduled__2023-01-20T11:40:00+00:00",
          "--ignore-all-dependencies",
          "--ignore-dependencies",
          "--force",
          "--local",
          "--pool",
          "default_pool",
          "--subdir",
          "DAGS_FOLDER/hello.py"
        ],
        "env": [
          {
            "name": "AIRFLOW__CORE__EXECUTOR",
            "value": "LocalExecutor"
          },
          {
            "name": "AIRFLOW__CORE__FERNET_KEY",
            "valueFrom": {
              "secretKeyRef": {
                "key": "fernet-key",
                "name": "airflowlocal-fernet-key"
              }
            }
          },
          {
            "name": "AIRFLOW__CORE__SQL_ALCHEMY_CONN",
            "valueFrom": {
              "secretKeyRef": {
                "key": "connection",
                "name": "airflowlocal-airflow-metadata"
              }
            }
          },
          {
            "name": "AIRFLOW__DATABASE__SQL_ALCHEMY_CONN",
            "valueFrom": {
              "secretKeyRef": {
                "key": "connection",
                "name": "airflowlocal-airflow-metadata"
              }
            }
          },
          {
            "name": "AIRFLOW_CONN_AIRFLOW_DB",
            "valueFrom": {
              "secretKeyRef": {
                "key": "connection",
                "name": "airflowlocal-airflow-metadata"
              }
            }
          },
          {
            "name": "AIRFLOW__WEBSERVER__SECRET_KEY",
            "valueFrom": {
              "secretKeyRef": {
                "key": "webserver-secret-key",
                "name": "airflowlocal-webserver-secret-key"
              }
            }
          },
          {
            "name": "AIRFLOW_IS_K8S_EXECUTOR_POD",
            "value": "True"
          }
        ],
        "image": "my-dags:0.0.1",
        "imagePullPolicy": "IfNotPresent",
        "name": "base",
        "resources": {},
        "volumeMounts": [
          {
            "mountPath": "/opt/airflow/logs",
            "name": "logs"
          },
          {
            "mountPath": "/opt/airflow/airflow.cfg",
            "name": "config",
            "readOnly": true,
            "subPath": "airflow.cfg"
          },
          {
            "mountPath": "/opt/airflow/config/airflow_local_settings.py",
            "name": "config",
            "readOnly": true,
            "subPath": "airflow_local_settings.py"
          }
        ]
      }
    ],
    "restartPolicy": "Never",
    "securityContext": {
      "fsGroup": 0,
      "runAsUser": 50000
    },
    "serviceAccountName": "airflowlocal-worker",
    "volumes": [
      {
        "emptyDir": {},
        "name": "logs"
      },
      {
        "configMap": {
          "name": "airflowlocal-airflow-config"
        },
        "name": "config"
      }
    ]
  }
}
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/executors/kubernetes_executor.py", line 269, in run_pod_async
    resp = self.kube_client.create_namespaced_pod(
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 7356, in create_namespaced_pod
    return self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)  # noqa: E501
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 7455, in create_namespaced_pod_with_http_info
    return self.api_client.call_api(
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 391, in request
    return self.rest_client.POST(url,
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 275, in POST
    return self.request("POST", url,
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 234, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'c2225504-16da-4966-ae42-36241a0d49cb', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '3dd6f880-05b3-46ee-a0f2-a45a4c663a50', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'b8f0d7a8-7a33-4a3e-8e49-112cfe10ad1d', 'Date': 'Fri, 20 Jan 2023 11:57:44 GMT', 'Content-Length': '299'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:airflow:airflowlocal-webserver\" cannot create resource \"pods\" in API group \"\" in the namespace \"airflow\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}

Which is strange, since this is the same service account that is spinning up pods to execute a scheduled task. Here is the YAML of that pod definition:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    dag_id: HELLO_WORLD
    run_id: scheduled__2023-01-20T11:40:00+00:00
    task_id: hello
    try_number: "3"
  creationTimestamp: "2023-01-20T11:57:11Z"
  labels:
    airflow-worker: "80"
    airflow_version: 2.5.0
    component: worker
    dag_id: HELLO_WORLD
    kubernetes_executor: "True"
    release: airflowlocal
    run_id: scheduled__2023-01-20T1140000000-c15690dab
    task_id: hello
    tier: airflow
    try_number: "3"
  name: hello-world-hello-3c579ea84688467bab7036d3bc940c64
  namespace: airflow
  resourceVersion: "40588"
  uid: 53017417-d794-4532-bdb8-fa92db4f97fd
spec:
  affinity: {}
  containers:
  - args:
    - airflow
    - tasks
    - run
    - HELLO_WORLD
    - hello
    - scheduled__2023-01-20T11:40:00+00:00
    - --local
    - --subdir
    - DAGS_FOLDER/hello.py
    env:
    - name: AIRFLOW__CORE__EXECUTOR
      value: LocalExecutor
    - name: AIRFLOW__CORE__FERNET_KEY
      valueFrom:
        secretKeyRef:
          key: fernet-key
          name: airflowlocal-fernet-key
    - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
      valueFrom:
        secretKeyRef:
          key: connection
          name: airflowlocal-airflow-metadata
    - name: AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
      valueFrom:
        secretKeyRef:
          key: connection
          name: airflowlocal-airflow-metadata
    - name: AIRFLOW_CONN_AIRFLOW_DB
      valueFrom:
        secretKeyRef:
          key: connection
          name: airflowlocal-airflow-metadata
    - name: AIRFLOW__WEBSERVER__SECRET_KEY
      valueFrom:
        secretKeyRef:
          key: webserver-secret-key
          name: airflowlocal-webserver-secret-key
    - name: AIRFLOW_IS_K8S_EXECUTOR_POD
      value: "True"
    image: my-dags:0.0.1
    imagePullPolicy: IfNotPresent
    name: base
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /opt/airflow/logs
      name: logs
    - mountPath: /opt/airflow/airflow.cfg
      name: config
      readOnly: true
      subPath: airflow.cfg
    - mountPath: /opt/airflow/config/airflow_local_settings.py
      name: config
      readOnly: true
      subPath: airflow_local_settings.py
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-w5gcp
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: kind-control-plane
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 0
    runAsUser: 50000
  serviceAccount: airflowlocal-worker
  serviceAccountName: airflowlocal-worker
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: logs
  - configMap:
      defaultMode: 420
      name: airflowlocal-airflow-config
    name: config
  - name: kube-api-access-w5gcp
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-01-20T11:57:11Z"
    reason: PodCompleted
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-01-20T11:57:20Z"
    reason: PodCompleted
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-01-20T11:57:20Z"
    reason: PodCompleted
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-01-20T11:57:11Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://a81155f6103c7fee21ab8b06298d8ba04a112f91356685f2d6414dd68136eb3b
    image: docker.io/library/my-dags:0.0.1
    imageID: docker.io/library/import-2023-01-20@sha256:9298bf3504bc8279b2270f7d41e9d2c1244a39e22e0ac0c534c5e881d5621ca9
    lastState: {}
    name: base
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://a81155f6103c7fee21ab8b06298d8ba04a112f91356685f2d6414dd68136eb3b
        exitCode: 0
        finishedAt: "2023-01-20T11:57:19Z"
        reason: Completed
        startedAt: "2023-01-20T11:57:11Z"
  hostIP: 172.18.0.2
  phase: Running
  podIP: 10.244.0.42
  podIPs:
  - ip: 10.244.0.42
  qosClass: BestEffort
  startTime: "2023-01-20T11:57:11Z"

I verified whether this SA is able to spin up a pod, and that checks out too:

kubectl auth can-i get pods --as system:serviceaccount:airflow:airflowlocal-webserver
yes

@snjypl (Contributor, Author) commented Jan 20, 2023

@arjunanan6 please share the output of kubectl get rolebinding airflow-pod-launcher-rolebinding -o yaml

@arjunanan6 (Contributor) commented

@snjypl Sure, here you go:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  annotations:
    meta.helm.sh/release-name: airflowlocal
    meta.helm.sh/release-namespace: airflow
  creationTimestamp: "2023-01-18T09:52:13Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    chart: airflow-1.7.0
    heritage: Helm
    release: airflowlocal
    tier: airflow
  name: airflowlocal-pod-launcher-rolebinding
  namespace: airflow
  resourceVersion: "13279"
  uid: 05617ca4-ebf3-424a-990b-8b59274702f0
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: airflowlocal-pod-launcher-role
subjects:
- kind: ServiceAccount
  name: airflowlocal-scheduler
  namespace: airflow
- kind: ServiceAccount
  name: airflowlocal-worker
  namespace: airflow

@snjypl (Contributor, Author) commented Jan 20, 2023

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  annotations:
    meta.helm.sh/release-name: airflowlocal
    meta.helm.sh/release-namespace: airflow
  creationTimestamp: "2023-01-18T09:52:13Z"
  labels:
    app.kubernetes.io/managed-by: Helm
    chart: airflow-1.7.0
    heritage: Helm
    release: airflowlocal
    tier: airflow
  name: airflowlocal-pod-launcher-rolebinding
  namespace: airflow
  resourceVersion: "13279"
  uid: 05617ca4-ebf3-424a-990b-8b59274702f0
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: airflowlocal-pod-launcher-role
subjects:
- kind: ServiceAccount
  name: airflowlocal-scheduler
  namespace: airflow
- kind: ServiceAccount
  name: airflowlocal-worker
  namespace: airflow

@arjunanan6 it seems like you have not applied the patch in this PR correctly. As you can see, the airflowlocal-webserver service account does not have the pod-launcher role; it is not in the list of subjects.

You can try to manually apply the gist below (a rough sketch of such a manifest is also shown after this comment):
https://gist.github.com/snjypl/aeebe582be0190e483163224f9c966e7

The reason the scheduled tasks are not hitting this issue is that scheduled task pods are launched by the scheduler, and the scheduler service account has the pod-launcher role.

In the case of a manual trigger, the pod is launched by the webserver.

How are you deploying the Helm chart from this PR? If you can share the steps, I might be able to help you with it.
In any case, manually running kubectl apply -f <the gist file> should fix it for you; later you can try testing the Helm chart.
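
For reference, such a manually applied manifest might look roughly like the following. This is only a sketch based on the names visible in the RoleBinding output above; it is not the actual contents of the gist:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airflowlocal-pod-launcher-rolebinding
  namespace: airflow
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: airflowlocal-pod-launcher-role
subjects:
# existing subjects rendered by the chart
- kind: ServiceAccount
  name: airflowlocal-scheduler
  namespace: airflow
- kind: ServiceAccount
  name: airflowlocal-worker
  namespace: airflow
# the additional subject this PR is about: the webserver service account
- kind: ServiceAccount
  name: airflowlocal-webserver
  namespace: airflow

Applying a manifest like this with kubectl should have the same effect as the chart change: the webserver service account is bound to the pod-launcher role alongside the scheduler and worker.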

@snjypl (Contributor, Author) commented Jan 20, 2023

@arjunanan6 Also, you should check:
kubectl auth can-i create pods --as system:serviceaccount:airflow:airflowlocal-webserver

snjypl force-pushed the bugfix/k8s-executor-manual-trigger-helm-chart-rbac-error branch 2 times, most recently from f6a4c33 to ce886c7 on January 20, 2023 at 20:43
@arjunanan6 (Contributor) commented

@snjypl Yeah, you're right. Not sure what happened there, but now the patch has been applied properly and I can confirm that the change works. Running a task individually works. :)

@potiuk (Member) commented Jan 22, 2023

LGTM. But would love @jedcunningham or @dstandish to verify it.

snjypl force-pushed the bugfix/k8s-executor-manual-trigger-helm-chart-rbac-error branch from ce886c7 to 6648f1c on January 24, 2023 at 11:52
@jedcunningham (Member) left a review comment

So this PR does fix the chart, I can't argue there.

However, I'd never actually looked at the manual-run code, and it turns out the webserver endpoint kicks off an executor and abandons it! KE pods don't even get cleaned up by it; that falls on a scheduler that ends up adopting the pod. And that's the happy path! If something happens that causes your worker to not be successful, it may not ever be adopted?

So, in short, I'm a little hesitant to "fix" this chart issue when the underlying feature it relies on is itself flawed. Meaning, I think it's correct in the big picture that the webserver isn't able to launch pods. I'd rather explore fixing the manual run behavior first.

Inline review comment on a test snippet from the PR diff:

},
show_only=["templates/rbac/pod-launcher-rolebinding.yaml"],
)
print(docs)

Suggested change (Member): remove the stray print(docs) debug line.
@potiuk (Member) commented Feb 19, 2023

So @snjypl, are you going to work on it as explained by @jedcunningham? Or maybe you would like to create the issue to fix the manual-run behaviour if you are not sure how to fix it?

@github-actions (bot) commented Apr 6, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions bot added the stale label on Apr 6, 2023
github-actions bot closed this on Apr 11, 2023