
Keytab file is not mounted in worker-kerberos container #24369

@mikegit21

Description


Official Helm Chart version

1.5.0

Apache Airflow version

2.2.4

Kubernetes Version

1.20+

Helm Chart configuration

kerberos:
  enabled: true
  ccacheMountPath: /var/kerberos-ccache
  ccacheFileName: cache
  configPath: /etc/krb5.conf
  keytabBase64Content: "<base64contentofkeytabfile>"
  keytabPath: /etc/airflow.keytab
  principal: <principal>
  reinitFrequency: 3600
  config: "|
    [logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log

[libdefaults]
 default_realm = REALM1
 dns_lookup_realm = false
 dns_lookup_kdc = false
 renew_lifetime = 7d
 forwardable = true
 udp_preference_limit = 1
 kdc_timeout = 3000

[realms]
 REALM1 = {
 kdc = x
 kdc = x
 }
 REALM2 = {
 kdc = x
 kdc = x
 kdc = x
 kdc = x
 }

[domain_realm]
mapping = mapping"

Docker Image customisations

FROM apache/airflow:2.2.4

# Switch user because otherwise installing dependencies will not work
USER root
# Update package index and upgrade packages to prevent security issues
RUN apt-get update && apt-get upgrade -y
# Needed to install custom package
RUN apt-get install build-essential unixodbc-dev libkrb5-dev -y
USER airflow

RUN pip install pipenv

COPY Pipfile Pipfile.lock /
RUN pipenv install --system

What happened

When we execute an Airflow pipeline that needs to connect to a Kerberos-secured database, we get the following error:

File "/home/airflow/.local/lib/python3.7/site-packages/impala/dbapi.py", line 167, in connect
    retries=retries)
  File "/home/airflow/.local/lib/python3.7/site-packages/impala/hiveserver2.py", line 862, in connect
    transport.open()
  File "/home/airflow/.local/lib/python3.7/site-packages/thrift_sasl/__init__.py", line 82, in open
    ret, chosen_mech, initial_response = self.sasl.start(self.mechanism)
  File "/home/airflow/.local/lib/python3.7/site-packages/impala/sasl_compat.py", line 24, in start
    return True, self.mechanism, self.process()
  File "/home/airflow/.local/lib/python3.7/site-packages/puresasl/client.py", line 16, in wrapped
    return f(self, *args, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/puresasl/client.py", line 148, in process
    return self._chosen_mech.process(challenge)
  File "/home/airflow/.local/lib/python3.7/site-packages/puresasl/mechanisms.py", line 505, in process
    kerberos.authGSSClientStep(self.context, '')
kerberos.GSSError: (('Unspecified GSS failure.  Minor code may provide more information', 851968), ('Server not found in Kerberos database', -1765328377))

This is interesting because we use the exact same krb5.conf, keytab file and principal to access the database from another tool. I jumped into the worker-kerberos container and verified /etc/krb5.conf as well as the principal in /opt/airflow/airflow.cfg, and they both look good.

However, I cannot find the /etc/airflow.keytab file, which, as far as I understand, should be mounted into the container from the secret that the Airflow Helm chart creates. I suspect the missing keytab file is the actual cause of the problem, since it contains the credentials needed to authenticate against Kerberos.
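For reference, this is roughly how I checked from outside the pod whether the keytab is mounted into the sidecar. The pod name airflow-worker-0 and the namespace are assumptions based on my release; adjust them to your deployment:

# Assumed pod name and namespace; adjust to your deployment.
kubectl exec -n namespace airflow-worker-0 -c worker-kerberos -- ls -l /etc/airflow.keytab

# Show which volumes are mounted into the worker-kerberos sidecar.
kubectl get pod airflow-worker-0 -n namespace \
  -o jsonpath='{.spec.containers[?(@.name=="worker-kerberos")].volumeMounts}'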

I also verified that the airflow-kerberos-keytab secret exists. It looks like this:

apiVersion: v1
kind: Secret
metadata:
  name: airflow-kerberos-keytab
  annotations:
    meta.helm.sh/release-name: airflow
    meta.helm.sh/release-namespace: namespace
type: Opaque
data:
  kerberos.keytab: <base64-content>

The base64 content looks good to me.
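As a sanity check, the keytab stored in the secret can be decoded and inspected locally (assuming the MIT Kerberos client tools are installed, so that klist is available):

# Decode the keytab from the chart-managed secret and list its entries.
kubectl get secret airflow-kerberos-keytab -n namespace \
  -o jsonpath='{.data.kerberos\.keytab}' | base64 -d > /tmp/airflow.keytab
klist -kt /tmp/airflow.keytab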

What you think should happen instead

The airflow.keytab file should be mounted into the worker-kerberos container at the configured keytabPath (/etc/airflow.keytab), which in my opinion should resolve the Kerberos connection error.
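Once the keytab is mounted, obtaining a ticket from inside the sidecar should work. A rough way to verify this, using the keytabPath, principal and ccache settings from the values above (the pod name is again an assumption, and <principal> is the placeholder from the values):

# Request a ticket with the mounted keytab and inspect the shared ccache.
kubectl exec -n namespace airflow-worker-0 -c worker-kerberos -- \
  kinit -c /var/kerberos-ccache/cache -kt /etc/airflow.keytab <principal>
kubectl exec -n namespace airflow-worker-0 -c worker-kerberos -- \
  klist -c /var/kerberos-ccache/cache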

How to reproduce

You can reproduce this issue by deploying the chart with a Kerberos configuration similar to the one provided above and then trying to access a Kerberos-authenticated database from a DAG.

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
