Official Helm Chart version
1.5.0
Apache Airflow version
2.2.4
Kubernetes Version
1.20+
Helm Chart configuration
kerberos:
  enabled: true
  ccacheMountPath: /var/kerberos-ccache
  ccacheFileName: cache
  configPath: /etc/krb5.conf
  keytabBase64Content: "<base64contentofkeytabfile>"
  keytabPath: /etc/airflow.keytab
  principal: <principal>
  reinitFrequency: 3600
  config: |
    [logging]
    default = FILE:/var/log/krb5libs.log
    kdc = FILE:/var/log/krb5kdc.log

    [libdefaults]
    default_realm = REALM1
    dns_lookup_realm = false
    dns_lookup_kdc = false
    renew_lifetime = 7d
    forwardable = true
    udp_preference_limit = 1
    kdc_timeout = 3000

    [realms]
    REALM1 = {
      kdc = x
      kdc = x
    }
    REALM2 = {
      kdc = x
      kdc = x
      kdc = x
      kdc = x
    }

    [domain_realm]
    mapping = mapping
Docker Image customisations
FROM apache/airflow:2.2.4
# Switch user because otherwise installing dependencies will not work
USER root
# Update package index and upgrade packages to prevent security issues
RUN apt-get update && apt-get upgrade -y
# Needed to install custom package
RUN apt-get install build-essential unixodbc-dev libkrb5-dev -y
USER airflow
RUN pip install pipenv
COPY Pipfile Pipfile.lock /
RUN pipenv install --system
What happened
When we execute an Airflow pipeline that needs to connect to the Kerberos-secured database, we get the following error:
File "/home/airflow/.local/lib/python3.7/site-packages/impala/dbapi.py", line 167, in connect
retries=retries)
File "/home/airflow/.local/lib/python3.7/site-packages/impala/hiveserver2.py", line 862, in connect
transport.open()
File "/home/airflow/.local/lib/python3.7/site-packages/thrift_sasl/__init__.py", line 82, in open
ret, chosen_mech, initial_response = self.sasl.start(self.mechanism)
File "/home/airflow/.local/lib/python3.7/site-packages/impala/sasl_compat.py", line 24, in start
return True, self.mechanism, self.process()
File "/home/airflow/.local/lib/python3.7/site-packages/puresasl/client.py", line 16, in wrapped
return f(self, *args, **kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/puresasl/client.py", line 148, in process
return self._chosen_mech.process(challenge)
File "/home/airflow/.local/lib/python3.7/site-packages/puresasl/mechanisms.py", line 505, in process
kerberos.authGSSClientStep(self.context, '')
kerberos.GSSError: (('Unspecified GSS failure. Minor code may provide more information', 851968), ('Server not found in Kerberos database', -1765328377))
This is interesting because we use the exact same krb5.conf, keytab file, and principal to access the database from another tool. I jumped into the worker-kerberos container and verified /etc/krb5.conf and the principal in /opt/airflow/airflow.cfg; both look good. However, I stumbled upon the fact that /etc/airflow.keytab does not exist, even though, as far as I understand, it should be mounted into the container from the secret that the Airflow Helm chart creates. I suspect the missing keytab file is the actual cause of the problem, since it contains the credentials needed to authenticate against the Kerberos database.
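For reference, this is roughly how I checked (pod, container, and namespace names are from my deployment and may differ):

# Exec into the kerberos sidecar of a worker pod
kubectl exec -it airflow-worker-0 -c worker-kerberos -n namespace -- bash
# The config is there, the keytab is not:
ls -l /etc/krb5.conf /etc/airflow.keytab
# If the keytab were mounted, its principals could be listed with:
klist -kt /etc/airflow.keytab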
I also verified that the airflow-kerberos-keytab secret exists. It looks like this:
apiVersion: v1
kind: Secret
metadata:
  name: airflow-kerberos-keytab
  annotations:
    meta.helm.sh/release-name: airflow
    meta.helm.sh/release-namespace: namespace
type: Opaque
data:
  kerberos.keytab: <base64-content>
The base64 content looks good to me.
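For completeness, here is one way to verify the keytab inside the secret (hypothetical commands; klist needs a regular file, so decode to a temporary path first):

# Decode the keytab from the secret and list its principals
kubectl get secret airflow-kerberos-keytab -n namespace \
  -o jsonpath='{.data.kerberos\.keytab}' | base64 -d > /tmp/kerberos.keytab
klist -kt /tmp/kerberos.keytab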
What you think should happen instead
The keytab should be mounted at /etc/airflow.keytab (per keytabPath) into the worker-kerberos container, which in my opinion would resolve the Kerberos connection error.
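Concretely, I would expect the worker pod spec to carry a volume backed by the airflow-kerberos-keytab secret plus a matching volumeMount at keytabPath, which could be checked with something like:

# Check the running worker pod for the keytab volume and its mount
kubectl get pod airflow-worker-0 -n namespace -o yaml | grep -B2 -A4 kerberos-keytab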
How to reproduce
You can reproduce this issue by using a Kerberos configuration similar to the one provided above and then trying to access a Kerberos-authenticated database from a DAG.
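Concretely (a sketch; with the chart repo added as above and the values saved as values.yaml):

# Install the chart with the Kerberos values, then trigger any DAG that
# opens a Kerberos-authenticated connection (e.g. impyla, as in the
# traceback above) and check the worker for the keytab:
helm install airflow apache-airflow/airflow --version 1.5.0 -f values.yaml -n namespace
kubectl exec airflow-worker-0 -c worker-kerberos -n namespace -- ls -l /etc/airflow.keytab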
Anything else
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct