-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Log FileTaskHandler to work with multi_namespace_mode when running using KubernetesExecutor #28436
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log FileTaskHandler to work with multi_namespace_mode when running using KubernetesExecutor #28436
Conversation
da608d6 to
9e7b99b
Compare
9e7b99b to
a877ff1
Compare
|
The static check error is due to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. i think ultimately it would probably be better to store the actual namespace on the TI record somewhere. then we don't need to check two places, and... maybe there ultimately will be other ways to configure namespace. i thought about suggesting storing it in executor config (e.g. maybe simply under a key namespace) but... probably that should be left for user configuration so i dunno. any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the user is specifying the namespace (if not default namespace in the config) in the executor config. When creating the TI, also storing the actual namespace bing used in the TI record sounds sensible to me.
One concern is that this is only applicable when KubernetesExecutor is being used. Making it too "explicit" may confuse users who always only use other executors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that occurred to me... And I think i probably agree... couple thoughts...
For one, there is precedent. e.g. external_executor_id and queue are columns on TI which only* apply for celery executor. (with an asterisk for queue; its role was expanded a little recently for LKE and CKE). But, precedent isn't everything of course.
And, kubernetes is, I believe, the defacto way to run airlfow, and more and more it's not either k8s executor OR other executors --- with LKE and CKE you don't have to choose, k8s executor is always there as an option for you. So in this case seems like wouldn't be so crazy.
But I think for me probably more than anything is, we don't want to just add namespace because it's not very future-proof, i.e. maybe there would be more info we'd want to add, and we don't want to add a column for it every time. So probably there's a json column that wants to be there. Maybe executor_data. Maybe we would move external_executor_id there. And although it's maybe sort of tempting to expand executor_config to fill this role, probably makes sense to keep separate user config vs internal executor field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree that Kubernetes being "defacto" for Airflow (in very near future if not now). But it may not necessarily mean KubernetesExecutor going to be the "defacto" for Airflow? E.g. running CeleryExecutor (but on Kubernetes) still has certain pros over KubernetesExecutor.
The idea to use a json column sounds good to me, meanwhile a few thoughts about it:
- All the database that Airflow is compatible with do support JSON column, but not all database do that (so if later we want to add compatibility with another database, this may pose constraint).
- if it will mean harder to write efficient SQL/SQLAlchemy queries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, true.
All the database that Airflow is compatible with do support JSON column
yeah we have solution for the "db does not support json", we just... dump to json string. we have a number of fields like this now.
if it will mean harder to write efficient SQL/SQLAlchemy queries.
yeah but that's only an issue if you are going to need to filter on that field. my assumption is, we'd only be looking at this field after we already have the TI loaded. similar to executor config, which is even worse since it's a binary blob field (pickled obj) -- but here too it's no problem since we never do queries based on that field.
anyway, no emergency to add this... just ... something we might wanna do
It took surprisingly long to sort that one out but I think I sorted it out (pun intended). |
…ing KubernetesExecutor
a877ff1 to
259e156
Compare
The earlier
FileTaskHandleris assuming the pod will always run in the default namespace specified inconf.get("kubernetes_executor", "namespace"), which is not true when we run withmulti_namespace_mode.This is a follow-up on what we have discussed earlier at #28047 (comment)