Skip to content

Conversation

@XD-DENG
Copy link
Member

@XD-DENG XD-DENG commented Dec 18, 2022

The earlier FileTaskHandler is assuming the pod will always run in the default namespace specified in conf.get("kubernetes_executor", "namespace"), which is not true when we run with multi_namespace_mode.

This is a follow-up on what we have discussed earlier at #28047 (comment)

@XD-DENG XD-DENG requested a review from dstandish December 18, 2022 02:06
@XD-DENG XD-DENG added the provider:cncf-kubernetes Kubernetes (k8s) provider related issues label Dec 18, 2022
@XD-DENG XD-DENG added this to the Airflow 2.6.0 milestone Dec 18, 2022
@XD-DENG XD-DENG force-pushed the file-task-handler-to-work-with-k8s-executor-multi-namespace-mode branch from da608d6 to 9e7b99b Compare December 18, 2022 03:34
@XD-DENG XD-DENG force-pushed the file-task-handler-to-work-with-k8s-executor-multi-namespace-mode branch from 9e7b99b to a877ff1 Compare December 18, 2022 03:37
@XD-DENG
Copy link
Member Author

XD-DENG commented Dec 18, 2022

The static check error is due to isort, and is being resolved by #28434

Comment on lines +194 to +198
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. i think ultimately it would probably be better to store the actual namespace on the TI record somewhere. then we don't need to check two places, and... maybe there ultimately will be other ways to configure namespace. i thought about suggesting storing it in executor config (e.g. maybe simply under a key namespace) but... probably that should be left for user configuration so i dunno. any thoughts?

Copy link
Member Author

@XD-DENG XD-DENG Dec 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the user is specifying the namespace (if not default namespace in the config) in the executor config. When creating the TI, also storing the actual namespace bing used in the TI record sounds sensible to me.

One concern is that this is only applicable when KubernetesExecutor is being used. Making it too "explicit" may confuse users who always only use other executors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that occurred to me... And I think i probably agree... couple thoughts...

For one, there is precedent. e.g. external_executor_id and queue are columns on TI which only* apply for celery executor. (with an asterisk for queue; its role was expanded a little recently for LKE and CKE). But, precedent isn't everything of course.

And, kubernetes is, I believe, the defacto way to run airlfow, and more and more it's not either k8s executor OR other executors --- with LKE and CKE you don't have to choose, k8s executor is always there as an option for you. So in this case seems like wouldn't be so crazy.

But I think for me probably more than anything is, we don't want to just add namespace because it's not very future-proof, i.e. maybe there would be more info we'd want to add, and we don't want to add a column for it every time. So probably there's a json column that wants to be there. Maybe executor_data. Maybe we would move external_executor_id there. And although it's maybe sort of tempting to expand executor_config to fill this role, probably makes sense to keep separate user config vs internal executor field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do agree that Kubernetes being "defacto" for Airflow (in very near future if not now). But it may not necessarily mean KubernetesExecutor going to be the "defacto" for Airflow? E.g. running CeleryExecutor (but on Kubernetes) still has certain pros over KubernetesExecutor.

The idea to use a json column sounds good to me, meanwhile a few thoughts about it:

  • All the database that Airflow is compatible with do support JSON column, but not all database do that (so if later we want to add compatibility with another database, this may pose constraint).
  • if it will mean harder to write efficient SQL/SQLAlchemy queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, true.

All the database that Airflow is compatible with do support JSON column

yeah we have solution for the "db does not support json", we just... dump to json string. we have a number of fields like this now.

if it will mean harder to write efficient SQL/SQLAlchemy queries.

yeah but that's only an issue if you are going to need to filter on that field. my assumption is, we'd only be looking at this field after we already have the TI loaded. similar to executor config, which is even worse since it's a binary blob field (pickled obj) -- but here too it's no problem since we never do queries based on that field.

anyway, no emergency to add this... just ... something we might wanna do

@potiuk
Copy link
Member

potiuk commented Dec 18, 2022

The static check error is due to isort, and is being resolved by #28434

It took surprisingly long to sort that one out but I think I sorted it out (pun intended).

@XD-DENG XD-DENG force-pushed the file-task-handler-to-work-with-k8s-executor-multi-namespace-mode branch from a877ff1 to 259e156 Compare December 19, 2022 02:52
@XD-DENG XD-DENG merged commit 497c2c2 into apache:main Dec 19, 2022
@XD-DENG XD-DENG deleted the file-task-handler-to-work-with-k8s-executor-multi-namespace-mode branch December 19, 2022 03:29
gschuurman pushed a commit to gschuurman/airflow that referenced this pull request Dec 19, 2022
@ephraimbuddy ephraimbuddy added the type:new-feature Changelog: New Features label Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:logging provider:cncf-kubernetes Kubernetes (k8s) provider related issues type:new-feature Changelog: New Features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants