Log FileTaskHandler to work with multi_namespace_mode when running using KubernetesExecutor #28436

XD-DENG · 2022-12-18T02:06:58Z

The earlier FileTaskHandler is assuming the pod will always run in the default namespace specified in conf.get("kubernetes_executor", "namespace"), which is not true when we run with multi_namespace_mode.

This is a follow-up on what we have discussed earlier at #28047 (comment)

XD-DENG · 2022-12-18T06:27:08Z

The static check error is due to isort, and is being resolved by #28434

dstandish · 2022-12-18T09:15:54Z

airflow/utils/log/file_task_handler.py

looks good. i think ultimately it would probably be better to store the actual namespace on the TI record somewhere. then we don't need to check two places, and... maybe there ultimately will be other ways to configure namespace. i thought about suggesting storing it in executor config (e.g. maybe simply under a key namespace) but... probably that should be left for user configuration so i dunno. any thoughts?

Currently the user is specifying the namespace (if not default namespace in the config) in the executor config. When creating the TI, also storing the actual namespace bing used in the TI record sounds sensible to me.

One concern is that this is only applicable when KubernetesExecutor is being used. Making it too "explicit" may confuse users who always only use other executors.

Yeah that occurred to me... And I think i probably agree... couple thoughts...

For one, there is precedent. e.g. external_executor_id and queue are columns on TI which only* apply for celery executor. (with an asterisk for queue; its role was expanded a little recently for LKE and CKE). But, precedent isn't everything of course.

And, kubernetes is, I believe, the defacto way to run airlfow, and more and more it's not either k8s executor OR other executors --- with LKE and CKE you don't have to choose, k8s executor is always there as an option for you. So in this case seems like wouldn't be so crazy.

But I think for me probably more than anything is, we don't want to just add namespace because it's not very future-proof, i.e. maybe there would be more info we'd want to add, and we don't want to add a column for it every time. So probably there's a json column that wants to be there. Maybe executor_data. Maybe we would move external_executor_id there. And although it's maybe sort of tempting to expand executor_config to fill this role, probably makes sense to keep separate user config vs internal executor field.

I do agree that Kubernetes being "defacto" for Airflow (in very near future if not now). But it may not necessarily mean KubernetesExecutor going to be the "defacto" for Airflow? E.g. running CeleryExecutor (but on Kubernetes) still has certain pros over KubernetesExecutor.

The idea to use a json column sounds good to me, meanwhile a few thoughts about it:

All the database that Airflow is compatible with do support JSON column, but not all database do that (so if later we want to add compatibility with another database, this may pose constraint).

if it will mean harder to write efficient SQL/SQLAlchemy queries.

yes, true.

All the database that Airflow is compatible with do support JSON column

yeah we have solution for the "db does not support json", we just... dump to json string. we have a number of fields like this now.

if it will mean harder to write efficient SQL/SQLAlchemy queries.

yeah but that's only an issue if you are going to need to filter on that field. my assumption is, we'd only be looking at this field after we already have the TI loaded. similar to executor config, which is even worse since it's a binary blob field (pickled obj) -- but here too it's no problem since we never do queries based on that field.

anyway, no emergency to add this... just ... something we might wanna do

potiuk · 2022-12-18T15:06:13Z

The static check error is due to isort, and is being resolved by #28434

It took surprisingly long to sort that one out but I think I sorted it out (pun intended).

…ing KubernetesExecutor

…_mode (apache#28436)

XD-DENG requested a review from dstandish December 18, 2022 02:06

boring-cyborg bot added the area:logging label Dec 18, 2022

XD-DENG added the provider:cncf-kubernetes Kubernetes (k8s) provider related issues label Dec 18, 2022

XD-DENG added this to the Airflow 2.6.0 milestone Dec 18, 2022

XD-DENG force-pushed the file-task-handler-to-work-with-k8s-executor-multi-namespace-mode branch from da608d6 to 9e7b99b Compare December 18, 2022 03:34

XD-DENG requested review from ashb, kaxil and potiuk as code owners December 18, 2022 03:34

XD-DENG force-pushed the file-task-handler-to-work-with-k8s-executor-multi-namespace-mode branch from 9e7b99b to a877ff1 Compare December 18, 2022 03:37

dstandish approved these changes Dec 18, 2022

View reviewed changes

Log FileTaskHandler to work with multi_namespace_mode when running us…

259e156

…ing KubernetesExecutor

XD-DENG force-pushed the file-task-handler-to-work-with-k8s-executor-multi-namespace-mode branch from a877ff1 to 259e156 Compare December 19, 2022 02:52

XD-DENG merged commit 497c2c2 into apache:main Dec 19, 2022

XD-DENG deleted the file-task-handler-to-work-with-k8s-executor-multi-namespace-mode branch December 19, 2022 03:29

gschuurman pushed a commit to gschuurman/airflow that referenced this pull request Dec 19, 2022

Log FileTaskHandler to work with KubernetesExecutor's multi_namespace…

a2c9f48

…_mode (apache#28436)

ephraimbuddy added the type:new-feature Changelog: New Features label Apr 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Log FileTaskHandler to work with multi_namespace_mode when running using KubernetesExecutor #28436

Log FileTaskHandler to work with multi_namespace_mode when running using KubernetesExecutor #28436

Uh oh!

XD-DENG commented Dec 18, 2022

Uh oh!

XD-DENG commented Dec 18, 2022

Uh oh!

dstandish Dec 18, 2022

Uh oh!

XD-DENG Dec 19, 2022 •

edited

Loading

Uh oh!

dstandish Dec 19, 2022

Uh oh!

XD-DENG Dec 19, 2022

Uh oh!

dstandish Dec 19, 2022

Uh oh!

potiuk commented Dec 18, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Log FileTaskHandler to work with multi_namespace_mode when running using KubernetesExecutor #28436

Log FileTaskHandler to work with multi_namespace_mode when running using KubernetesExecutor #28436

Uh oh!

Conversation

XD-DENG commented Dec 18, 2022

Uh oh!

XD-DENG commented Dec 18, 2022

Uh oh!

dstandish Dec 18, 2022

Choose a reason for hiding this comment

Uh oh!

XD-DENG Dec 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dstandish Dec 19, 2022

Choose a reason for hiding this comment

Uh oh!

XD-DENG Dec 19, 2022

Choose a reason for hiding this comment

Uh oh!

dstandish Dec 19, 2022

Choose a reason for hiding this comment

Uh oh!

potiuk commented Dec 18, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

XD-DENG Dec 19, 2022 •

edited

Loading