Skip to content

Conversation

@juhai
Copy link

@juhai juhai commented Jun 30, 2023

This PR provides support for creating unique but sensible names to k8s pods based on the given name and ti.job_id. The KubernetesPodOperator supports adding random suffix but that isn't suitable for our use case. The random suffix can't be templated and isn't available via DAG context.

The background for this is the need to run Spark tasks in k8s. In order to use the preferred client mode, the pod name needs to be placed in spark configuration so that driver-executor communication can take place via a headless k8s service. This requires that the name of the (driver) pod is unique and known at the time the pod is created. The ti.job_id seems to be a good way to make pod names unique and is available via pod context when pod is created.

The change adds a new argument job_id_as_suffix to the KubernetesPodOperator class with backwards compatibility with random_suffix. In case job_id_as_suffix=True and random_name_suffix=False, the ti.job_id value from context will be appended to the pod name, separated by -. The same value can then be provided to spark configuration via Airflow templating to use as the driver hostname for the headless service.

@juhai juhai requested a review from jedcunningham as a code owner June 30, 2023 08:19
@boring-cyborg boring-cyborg bot added provider:cncf-kubernetes Kubernetes (k8s) provider related issues area:providers labels Jun 30, 2023
Comment on lines 161 to 162
Copy link
Contributor

@eladkal eladkal Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be very confusing.
If we want this need we need to do something similar to #30718
which means deprecate random_name_suffix and add name_suffix with options of (None, "random", "job_id")

On a more high level note I don't think this will work as you expect?
as noted in random_name_suffix the chars allowed are [a-z0-9.-] so what will happen when task_id contains illegal chars? I suspect that your goal is to match the task_id of airflow to the pod name for easier find but when it's not exact match and you start playing around with modifying the names then it defeat the purpose

WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I agree with your proposal to change the suffixing logic instead. I don't know how the deprecation process would work for this, though.

This is using ti.job_id which I think is assigned by airflow, task_id is separate and can be supplied in KubernetesPodOperator. We shouldn't use that for the reason you state. I have not seen a way to modify the job_id.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added another proposal which makes the following changes

  • Use name_suffix instead of two arguments to control the suffix operation as suggested
  • Make hostname suffix modification optional via add_suffix_to_hostname. I am not aware of situations where hostname could/should be different from pod name but the random suffix has not been applied to hostname before.
  • Change default of random_suffix to False and notify about deprecation if it's set to True.
  • Clean _create_pod_id as some options were never given. Set the full pod name with suffix inside to take account both random and job_id based suffix.

Copy link
Contributor

@eladkal eladkal Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consulted with @jedcunningham also on this one.

Can you clarify why pod_mutation_hook and/or adding the name parameter to templated_field is insufficient ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you both for the feedback and the suggestions. I think the pod_mutation_field could work but need to make sure the pod name stays within the length limit. The templated_field option sounds fine too but requires a change in the op to skip the name validity checking in the constructor. Do you think this is worth pursuing, I could prepare a PR for that?

I found pre_execute to be most suitable for my purposes after all. Since the context of the job to be launched is available I can override the name before execute is called. I will close the PR for now.

@juhai juhai force-pushed the pod-name-suffix-with-job-id branch from 404260d to ef0e278 Compare July 4, 2023 13:17
@juhai juhai force-pushed the pod-name-suffix-with-job-id branch from ef0e278 to 5c1124a Compare July 4, 2023 14:51
@juhai
Copy link
Author

juhai commented Jul 17, 2023

Closing as the current proposal isn't required to achieve what I need

@juhai juhai closed this Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants