Skip to content

Conversation

@joshuayeung
Copy link
Contributor

This PR added IstioKubernetesPodOperator and IstioPodManager for making KubernetesPodOperator works in Kubernetes with Istio enabled.

Istio will inject a sidecar called istio-proxy inside your pod and handle the traffic for you. But this hinders the standard workflow of the Kubernetes executor and Kubernetes Pod Operator. It is because both of them use the Pod Phase to determine the status of the dag task. Airflow will consider that the task is finished when the Pod Phase is successful and failed. Because of the istio-proxy sidecar, the pod will always be running. Just the task container itself is completed. This will lead to the Pod Phase become Not Ready in the success task, and Error in the failed task. In Airflow UI, you will see the task always running and never-ending.

The best way is to implement an Istio Kubernetes Pod Operator that shut down the pod when the base container (the container that runs your task) is completed. It involves creating an Istio pod launcher.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added provider:cncf-kubernetes Kubernetes (k8s) provider related issues area:providers labels Sep 28, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Sep 28, 2022

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@sfc-gh-atatara
Copy link

Thank you for that change, our initial test proved it working

@ashb
Copy link
Member

ashb commented Sep 28, 2022

I don't think this should be a new operator!

@potiuk
Copy link
Member

potiuk commented Sep 28, 2022

I don't think this should be a new operator!

Agree. I think it should be merged into KPO (istio=True). It's logic might be separated into classes but it should be one module and operator to use by the users.

@dstandish
Copy link
Contributor

I don't think this should be a new operator!

was just gonna say the same thnig

@eladkal eladkal changed the title Added IstioKubernetesPodOperator for making KubernetesPodOperator works in Kubernetes with Istio enabled Added Istio support to KubernetesPodOperator Oct 21, 2022
@sfc-gh-atatara
Copy link

it is actioned as single operator now

with _suppress(Exception):
self.patch_already_checked(remote_pod)
if pod_phase != PodPhase.SUCCEEDED:
if (not self.istio_enabled and pod_phase != PodPhase.SUCCEEDED) or (self.istio_enabled and pod_phase != PodPhase.SUCCEEDED and not self.pod_manager.container_is_succeeded(pod, 'base')):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will need to enable pre-commit locally see https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst

with _suppress(Exception):
self.patch_already_checked(remote_pod)
if pod_phase != PodPhase.SUCCEEDED:
if (not self.istio_enabled and pod_phase != PodPhase.SUCCEEDED) or (self.istio_enabled and pod_phase != PodPhase.SUCCEEDED and not self.pod_manager.container_is_succeeded(pod, 'base')):
Copy link
Contributor

@dstandish dstandish Oct 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this expression can probably be simplified. i would try to do so. it's too hard to understand as it is.

Comment on lines +87 to +114
def container_is_completed(pod: V1Pod, container_name: str) -> bool:
"""
Examines V1Pod ``pod`` to determine whether ``container_name`` is completed.
If that container is present and completed, returns True. Returns False otherwise.
"""
container_statuses = pod.status.container_statuses if pod and pod.status else None
if not container_statuses:
return False
container_status = next(iter([x for x in container_statuses if x.name == container_name]), None)
if not container_status:
return False
return container_status.state.terminated is not None


def container_is_succeeded(pod: V1Pod, container_name: str) -> bool:
"""
Examines V1Pod ``pod`` to determine whether ``container_name`` is completed and succeeded.
If that container is present and completed and succeeded, returns True. Returns False otherwise.
"""
if not container_is_completed(pod, container_name):
return False
container_statuses = pod.status.container_statuses if pod and pod.status else None
if not container_statuses:
return False
container_status = next(iter([x for x in container_statuses if x.name == container_name]), None)
if not container_status:
return False
return container_status.state.terminated.exit_code == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems you are duplicating a lot of logic here. looks like you could pull out a get_container_status function

if remote_pod.status.phase in PodPhase.terminal_states:
break
if self.istio_enabled and remote_pod.status.phase == PodPhase.RUNNING and self.container_is_completed(
remote_pod, 'base'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'base' is defined in a constant somewhere

@eladkal
Copy link
Contributor

eladkal commented Nov 11, 2022

@joshuayeung are you still working on this PR?

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Dec 27, 2022
@github-actions github-actions bot closed this Jan 1, 2023
@Owen-CH-Leung
Copy link
Contributor

Owen-CH-Leung commented May 16, 2023

any chances to reopen this @eladkal ? I know @joshuayeung personally and have his consent to continue to work on this PR. Thanks

@sfc-gh-atatara
Copy link

@Owen-CH-Leung I am actually interested in this work too. I have worked on this in my company and we are seeing stable results in development environment.

@dstandish dstandish reopened this May 16, 2023
@dstandish
Copy link
Contributor

i reopened it... but you might have to create a new PR anyway, unless joshua grants you permission to his fork

@eladkal
Copy link
Contributor

eladkal commented May 16, 2023

for anyone who wants to continue the work started on this PR, it's much better to cherry pick the commits to your own fork/branch and continue from there. By doing so you are working independently and also save the credit for the original author of the commits that you are building on top of.

@github-actions github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label May 17, 2023
@Owen-CH-Leung
Copy link
Contributor

Ok I'll cherry pick @joshuayeung 's commits and start a new PR from there
@sfc-gh-atatara Yeah Joshua and I are colleagues before and we actually implement this solution and have it running in our environment for over a year, so we can take care of this =)

@eladkal
Copy link
Contributor

eladkal commented May 18, 2023

Closing infavor of #31389

@eladkal eladkal closed this May 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants