Skip to content

Conversation

@float34
Copy link

@float34 float34 commented Dec 2, 2018

Make sure you have checked all steps below.

Jira

  • My PR addresses the following Airflow Jira issues and references them in the PR title. For example, "[AIRFLOW-XXX] My Airflow PR"

Description

  • Here are some details about my PR, including screenshots of any UI changes:
    • We need to pass 'try_number' parameter during the pod creation in kubernetes_executor.py, otherwise '_labels_to_key' function will always raise KeyError, making self.result_queue empty.
    • As a result '_change_state' function in kubernetes_executor.py will never be called and pods will keep hanging after termination even if other pod deletion conditions are met.

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    • Not sure how to write one for this particular case, maybe someone will help?

Commits

  • My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added.

Code Quality

  • Passes flake8

execution_date=self._datetime_to_label_safe_datestring(execution_date),
airflow_command=command, kube_executor_config=kube_executor_config
airflow_command=command, kube_executor_config=kube_executor_config,
try_number=str(try_number)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, keep trailing comma.

'task_id': task_id,
'execution_date': execution_date
'execution_date': execution_date,
'try_number': try_number
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@tiagovrtr
Copy link

Any idea on what could have stalled test_volume_mount? I am getting flooded with stale pods because of this issue

@dmateusp
Copy link
Contributor

I deployed v-1-10-stable in my Kubernetes cluster and reproduced the same issue, this PR solved it on my side as well

@kyle-hamlin
Copy link
Contributor

kyle-hamlin commented Dec 18, 2018

Can we get this merged? This is pretty critical for Airflow to work properly in Kubernetes.

@singhania
Copy link

I am facing one issue where KubernetesExecutor is not launching pods as soon as reaches parallelism even though pods are completed.

@dmateusp
Copy link
Contributor

dmateusp commented Jan 8, 2019

I think we're observing the same issue @singhania I'll try to reproduce tomorrow morning

@singhania
Copy link

singhania commented Jan 9, 2019

One more observation, If i don't set parallelism then also after some 30 completed pods it stops creating new pods.
I have opened an issue on Airflow jira board but no idea about the priority.
https://issues.apache.org/jira/browse/AIRFLOW-3652

After debugging found that key is not getting created properly and hence running task are not getting popped after state change.

@ckljohn
Copy link
Contributor

ckljohn commented Jan 10, 2019

I found a similar PR #4163. Are they fixing the same thing?

@feng-tao
Copy link
Member

PTAL @dimberman

@singhania
Copy link

singhania commented Jan 11, 2019

I have created a PR #4471 which along with this PR will solve the issue of pods not being created (https://issues.apache.org/jira/browse/AIRFLOW-3652).
Would really like to get it reviewed as we are blocked by this issue.

@dmateusp
Copy link
Contributor

dmateusp commented Feb 1, 2019

hi there, wanted to let you know I've cherry picked this PR and I have been running it in prod for 2 weeks in here https://github.com/dmateusp/incubator-airflow/tree/v1-10-stable-AIRFLOW-3022-AIRFLOW-3412-AIRFLOW-3652

@spinus
Copy link

spinus commented Apr 5, 2019

Looks like another PR was merged with similar function (https://github.com/apache/airflow/pull/4163/files), probably that can be closed, correct?

@float34
Copy link
Author

float34 commented Apr 14, 2019

@spinus Yes, I'll close that now as irrelevant.

@float34 float34 closed this Apr 14, 2019
@float34 float34 deleted the fix/fix-pod-autodeletion branch April 14, 2019 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants