

@aagateuip
Contributor

@aagateuip aagateuip commented Jun 24, 2023


  • Added 5 retries to extract_xcom to guard against intermittent network connectivity failures.
  • The XCom JSON is validated to make sure the entire JSON was retrieved. If JSON validation fails, extract_xcom keeps retrying.
  • Finally, the XCom sidecar container is killed once either the extract_xcom retry limit is reached or valid JSON has been retrieved.
  • Added 5 retries to the extract_xcom_kill method as well, to guard against intermittent network connectivity failures.
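The retry-then-kill flow described in the bullets above can be sketched in plain Python. This is a minimal, hypothetical sketch, not the code merged in this PR: `fetch_xcom` and `kill_sidecar` stand in for whatever reads the XCom output from the sidecar and stops it, `call_with_retries` is an illustrative helper (the provider may use a retry library with backoff instead), and `delay` defaults to 0 only to keep the example fast.

```python
import json
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 5, delay: float = 0.0) -> T:
    """Call fn up to `attempts` times; re-raise the last error if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # retry limit reached: surface the last error
            time.sleep(delay)  # wait before the next attempt
    raise ValueError("attempts must be >= 1")


def extract_xcom(
    fetch_xcom: Callable[[], str],
    kill_sidecar: Callable[[], None],
    attempts: int = 5,
    delay: float = 0.0,
) -> str:
    """Hypothetical sketch: fetch and validate XCom JSON with retries, then always kill the sidecar."""

    def fetch_and_validate() -> str:
        raw = fetch_xcom()
        # Parse only to detect a truncated payload; json.JSONDecodeError
        # (a ValueError subclass) triggers another retry. The raw string is returned.
        json.loads(raw)
        return raw

    try:
        return call_with_retries(fetch_and_validate, attempts, delay)
    finally:
        # Runs whether valid JSON was retrieved or the retry limit was exhausted;
        # the kill step gets its own retries against transient network errors.
        call_with_retries(kill_sidecar, attempts, delay)
```

For example, with a fetcher that first returns the truncated payload `'{"a": 1'` and then the full `'{"a": 1}'`, `extract_xcom` returns the second string and calls `kill_sidecar` exactly once. Putting the kill in `finally` is what makes the sidecar stop in both the success and the exhausted-retries cases.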

closes: #32111


@aagateuip aagateuip requested a review from jedcunningham as a code owner June 24, 2023 00:52
@boring-cyborg boring-cyborg bot added provider:cncf-kubernetes Kubernetes (k8s) provider related issues area:providers labels Jun 24, 2023
@boring-cyborg

boring-cyborg bot commented Jun 24, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

@aagateuip aagateuip marked this pull request as draft June 24, 2023 00:55
Member

@hussein-awala hussein-awala left a comment


Could you follow the steps explained in this doc to activate the pre-commit hooks and fix the static checks?
Then you need to add a test for your change.

@aagateuip aagateuip force-pushed the fix/32111_extract_xcom branch 2 times, most recently from db2f6b4 to 7654344 Compare June 25, 2023 17:43
@aagateuip
Contributor Author

Could you follow the steps explained in this doc to activate the pre-commit hooks and fix the static checks?
Then you need to add a test for your change.

Thanks, will do! I've worked on fixing the static checks and will work on adding tests for the change.

@aagateuip aagateuip force-pushed the fix/32111_extract_xcom branch 2 times, most recently from 63d8247 to ca83211 Compare June 26, 2023 06:31
@aagateuip aagateuip marked this pull request as ready for review June 26, 2023 06:46
@aagateuip aagateuip force-pushed the fix/32111_extract_xcom branch 6 times, most recently from 8ab201c to aec892d Compare June 27, 2023 15:53
@aagateuip
Contributor Author

@jedcunningham can you please review this PR when you have time? Thx!

@aagateuip aagateuip force-pushed the fix/32111_extract_xcom branch 2 times, most recently from 389e48b to 6981ab2 Compare June 28, 2023 03:48
Contributor

@jscheffl jscheffl left a comment


Some comments, mainly related to code style/exception handling

- Added retries to extract_xcom to guard against intermittent
network connectivity failures.
- xcom json is validated to make sure entire json was retrieved.
- xcom sidecar is killed only if xcom json that was retrieved was valid.
@aagateuip aagateuip force-pushed the fix/32111_extract_xcom branch from 05d6324 to 3f797bc Compare June 29, 2023 14:07
@potiuk
Member

potiuk commented Jun 29, 2023

Isn't the description/intent different now? In the description you write about killing the xcom container only on failure, but it seems it is always killed?

@aagateuip
Contributor Author

Isn't the description/intent different now? In the description you write about killing the xcom container only on failure, but it seems it is always killed?

Yes, let me fix the description. Thank you!

@aagateuip
Contributor Author

Done, fixed the description.

Member

@potiuk potiuk left a comment


@potiuk
Member

potiuk commented Jun 29, 2023

@jens-scheffler-bosch ?

Contributor

@jscheffl jscheffl left a comment


It took me a bit of time re-reading the code, and at first I wanted to complain because the json.loads() call looked useless at first glance.
The PR looks good, but I'd still prefer a small additional comment so that somebody in the future without the PR context can better understand the logic.
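The json.loads() call that looked useless is there for its side effect: parsing acts as a completeness check on the sidecar output, and the parsed value is thrown away. A minimal illustration, assuming the output arrives as a string; `is_complete_json` is a hypothetical helper name, not part of the provider:

```python
import json


def is_complete_json(raw: str) -> bool:
    """Return True if `raw` parses as JSON; False if it is truncated or malformed."""
    try:
        json.loads(raw)  # the parsed value is discarded; only success matters
    except json.JSONDecodeError:
        return False
    return True


print(is_complete_json('{"key": "value"}'))  # True
print(is_complete_json('{"key": "val'))      # False: output cut off mid-string
```

A truncated read raises `json.JSONDecodeError`, which is exactly the signal a retry loop needs, while the caller can keep working with the original string.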

@aagateuip
Contributor Author

Thank you @jens-scheffler-bosch, I added a comment as per your suggestion.

Contributor

@jscheffl jscheffl left a comment


LGTM!

@potiuk
Member

potiuk commented Jun 30, 2023

Nice. But now the line is too long :).

@aagateuip
Contributor Author

Fixed the long line now, thx! :-)

@eladkal eladkal merged commit df4c883 into apache:main Jul 1, 2023
@boring-cyborg

boring-cyborg bot commented Jul 1, 2023

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

@ruztomas

ruztomas commented Feb 5, 2025

Hello!

I am facing this problem. After the fix was merged, did you, @aagateuip, experience it again?

Thanks!


Labels

area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues


Development

Successfully merging this pull request may close these issues.

KubernetesPodOperator job intermittently fails - unable to retrieve json from xcom sidecar container due to network connectivity issues

6 participants