Skip to content

Conversation

@pawalat
Copy link

@pawalat pawalat commented Apr 13, 2023

This pull request introduces a retry mechanism to handle Vault connection failures more gracefully. By employing the urllib3.util.Retry class, we can configure the retry behavior for specific HTTP status codes that are more likely to be transient, such as 412 (Precondition Failed), 500 (Internal Server Error), 502 (Bad Gateway), and 503 (Service Unavailable).

These changes are implemented within the Vault client file, using the hvac library, to enhance the robustness and reliability of the connection with Vault. The hvac library's Client class allows for the injection of a custom Session object, which we can configure with an HTTPAdapter instance to apply our desired retry behavior.

As mentioned in the hvac documentation, thoughtful retrying of failed requests is crucial for a seamless experience with Vault, especially in the context of eventual consistency. Vault may return a 412 status code when data is not yet available on the node where the request was made. In addition, retrying 5xx status codes is generally advisable.

With these changes, Airflow's integration with Vault becomes more resilient and fault-tolerant, improving the overall user experience and stability of the system.

@boring-cyborg
Copy link

boring-cyborg bot commented Apr 13, 2023

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@potiuk
Copy link
Member

potiuk commented Apr 22, 2023

There are static check failures that needs to be fixed.

@notatallshaw-gts
Copy link
Contributor

@pawalat Do you need any assistance fixing this PR or are you good?

We experienced an outage today and came to the same conclusion that HashiCorp Provider should retry the connection based on the HVAC documentation.

@notatallshaw-gts
Copy link
Contributor

@potiuk I would like to get this change landed in Airflow as it caused a small outage for our team so we're going to patch it locally anyway.

What's the etiquette here on opening my own PR that does the same thing? Is there a minimum amount of time I should wait before considering this PR stale?

@potiuk
Copy link
Member

potiuk commented May 1, 2023

What's the etiquette here on opening my own PR that does the same thing? Is there a minimum amount of time I should wait before considering this PR stale?

Use your own judgement. We do not have procedures/ettiquete for everything.

And it's ok to have things in parallell :) , there isi somehow a fear that having two parallell things is somewhat bad. It's not. You learn, you iterate, you can solve problems and the original author might take a look and make their own changes. Everyone learns. The code has already been effectively offered for the community to take.

But generally the usual time is 72Hrs if we expect the person to see it and have a chance to respond - which accounts for time zone/weekends and the like.

@notatallshaw-gts
Copy link
Contributor

Alternative PR is here with passing tests and additional test: #31073

I think this PR can be closed given there has been no activity?

@potiuk potiuk closed this Jun 14, 2023
@potiuk
Copy link
Member

potiuk commented Jun 14, 2023

sure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants