Looping issue using HashiCorp Vault #9713

- Issue:
We ran into a problem caused mainly by Airflow flooding Vault with calls while Vault was experiencing downtime.

- Context:
A few days ago, while Vault was down for 2 to 3 hours, all the worker pods in our Airflow cluster on K8s kept retrying to pull credentials (URIs) from Vault. This had two consequences. First, we found that Airflow made more than 500 calls per hour to Vault. Second, as a result, our DevOps team could not connect to Vault to figure out the root cause that had brought it down. Thus Airflow became the target of blame...

- Question:
Is there any built-in retry strategy to prevent this kind of issue from happening again? More generally, the same problem can occur with any secrets manager, such as AWS Secrets Manager or Google Secret Manager, or even with RDS.  
If not, and if you think it is worth implementing, I would like to take this ticket and implement such a strategy.  
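
To make the idea concrete, here is a minimal sketch of the kind of strategy I have in mind: bounded retries with exponential backoff and jitter, plus a cooldown during which the secrets backend is not called at all. This is only an illustration under assumptions, not existing Airflow or Vault client code. The names `BackoffCircuitBreaker`, `CircuitOpenError`, and the `fetch` callable are hypothetical, and the default values are placeholders.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when calls are skipped because the backend looks unavailable."""


class BackoffCircuitBreaker:
    """Hypothetical helper: bounded retries with backoff, then a cooldown."""

    def __init__(self, max_attempts=3, base_delay=1.0, max_delay=60.0,
                 cooldown=300.0, sleep=time.sleep, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.cooldown = cooldown
        self.sleep = sleep      # injectable so the behaviour can be tested
        self.clock = clock
        self._open_until = None  # time before which no calls are attempted

    def call(self, fetch):
        """Run fetch() with retries; fail fast while the circuit is open."""
        if self._open_until is not None and self.clock() < self._open_until:
            raise CircuitOpenError("secrets backend marked unavailable")

        last_error = None
        for attempt in range(self.max_attempts):
            try:
                result = fetch()
            except Exception as exc:  # a real version would catch narrower errors
                last_error = exc
                if attempt < self.max_attempts - 1:
                    # Exponential backoff capped at max_delay, with full jitter
                    # so that many workers do not retry in lockstep.
                    delay = min(self.max_delay, self.base_delay * 2 ** attempt)
                    self.sleep(random.uniform(0, delay))
            else:
                self._open_until = None  # success closes the circuit
                return result

        # All attempts failed: stop calling the backend for `cooldown` seconds.
        self._open_until = self.clock() + self.cooldown
        raise CircuitOpenError("secrets backend unavailable") from last_error
```

A caller would wrap whatever function actually reads the secret, for example `breaker.call(lambda: read_uri_from_backend())`, where `read_uri_from_backend` is a stand-in for the real lookup. Note that the state in this sketch lives in one process, so each worker pod would back off independently; sharing the state across pods would need something extra.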

Thanks
