
Conversation

@alexott (Contributor) commented Feb 27, 2022

If the Databricks control plane receives too many API requests, it starts returning HTTP status 429 and the caller should retry the request. But the Databricks hook retried only on 5xx status codes.

closes: #21559
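
For context, the change amounts to widening the retry predicate so that a 429 response is treated as retryable alongside 5xx. A minimal sketch of that idea, assuming a hypothetical helper named `_retryable_error` (the hook's real code may differ):

```python
import requests


def _retryable_error(exception: requests.exceptions.RequestException) -> bool:
    """Decide whether a failed Databricks API call should be retried (sketch only)."""
    if isinstance(exception, (requests.ConnectionError, requests.Timeout)):
        # Network-level failures are always worth another attempt.
        return True
    if exception.response is None:
        return False
    status = exception.response.status_code
    # Previously only 5xx was retried; 429 (Too Many Requests) is retryable too.
    return status >= 500 or status == 429
```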

@potiuk (Member) commented Feb 27, 2022

Don't you think this should have exponential back-off on 429 (at the very least, but IMHO also on 500)? Retrying with a fixed retry_delay is highly likely to only make the problem worse. We use tenacity for similar cases in other places, so maybe you should change it too, @alexott, to follow the same pattern?
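
The tenacity pattern referred to here looks roughly like the sketch below. The retry parameters, function names, and the predicate are illustrative assumptions, not the values or code the Databricks hook actually uses:

```python
import requests
from tenacity import Retrying, retry_if_exception, stop_after_attempt, wait_exponential


def _should_retry(exc: BaseException) -> bool:
    """Same idea as the predicate sketched earlier: retry on connection problems, 429 and 5xx."""
    if isinstance(exc, (requests.ConnectionError, requests.Timeout)):
        return True
    if isinstance(exc, requests.HTTPError) and exc.response is not None:
        status = exc.response.status_code
        return status == 429 or status >= 500
    return False


def call_api(url: str, token: str) -> dict:
    """Call an API endpoint, retrying transient failures with exponential back-off."""
    for attempt in Retrying(
        stop=stop_after_attempt(5),                          # give up after 5 tries
        wait=wait_exponential(multiplier=1, min=1, max=60),  # exponential back-off, capped at 60s
        retry=retry_if_exception(_should_retry),
        reraise=True,                                        # surface the last exception on failure
    ):
        with attempt:
            response = requests.get(
                url, headers={"Authorization": f"Bearer {token}"}, timeout=30
            )
            response.raise_for_status()
            return response.json()
```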

@alexott (Contributor, Author) commented Feb 27, 2022

Yes, I thought about it as well, but I'll need to think it through again. Let me mark this PR as a draft.

@potiuk (Member) commented Feb 27, 2022

Look for tenacity in the Google providers :)

@potiuk (Member) commented Feb 27, 2022

Or HTTP or SFTP. The same pattern was used there in a number of places.

@alexott changed the title from "Databricks hook - retry on HTTP Status 429 as well" to "[DRAFT] Databricks hook - retry on HTTP Status 429 as well" Feb 27, 2022
@uranusjr (Member) commented Mar 1, 2022

If I were the API endpoint's maintainer, I would very much prefer that a client does not retry when I tell them 429, at least not before my specified Retry-After. 429 tells you to stop, not to try harder.

@potiuk (Member) commented Mar 1, 2022

> If I were the API endpoint's maintainer, I would very much prefer that a client does not retry when I tell them 429, at least not before my specified Retry-After. 429 tells you to stop, not to try harder.

The exponential back-off we use for that is actually the best of both worlds: it does not stop, but it also decreases the pressure.

@uranusjr (Member) commented Mar 1, 2022

Exponential backoff still tries too early in most situations. A client receiving 429 is supposed to wait at least until the date specified in the Retry-After response header before any retries.

@potiuk (Member) commented Mar 1, 2022

> Exponential backoff still tries too early in most situations. A client receiving 429 is supposed to wait at least until the date specified in the Retry-After response header before any retries.

Sure. Retry-After should set the time of the first retry, but exponential back-off after that does not hurt.

@potiuk (Member) commented Mar 1, 2022

> Exponential backoff still tries too early in most situations. A client receiving 429 is supposed to wait at least until the date specified in the Retry-After response header before any retries.

> Sure. Retry-After should set the time of the first retry, but exponential back-off after that does not hurt.

Just to add a bit more reasoning (I thought about it a bit more).

The problem is that Retry-After is only a hint, not a "source of truth". It relies on the server "knowing" what it is doing, which is not necessarily valid:
a) it might be based on past information (which might be outdated and might not properly account for a mounting traffic spike) - this happens often
b) it might simply not be there. Often you will not get 429 but 5XX in similar situations, because it's not only the server that gets flooded but also some gateways along the way, or the server might simply time out or run out of memory or other resources.

So IMHO it needs to be the client that decides how to behave. One added value of exponential backoff is that it is still helpful in all retriable conditions that are "unknown", i.e. 5XX. Those do not carry a Retry-After to base your decision on.

So I think exponential back-off with some initial timeout (if Retry-After is available, it should be the starting point) is the right approach. Additionally, if even with exponential back-off you get a 429 whose Retry-After is longer than your next back-off step, the delay should be re-adjusted to the Retry-After received. But if the next back-off step is longer, we should continue with the exponential back-off (because the server's information might already be outdated and not account for the mounting traffic spike).
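
A sketch of that policy in code (illustrative only, not what the PR implements; the function name and parameters are made up for this example):

```python
import email.utils
import time
from typing import Optional

import requests


def next_retry_delay(
    attempt: int,
    response: Optional[requests.Response],
    base: float = 1.0,
    cap: float = 600.0,
) -> float:
    """Wait time before the next retry: the larger of exponential back-off and Retry-After."""
    backoff = min(cap, base * (2 ** attempt))  # exponential back-off step, capped

    retry_after = 0.0
    if response is not None and "Retry-After" in response.headers:
        value = response.headers["Retry-After"]
        if value.isdigit():
            # delta-seconds form, e.g. "Retry-After: 120"
            retry_after = float(value)
        else:
            # HTTP-date form, e.g. "Retry-After: Fri, 31 Dec 1999 23:59:59 GMT"
            parsed = email.utils.parsedate_to_datetime(value)
            retry_after = max(0.0, parsed.timestamp() - time.time())

    # Honour the server's hint when it asks for a longer pause than the next back-off step;
    # otherwise keep the (longer) exponential back-off.
    return max(backoff, retry_after)
```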

@alexott force-pushed the databricks-retry-on-429 branch from c73e3fe to b0fae79 on March 6, 2022 16:16
@alexott requested a review from mik-laj as a code owner on March 6, 2022 16:16
It now uses exponential backoff by default
@alexott force-pushed the databricks-retry-on-429 branch from b0fae79 to 8f7911b on March 6, 2022 16:43
@alexott changed the title from "[DRAFT] Databricks hook - retry on HTTP Status 429 as well" to "Databricks hook - retry on HTTP Status 429 as well" Mar 6, 2022
@alexott (Contributor, Author) commented Mar 6, 2022

@potiuk I think that it's ready for review now...

@potiuk (Member) left a comment

LGTM. @uranusjr ?

@github-actions bot added the "okay to merge" label Mar 8, 2022
@github-actions bot commented Mar 8, 2022

The PR is likely OK to be merged with just a subset of tests for the default Python and database versions, without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full test matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.


Labels: area:providers, okay to merge

Closes: Databricks hook: Retry also on HTTP Status 429 - rate limit exceeded (#21559)