@jscheffl jscheffl commented Dec 1, 2024

In #44311 (comment), @kaxil, @potiuk and I had a bit of discussion. As promised there, this PR implements a way to make the retries in the Edge worker configurable.

But it also opens up a few debates:

  • Do we want to add a new config? (Some people start screaming?)
    • (My position: Sensible default and make it configurable)
  • Should we really retry 10 times?
    • (My position: 10 attempts was the former default in the internal API. From a small prod outage I can say this is at least good enough that tasks do not fail during short webserver outages or connection interrupts. We see daily flakiness on our WAN. As the zombie threshold is 300s by default, retrying for more than 5 minutes might not be needed. But we should also recover faster from small glitches, so the exponential back-off is good. I tested with the parameters below and I think that, for waiting 5 minutes at most, it is reasonable to try 10 times in between before failing.)
  • Oh, why specific to Edge? (I saw retries implemented on multiple occasions in different places in the repo. With the move to TaskSDK I think we should also consider making this more common. At least the Edge API retries from far away should match the calls that are made to TaskSDK!)
    • (My position: I'd favor making such a setting common to the Edge API and at least the TaskSDK calls.)
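
To make the trade-off between attempt count and total wait time concrete, here is a minimal, standard-library-only sketch of a capped exponential back-off. It is not the code in this PR: the names `backoff_delays` and `call_with_retries`, and the values `first_wait=1.0` and `max_wait=90.0`, are hypothetical and chosen only so that 10 attempts add up to roughly the 5 minutes discussed above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, first_wait: float, max_wait: float) -> list[float]:
    """Waits between attempts: doubling from first_wait, capped at max_wait."""
    # N attempts need only N - 1 waits; there is no wait after the last attempt.
    return [min(first_wait * 2**i, max_wait) for i in range(attempts - 1)]


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 10,       # hypothetical default, mirrors the "10 times" above
    first_wait: float = 1.0,  # hypothetical value, seconds
    max_wait: float = 90.0,   # hypothetical value, seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on ConnectionError with capped exponential back-off."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts, first_wait, max_wait)
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With these illustrative values, `backoff_delays(10, 1.0, 90.0)` yields waits of 1, 2, 4, 8, 16, 32, 64, 90 and 90 seconds, 307 seconds in total, so small glitches are retried within seconds while the whole sequence ends shortly after a 300s threshold.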

@ashb, I would also like to ask for your opinion.

And if needed (though I assume it can be settled within this PR) we could also open a [DISCUSS] thread or loop in other stakeholders. Let me know.

UPDATE 28.12.2024: Updated after merge of #45121

@jscheffl jscheffl added AIP-69 Edge Executor area:task-execution-interface-aip72 AIP-72: Task Execution Interface (TEI) aka Task SDK provider:edge Edge Executor / Worker (AIP-69) / edge3 labels Dec 1, 2024
@jscheffl jscheffl force-pushed the feature/make-edge-api-calls-configurable branch from 1d71d66 to 359c75c Compare December 7, 2024 16:21
@jscheffl jscheffl requested review from ashb, Copilot, kaxil and potiuk December 7, 2024 20:23

Copilot AI left a comment


Copilot reviewed 3 out of 4 changed files in this pull request and generated no suggestions.

Files not reviewed (1)
  • providers/src/airflow/providers/edge/CHANGELOG.rst: Language not supported

@jscheffl jscheffl force-pushed the feature/make-edge-api-calls-configurable branch 2 times, most recently from 99d5353 to 0c2bbf3 Compare December 14, 2024 23:03
@jscheffl

Note: I'd make the implementation consistent with PR #45121, which adds the same to TaskSDK, once the other PR has been reviewed and merged.

@jscheffl jscheffl force-pushed the feature/make-edge-api-calls-configurable branch from 0c2bbf3 to 79e1149 Compare December 28, 2024 14:38
@jscheffl jscheffl requested a review from amoghrajesh December 28, 2024 14:40
@jscheffl jscheffl force-pushed the feature/make-edge-api-calls-configurable branch from 79e1149 to 74a839a Compare December 29, 2024 22:21

@amoghrajesh amoghrajesh left a comment


LGTM +1 with respect to changes in this PR.

On a related note, I see that there are no tests for the api_client at all, @jscheffl. Is that intentional, or was it missed?

@jscheffl jscheffl force-pushed the feature/make-edge-api-calls-configurable branch from 74a839a to e8df5c4 Compare January 4, 2025 13:09

jscheffl commented Jan 4, 2025

> LGTM +1 with respect to changes in this PR.
>
> On a related note, I see that there are no tests for the api_client at all, @jscheffl. Is that intentional, or was it missed?

Thanks for pointing out the missing pytests. I added them and in doing so also found an error, so it is good to have tests now.

@jscheffl jscheffl force-pushed the feature/make-edge-api-calls-configurable branch from 80b53cb to e302bb4 Compare January 4, 2025 14:50
@jscheffl jscheffl merged commit 1e04741 into apache:main Jan 4, 2025
156 checks passed
HariGS-DB pushed a commit to HariGS-DB/airflow that referenced this pull request Jan 16, 2025
* Make Edge API retries configurable

* Align implementation with TaskSDK PR apache#45121

* Add pytests and fix retry handling

* Fix mypy on pytests

* Update changelog

* Update changelog
got686-yandex pushed a commit to got686-yandex/airflow that referenced this pull request Jan 30, 2025
* Make Edge API retries configurable

* Align implementation with TaskSDK PR apache#45121

* Add pytests and fix retry handling

* Fix mypy on pytests

* Update changelog

* Update changelog
@jscheffl jscheffl deleted the feature/make-edge-api-calls-configurable branch October 5, 2025 07:43

Labels

AIP-69 Edge Executor area:providers area:task-execution-interface-aip72 AIP-72: Task Execution Interface (TEI) aka Task SDK provider:edge Edge Executor / Worker (AIP-69) / edge3
