client: Increase the min connect timeout from 1s to 3s (otherwise connecting to distant Engines fails)#8328
Conversation
* The MinConnectTimeout was set to 1s which is good for a fast recovering from the situation that the engine is not ready to accept connections, while at the same time, it's too aggressive for a remote engine, eg. Dagger Engine on a Kubernetes [1], the engine may not complete the connect phase within the deadline and always failed in the connection checking loop [2]. Until the buildkit implements the better way of client creation [3], increase the timeout with a less aggressive 3s could mitigate this issue. [1] https://docs.dagger.io/integrations/kubernetes [2] https://github.com/dagger/dagger/blob/main/engine/client/buildkit.go#L45 [3] https://github.com/grpc/grpc-go/blob/master/Documentation/anti-patterns.md Signed-off-by: Neutron Soutmun <neutron.s@linecorp.com>
|
This was originally lowered from the default 20s in #7612 (cc @marcosnils). I'm a bit worried now if 3s is enough, but bringing it up too high negatively effects retry rates. Genuinely such a pain to manage this, see a previous case of trying to resolve this in buildkit: moby/buildkit#4200.
That's definitely something we could do (not right now, but sometime). |
|
Not being able to establish a connection within Before this change, we had:
After this change as it stands, the reconnect attempts will become:
More frequent initial attempts - i.e.
The above makes me wonder if we should set the MaxDelay to
Would the above work for your environment @neutronth? WDYT @jedevc @marcosnils? As a different approach, I quiet like the idea of picking a default of |
I’m interesting to tackle on this one if you all are ok. Currently @neutronth and I need to adjust in Dagger source and rebuild the cli manually. :( |
|
@gerhard my environment is a development one that Dagger CLI connects to remote engine hosted in remote cloud Kubernetes cluster (oversea network), more latency added, I guess. The Dagger CLI established the buildkit client over the kubepod [1] connection helper, it's been established properly. First of all, I was also wondering why it behaves like that as I saw the back-off configuration and it should be tackled properly. The spans have been splited with 1s and not looked like a back-off retrying.
Hence, this PR try to mitigate this issue by give the client more chance to succeed (in my case, 3s does) [1] https://github.com/moby/buildkit/blob/master/client/connhelper/kubepod/kubepod.go |
|
Yeah, the reason for this logic in buildkit is because we want a fast response for if the server is still in the starting state - the client should not back off, because it's a server side failure. This logic still applies for our use case. However, yeah, this isn't great. But I've never been able to work out a good compromise that works well both for local (low-latency, so this kind of waiting is alright) and remote (higher-latency, so backoff is needed).
All of those feel like better long-term solutions to me, but I'm potentially still alright with this as a short term solution. |
|
@neutronth that makes sense. Happy to go ahead with this as a short-term solution, and follow-up with a more robust implementation, as @jedevc mentioned. Getting this approved & merged so that it can be released in the upcoming Looking forward to your configurable follow-up @wingyplus 💪 |
|
Going to add a relese notes fragment before approving & merging. FTR: https://docs.dagger.io/contributing/#3-add-release-notes-fragment |
FTR: https://docs.dagger.io/contributing#3-add-release-notes-fragment Signed-off-by: Gerhard Lazu <gerhard@dagger.io>
|
Thank you @gerhard.
I will submit a PR in 2-3 days. 💪 |
This PR contains the following updates: | Package | Update | Change | |---|---|---| | [dagger/dagger](https://redirect.github.com/dagger/dagger) | minor | `v0.12.7` -> `v0.13.0` | --- ### Release Notes <details> <summary>dagger/dagger (dagger/dagger)</summary> ### [`v0.13.0`](https://redirect.github.com/dagger/dagger/blob/HEAD/CHANGELOG.md#v0130---2024-09-11) [Compare Source](https://redirect.github.com/dagger/dagger/compare/v0.12.7...v0.13.0) ##### 🔥 Breaking Changes - Remove deprecated fields and arguments by [@​jedevc](https://redirect.github.com/jedevc) in [https://github.com/dagger/dagger/pull/8065](https://redirect.github.com/dagger/dagger/pull/8065) - Remove `Container.withExec`'s `skipEntrypoint` argument - this is now the default (see `useEntrypoint`) - Remove `pipeline`, `Container.pipeline` and `Directory.pipeline` - Remove `GitModuleSource.cloneURL` (see `GitModuleSource.cloneRef`) ##### Added - Added new `Directory.digest` and `ModuleSource.digest` fields by [@​jedevc](https://redirect.github.com/jedevc) in [https://github.com/dagger/dagger/pull/8282](https://redirect.github.com/dagger/dagger/pull/8282) \ These fields mirror the behavior of the `File.digest` field, computing a unique cryptographic digest over the contents of the object. - TUI: add `--no-exit`/`-E` so you can poke around after the call completes by [@​vito](https://redirect.github.com/vito) in [https://github.com/dagger/dagger/pull/8389](https://redirect.github.com/dagger/dagger/pull/8389) ##### Changed - The trace url is printed just before the final output to make it easy to find by [@​rajatjindal](https://redirect.github.com/rajatjindal) in [https://github.com/dagger/dagger/pull/8366](https://redirect.github.com/dagger/dagger/pull/8366) \ Also, the url will be printed only for a subset of dagger commands to reduce noise. - Increase the minimum connect timeout from 1s to 3s by [@​neutronth](https://redirect.github.com/neutronth) in [https://github.com/dagger/dagger/pull/8328](https://redirect.github.com/dagger/dagger/pull/8328) \ Connecting to a distant remote engine could otherwise fail if it was not reachable in 1s. ##### Fixed - Fixed void types from core incorrectly being seen as named scalars by [@​helderco](https://redirect.github.com/helderco) in [https://github.com/dagger/dagger/pull/8336](https://redirect.github.com/dagger/dagger/pull/8336) - Fix setting secrets on module object in constructor by [@​sipsma](https://redirect.github.com/sipsma) in [https://github.com/dagger/dagger/pull/8149](https://redirect.github.com/dagger/dagger/pull/8149) - Allow top-level field access with no constructor by [@​jedevc](https://redirect.github.com/jedevc) in [https://github.com/dagger/dagger/pull/8331](https://redirect.github.com/dagger/dagger/pull/8331) \ Previously, if a field access was made immediately after the default constructor was called, then the access would fail. - Plain progress correctly displays carriage returns by [@​jedevc](https://redirect.github.com/jedevc) in [https://github.com/dagger/dagger/pull/8347](https://redirect.github.com/dagger/dagger/pull/8347) \ Carriage returns could previously render weirdly in the output, displaying empty lines, and similar visual glitches. - cli: Fix default value on `Platform` flag by [@​helderco](https://redirect.github.com/helderco) in [https://github.com/dagger/dagger/pull/8360](https://redirect.github.com/dagger/dagger/pull/8360) ##### What to do next? - Read the [documentation](https://docs.dagger.io) - Join our [Discord server](https://discord.gg/dagger-io) - Follow us on [Twitter](https://twitter.com/dagger_io) </details> --- ### Configuration 📅 **Schedule**: Branch creation - "after 4pm on thursday" in timezone America/Los_Angeles, Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/scottames/dots). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC43NC4xIiwidXBkYXRlZEluVmVyIjoiMzguNzQuMSIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiZGVwZW5kZW5jaWVzIl19--> --------- Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: scottames-github-bot[bot] <162828115+scottames-github-bot[bot]@users.noreply.github.com>
This PR contains the following updates: | Package | Update | Change | |---|---|---| | [dagger/dagger](https://redirect.github.com/dagger/dagger) | minor | `v0.12.7` -> `v0.13.0` | --- ### Release Notes <details> <summary>dagger/dagger (dagger/dagger)</summary> ### [`v0.13.0`](https://redirect.github.com/dagger/dagger/blob/HEAD/CHANGELOG.md#v0130---2024-09-11) [Compare Source](https://redirect.github.com/dagger/dagger/compare/v0.12.7...v0.13.0) ##### 🔥 Breaking Changes - Remove deprecated fields and arguments by [@​jedevc](https://redirect.github.com/jedevc) in [https://github.com/dagger/dagger/pull/8065](https://redirect.github.com/dagger/dagger/pull/8065) - Remove `Container.withExec`'s `skipEntrypoint` argument - this is now the default (see `useEntrypoint`) - Remove `pipeline`, `Container.pipeline` and `Directory.pipeline` - Remove `GitModuleSource.cloneURL` (see `GitModuleSource.cloneRef`) ##### Added - Added new `Directory.digest` and `ModuleSource.digest` fields by [@​jedevc](https://redirect.github.com/jedevc) in [https://github.com/dagger/dagger/pull/8282](https://redirect.github.com/dagger/dagger/pull/8282) \ These fields mirror the behavior of the `File.digest` field, computing a unique cryptographic digest over the contents of the object. - TUI: add `--no-exit`/`-E` so you can poke around after the call completes by [@​vito](https://redirect.github.com/vito) in [https://github.com/dagger/dagger/pull/8389](https://redirect.github.com/dagger/dagger/pull/8389) ##### Changed - The trace url is printed just before the final output to make it easy to find by [@​rajatjindal](https://redirect.github.com/rajatjindal) in [https://github.com/dagger/dagger/pull/8366](https://redirect.github.com/dagger/dagger/pull/8366) \ Also, the url will be printed only for a subset of dagger commands to reduce noise. - Increase the minimum connect timeout from 1s to 3s by [@​neutronth](https://redirect.github.com/neutronth) in [https://github.com/dagger/dagger/pull/8328](https://redirect.github.com/dagger/dagger/pull/8328) \ Connecting to a distant remote engine could otherwise fail if it was not reachable in 1s. ##### Fixed - Fixed void types from core incorrectly being seen as named scalars by [@​helderco](https://redirect.github.com/helderco) in [https://github.com/dagger/dagger/pull/8336](https://redirect.github.com/dagger/dagger/pull/8336) - Fix setting secrets on module object in constructor by [@​sipsma](https://redirect.github.com/sipsma) in [https://github.com/dagger/dagger/pull/8149](https://redirect.github.com/dagger/dagger/pull/8149) - Allow top-level field access with no constructor by [@​jedevc](https://redirect.github.com/jedevc) in [https://github.com/dagger/dagger/pull/8331](https://redirect.github.com/dagger/dagger/pull/8331) \ Previously, if a field access was made immediately after the default constructor was called, then the access would fail. - Plain progress correctly displays carriage returns by [@​jedevc](https://redirect.github.com/jedevc) in [https://github.com/dagger/dagger/pull/8347](https://redirect.github.com/dagger/dagger/pull/8347) \ Carriage returns could previously render weirdly in the output, displaying empty lines, and similar visual glitches. - cli: Fix default value on `Platform` flag by [@​helderco](https://redirect.github.com/helderco) in [https://github.com/dagger/dagger/pull/8360](https://redirect.github.com/dagger/dagger/pull/8360) ##### What to do next? - Read the [documentation](https://docs.dagger.io) - Join our [Discord server](https://discord.gg/dagger-io) - Follow us on [Twitter](https://twitter.com/dagger_io) </details> --- ### Configuration 📅 **Schedule**: Branch creation - "after 4pm on thursday" in timezone America/Los_Angeles, Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/scottames/containers). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC43NC4xIiwidXBkYXRlZEluVmVyIjoiMzguNzQuMSIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiZGVwZW5kZW5jaWVzIl19--> --------- Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: scottames-github-bot[bot] <162828115+scottames-github-bot[bot]@users.noreply.github.com>
This PR contains the following updates: | Package | Update | Change | |---|---|---| | [dagger/dagger](https://redirect.github.com/dagger/dagger) | minor | `v0.12.7` -> `v0.13.0` | --- ### Release Notes <details> <summary>dagger/dagger (dagger/dagger)</summary> ### [`v0.13.0`](https://redirect.github.com/dagger/dagger/blob/HEAD/CHANGELOG.md#v0130---2024-09-11) [Compare Source](https://redirect.github.com/dagger/dagger/compare/v0.12.7...v0.13.0) ##### 🔥 Breaking Changes - Remove deprecated fields and arguments by [@​jedevc](https://redirect.github.com/jedevc) in [https://github.com/dagger/dagger/pull/8065](https://redirect.github.com/dagger/dagger/pull/8065) - Remove `Container.withExec`'s `skipEntrypoint` argument - this is now the default (see `useEntrypoint`) - Remove `pipeline`, `Container.pipeline` and `Directory.pipeline` - Remove `GitModuleSource.cloneURL` (see `GitModuleSource.cloneRef`) ##### Added - Added new `Directory.digest` and `ModuleSource.digest` fields by [@​jedevc](https://redirect.github.com/jedevc) in [https://github.com/dagger/dagger/pull/8282](https://redirect.github.com/dagger/dagger/pull/8282) \ These fields mirror the behavior of the `File.digest` field, computing a unique cryptographic digest over the contents of the object. - TUI: add `--no-exit`/`-E` so you can poke around after the call completes by [@​vito](https://redirect.github.com/vito) in [https://github.com/dagger/dagger/pull/8389](https://redirect.github.com/dagger/dagger/pull/8389) ##### Changed - The trace url is printed just before the final output to make it easy to find by [@​rajatjindal](https://redirect.github.com/rajatjindal) in [https://github.com/dagger/dagger/pull/8366](https://redirect.github.com/dagger/dagger/pull/8366) \ Also, the url will be printed only for a subset of dagger commands to reduce noise. - Increase the minimum connect timeout from 1s to 3s by [@​neutronth](https://redirect.github.com/neutronth) in [https://github.com/dagger/dagger/pull/8328](https://redirect.github.com/dagger/dagger/pull/8328) \ Connecting to a distant remote engine could otherwise fail if it was not reachable in 1s. ##### Fixed - Fixed void types from core incorrectly being seen as named scalars by [@​helderco](https://redirect.github.com/helderco) in [https://github.com/dagger/dagger/pull/8336](https://redirect.github.com/dagger/dagger/pull/8336) - Fix setting secrets on module object in constructor by [@​sipsma](https://redirect.github.com/sipsma) in [https://github.com/dagger/dagger/pull/8149](https://redirect.github.com/dagger/dagger/pull/8149) - Allow top-level field access with no constructor by [@​jedevc](https://redirect.github.com/jedevc) in [https://github.com/dagger/dagger/pull/8331](https://redirect.github.com/dagger/dagger/pull/8331) \ Previously, if a field access was made immediately after the default constructor was called, then the access would fail. - Plain progress correctly displays carriage returns by [@​jedevc](https://redirect.github.com/jedevc) in [https://github.com/dagger/dagger/pull/8347](https://redirect.github.com/dagger/dagger/pull/8347) \ Carriage returns could previously render weirdly in the output, displaying empty lines, and similar visual glitches. - cli: Fix default value on `Platform` flag by [@​helderco](https://redirect.github.com/helderco) in [https://github.com/dagger/dagger/pull/8360](https://redirect.github.com/dagger/dagger/pull/8360) ##### What to do next? - Read the [documentation](https://docs.dagger.io) - Join our [Discord server](https://discord.gg/dagger-io) - Follow us on [Twitter](https://twitter.com/dagger_io) </details> --- ### Configuration 📅 **Schedule**: Branch creation - "after 4pm on thursday" in timezone America/Los_Angeles, Automerge - At any time (no schedule defined). 🚦 **Automerge**: Enabled. ♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/scottames/daggerverse). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC43NC4xIiwidXBkYXRlZEluVmVyIjoiMzguNzQuMSIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiZGVwZW5kZW5jaWVzIl19--> --------- Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com> Co-authored-by: scottames-github-bot[bot] <162828115+scottames-github-bot[bot]@users.noreply.github.com> Co-authored-by: Scott Ames <scott@ames.sh>
The MinConnectTimeout was set to 1s which is good for a fast recovering from the situation that the engine is not ready to accept connections, while at the same time, it's too aggressive for a remote engine, eg. Dagger Engine on a Kubernetes [1], the engine may not complete the connect phase within the deadline and always failed in the connection checking loop [2].
Until the buildkit implements the better way of client creation [3], increase the timeout with a less aggressive 3s could mitigate this issue.
[1] https://docs.dagger.io/integrations/kubernetes
[2] https://github.com/dagger/dagger/blob/main/engine/client/buildkit.go#L45
[3] https://github.com/grpc/grpc-go/blob/master/Documentation/anti-patterns.md