Increase default for connect timeout #4228

fjetter · 2020-11-09T15:10:12Z

Disclaimer: I don't have data to back this up, only a gut feeling.

We're still seeing connection timeouts (#4080), even after the removal of the fixed handshake timeouts in #4176. There we already discussed increasing the default timeout, since the connect timeout no longer only has to cover the actual TCP (or whatever) connect but also the handshake.

What's made this worse is that, due to DNS instabilities, it was requested (#3104) that the connect be split up, favouring multiple small attempts over one large attempt.

Putting everything together, I believe the initial default of 10s should be increased to counteract the added complexity of this operation. I don't have a good sense of an appropriate value here. In #4176 there was a suggestion to increase it to 30s, but I could also see higher values making sense.
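To illustrate why the overall budget matters once it is shared by several attempts, here is a minimal sketch. It is not the actual code in distributed: `connect_with_budget`, `connect_once`, `per_attempt_timeout` and `backoff_base` are hypothetical names, and `connect_once` is assumed to cover both the transport connect and the handshake.

```python
import asyncio
import time


async def connect_with_budget(connect_once, total_timeout, per_attempt_timeout,
                              backoff_base=0.01):
    """Hypothetical helper: retry ``connect_once`` until ``total_timeout`` is spent.

    ``connect_once`` is an async callable standing in for "TCP connect plus
    handshake". Every attempt, and every backoff sleep, draws from one budget.
    """
    deadline = time.monotonic() + total_timeout
    attempt = 0
    last_error = None
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        attempt += 1
        # A single attempt never gets more than its slice of the budget.
        timeout = min(remaining, per_attempt_timeout)
        try:
            return await asyncio.wait_for(connect_once(), timeout=timeout)
        except (asyncio.TimeoutError, OSError) as exc:
            last_error = exc
            # Exponential backoff, capped so we never sleep past the deadline.
            pause = min(backoff_base * 2 ** attempt,
                        max(deadline - time.monotonic(), 0))
            await asyncio.sleep(pause)
    # Chain the last failure so the real cause stays visible in the traceback.
    raise OSError(
        f"Timed out after {total_timeout} s ({attempt} attempts)"
    ) from last_error
```

With a small total budget, a slow handshake can exhaust every slice even though each individual step would have succeeded given slightly more time, which is the argument for a larger default.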

cc @jcrist

jcrist · 2020-11-09T15:13:20Z

Increasing the default to 30s makes sense to me. Would like to get a 👍 from another maintainer before merging though.

fjetter · 2020-11-09T15:38:23Z

I just checked, we've been running with a 60s default in our infrastructure at BY for about a year now. I think we set it up this way back then since we were debugging some weird dead comm failures and we never changed it back.

Anyhow, I just wanted to put a real-world data point in here. Timeouts are sometimes a sensitive, infrastructure-dependent issue, and 60s may be good for us but too much for others. Either way, I believe 10s is too small :)

quasiben · 2020-11-09T18:13:56Z

On large benchmark runs we have set the timeout somewhat arbitrarily to 100s. Increasing to 30 or even 60 would be fine with me.

iyawnis · 2020-11-12T18:55:52Z

I am not sure if my issue is related to this. I am seeing the following error about 2-3 seconds after attempting to connect:
OSError: Timed out during handshake while connecting to tcp://scheduler:7777 after 10 s
Passing a timeout kwarg to Client changes the error string, but the error still arrives 2-3 seconds after client instantiation. Versions: distributed 2.30.1, dask 2.30.

iyawnis · 2020-11-12T19:02:19Z

Just confirmed that I can connect if I change the distributed version to 2.30.0 for the Client only (without changing the scheduler).

fjetter · 2020-11-18T15:39:58Z

@latusaki I think your report is unrelated to the actual value of the timeout, which is what is discussed here. Can you open another ticket for this? At the very least, the exception message may be misleading, since the failure is probably caused by something other than the timeout. In particular, the entire traceback would be interesting, since the exception cause should usually be logged as well, which should reveal the actual exception.
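As a rough sketch of how to surface the underlying cause when only the outer error is visible, the chain of a caught exception can be walked with plain Python. `exception_chain` and `describe` are hypothetical helper names, not part of distributed, and nothing here assumes what the real cause in the report above was.

```python
def exception_chain(exc):
    """Return ``exc`` followed by its causes/contexts, outermost first."""
    chain = []
    seen = set()
    while exc is not None and id(exc) not in seen:
        chain.append(exc)
        seen.add(id(exc))
        # Prefer the explicit cause ("raise ... from ..."); otherwise fall back
        # to the implicit context unless it was suppressed with "from None".
        if exc.__cause__ is not None:
            exc = exc.__cause__
        elif not exc.__suppress_context__:
            exc = exc.__context__
        else:
            exc = None
    return chain


def describe(exc):
    """One-line summary of the whole chain, outermost error first."""
    return " <- ".join(f"{type(e).__name__}: {e}" for e in exception_chain(exc))
```

Calling `describe(e)` inside an `except OSError as e:` block around the client instantiation would print the timeout message together with whatever exception it was raised from, if any.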

crusaderky · 2021-07-13T14:42:27Z

Superseded by #5022

fjetter · 2021-07-14T08:17:34Z

We bumped to 30s in #5022. Closing this

Increase default for connect timeout

8693c5f

fjetter mentioned this pull request Nov 18, 2020

Timed out trying to connect ... : connect() didn't finish in time #4080

Open

fjetter mentioned this pull request Dec 22, 2020

Identify lack of scalability in gwas_linear_regression sgkit-dev/sgkit#390

Open

Base automatically changed from master to main March 8, 2021 19:04

fjetter mentioned this pull request Jul 13, 2021

Improve CI stability #5022

Merged

crusaderky added a commit to crusaderky/distributed that referenced this pull request Jul 13, 2021

Bump timeouts from 10/20s to 30s (see dask#4228)

b088098

fjetter closed this Jul 14, 2021

fjetter deleted the increase_connect_timeout_default branch July 14, 2021 08:17

Hedingber mentioned this pull request Oct 20, 2021

[Requirements] Bump versions to conform to safety checks mlrun/mlrun#1439

Merged

fjetter mentioned this pull request Feb 9, 2022

Increase robustness to TimeoutError during connect #5096

Merged
