@rumpl rumpl commented Apr 28, 2022

- What I did

In some cases, for example when there is a heavy load, the initialization of the TTY size can fail. This change makes the CLI retry 10 times instead of 5, with the wait between two resize calls growing incrementally from 10ms up to 100ms.

Running 150 containers (see script below) takes ~2 minutes to complete without this change; with this change the time to complete was between ~1m55s and ~2m12s, so I don't think this change will impact users.

Running one container docker run --rm -t hello-world takes the same amount of time (1.5s on my machine) with or without this change.

Relates to #3554

- How I did it
Changed the retry logic to retry 10 times instead of 5, with the wait between retries growing incrementally from 10ms to 100ms

- How to verify it

Running this script shouldn't show any "failed to resize tty, using default size" errors.

#!/bin/bash

for A in $(seq 1 10); do
    echo $A

    for a in $(seq 1 15); do
        com.docker.cli run --rm -t alpine echo $A/$a &
    done

    wait
    date
done

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@codecov-commenter commented Apr 28, 2022

Codecov Report

Merging #3573 (9598c4c) into master (fd2bc1f) will increase coverage by 0.10%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3573      +/-   ##
==========================================
+ Coverage   59.36%   59.47%   +0.10%     
==========================================
  Files         285      287       +2     
  Lines       24097    24174      +77     
==========================================
+ Hits        14306    14377      +71     
- Misses       8930     8931       +1     
- Partials      861      866       +5     

@thaJeztah (Member) commented:

Let me comment here, as we were discussing this;

  • First of all, this loop is for the initial resize (docker run -> resize to desired dimensions), and not used for the "monitor for terminal to be resized" case
  • The retry loop is only hit after we tried a resize immediately (line 54 does a resize immediately after attaching, and we only hit the retry loop if that fails)
  • so: incrementing this sleep does not delay the initial resize (which is good)

Thinking about this a bit more;

  1. for a "daemon under load", it may be expected that we need slightly longer (or more retries)
  2. if the resize is taking longer for other (daemon not under load) situations, it may be that there's a regression in performance
  3. so even if we fix the initial size (to be defined when creating the container, and without a "resize" after), the "monitor for resize" would still need to trigger resizes
  4. so: if there's a regression in performance, that "monitor" case would still be affected by the regression (if any), so should still be looked into. (at least, we can keep a tracking ticket to do so)

-	for retry := 0; retry < 5; retry++ {
-		time.Sleep(10 * time.Millisecond)
+	for retry := 0; retry < 10; retry++ {
+		time.Sleep(100 * time.Millisecond)
@thaJeztah (Member) commented Apr 30, 2022

Wondering if we should consider making the delay incremental, e.g.;

time.Sleep(time.Duration(retry+1) * 10 * time.Millisecond)
  • retry after 10 milliseconds ("happy path" for when the daemon wasn't able to handle it immediately)
  • if failed; add 10 milliseconds (so, retry after 20 milliseconds)
  • if failed; add 10 milliseconds (so, retry after 30)
  • and so on, until we reach 10 attempts (which would be using 100 milliseconds)

If incrementing by 10 is not "aggressive" enough, perhaps we could make it exponential (but with a "cap", which could be 100ms);

10 -> 20 -> 40 -> 80 -> 100 -> 100 -> 100

Doing so would allow the resize to be handled "faster" (in the happy case), while still allowing it to take longer (if the daemon happens to be under load)

WDYT?

@rumpl (Member, Author) replied:
Done!

@thaJeztah (Member) left a comment

LGTM, thanks!

@thaJeztah (Member) commented:

Hmm... CI fails on the related test, so wondering if incrementing by 10 is not aggressive enough after all 😞

--- FAIL: TestInitTtySizeErrors (0.65s)
    tty_test.go:29: assertion failed: 
        --- expectedError
        +++ →
        @@ -1,2 +1 @@
        -failed to resize tty, using default size

In some cases, for example if there is a heavy load, the initialization of the TTY size
would fail. This change makes the CLI retry 10 times instead of 5 and we wait
incrementally from 10ms to 100ms

Relates to docker#3554

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>

rumpl commented May 2, 2022

It's the other way around: the test checks that we return that error when we can't resize, so the trick is to sleep long enough in the test for the loop to finish and the function to return an error. It should be good now

@thaJeztah (Member) commented:

Ah, whoop!

Let's get this one merged 👍
