@rumpl rumpl commented Apr 28, 2022

- What I did

In some cases, for example when there is a heavy load, the initialization of the TTY size can fail. This change makes the CLI retry 10 times instead of 5, with the wait between two resize calls growing incrementally from 10ms up to 100ms.

Running 150 containers (see script below) takes ~2 minutes to complete without this change; with this change the time to complete was between ~1m55s and ~2m12s, so I don't think this change will impact users.

Running one container docker run --rm -t hello-world takes the same amount of time (1.5s on my machine) with or without this change.

Relates to #3554

- How I did it
Changed the retry logic to retry 10 times instead of 5, with the wait between retries growing incrementally from 10ms to 100ms

- How to verify it

Running this script shouldn't show any "failed to resize tty, using default size" errors.

#!/bin/bash

for A in $(seq 1 10); do
    echo $A

    for a in $(seq 1 15); do
        com.docker.cli run --rm -t alpine echo $A/$a &
    done

    wait
    date
done

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@codecov-commenter commented Apr 28, 2022

Codecov Report

Merging #3573 (9598c4c) into master (fd2bc1f) will increase coverage by 0.10%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3573      +/-   ##
==========================================
+ Coverage   59.36%   59.47%   +0.10%     
==========================================
  Files         285      287       +2     
  Lines       24097    24174      +77     
==========================================
+ Hits        14306    14377      +71     
- Misses       8930     8931       +1     
- Partials      861      866       +5     

@thaJeztah (Member) commented:

Let me comment here, as we were discussing this;

  • First of all, this loop is for the initial resize (docker run -> resize to desired dimensions), and not used for the "monitor for terminal to be resized" case
  • The retry loop is only hit after we tried a resize immediately (line 54 does a resize immediately after attaching, and we only hit the retry loop if that fails)
  • so: incrementing this sleep does not delay the initial resize (which is good)

Thinking about this a bit more;

  1. for a "daemon under load", it may be expected that we need slightly longer (or more retries)
  2. if the resize is taking longer for other (daemon not under load) situations, it may be that there's a regression in performance
  3. so even if we fix the initial size (to be defined when creating the container, and without a "resize" after), the "monitor for resize" would still need to trigger resizes
  4. so: if there's a regression in performance, that "monitor" case would still be affected by the regression (if any), so should still be looked into. (at least, we can keep a tracking ticket to do so)

-	for retry := 0; retry < 5; retry++ {
-		time.Sleep(10 * time.Millisecond)
+	for retry := 0; retry < 10; retry++ {
+		time.Sleep(100 * time.Millisecond)
@thaJeztah (Member) commented Apr 30, 2022

Wondering if we should consider making the delay incremental, e.g.;

time.Sleep(time.Duration(retry+1) * 10 * time.Millisecond)
  • retry after 10 milliseconds ("happy path" for when the daemon wasn't able to handle it immediately)
  • if failed; add 10 milliseconds (so, retry after 20 milliseconds)
  • if failed; add 10 milliseconds (so, retry after 30)
  • and so on, until we reach 10 attempts (which would be using 100 milliseconds)

If incrementing by 10 is not "aggressive" enough, perhaps we could make it exponential (but with a "cap", which could be 100ms);

10 -> 20 -> 40 -> 80 -> 100 -> 100 -> 100

Doing so would allow the resize to be handled "faster" (in the happy case), while still allowing it to take longer (if the daemon happens to be under load)

WDYT?

@rumpl (Member, Author) replied:
Done!

@thaJeztah (Member) left a comment

LGTM, thanks!

@thaJeztah (Member) commented:

Hmm... CI fails on the related test, so wondering if incrementing by 10 is not aggressive enough after all 😞

--- FAIL: TestInitTtySizeErrors (0.65s)
    tty_test.go:29: assertion failed: 
        --- expectedError
        +++ →
        @@ -1,2 +1 @@
        -failed to resize tty, using default size

In some cases, for example if there is a heavy load, the initialization of the TTY size
would fail. This change makes the CLI retry 10 times instead of 5 and we wait
incrementally from 10ms to 100ms

Relates to docker#3554

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>

rumpl commented May 2, 2022

It's the other way around: the test checks that we return that error when we can't resize, so the trick is to sleep long enough in the test for the loop to finish and the function to return an error. It should be good now

@thaJeztah (Member) commented:

Ah, whoop!

Let's get this one merged 👍
