Fix agent update err #1584
Current coverage is 54.29% (diff: 92.30%)
@@             master   #1584   diff @@
==========================================
  Files            84      84
  Lines         13847   13857    +10
  Methods           0       0
  Messages          0       0
  Branches          0       0
==========================================
+ Hits           7487    7523    +36
+ Misses         5356    5331    -25
+ Partials       1004    1003     -1
Do we need to consider this one for 1.12.2?
go func() {
	sr.cond.Wait()
	close(condCh)
}()
Why does this need to be in a goroutine?
Yeah, it doesn't need to be.
bdc827e to b5bebe9
In case of some problems (e.g. a session initiate timeout), this loop starts iterating at insane speed, closing sessions like there is no tomorrow, and totally destroys the agent. Adding some delay between retries gives the agent some time to recreate the session and recover from the problem.
Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
The <-session.errs branch of the event loop should correctly set the backoff and close the session. Just closing the session won't increase the backoff and might lead to fast retry loops.
Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
b5bebe9 to 7fbe71b
@thaJeztah yes. This should go in for 1.12.2-rc2.
However, docker integration tests fail with this.
Actually, the test failures are caused by invalid test behavior.
@aaronlehmann PTAL
Hm, I'm not very familiar with the agent code, but I thought there was already session backoff logic that was supposed to handle this.
ping @stevvooe
@aaronlehmann there is no session backoff if you just close session without sending error to
This fix is way too complicated and probably wrong.
ping @mrjana
With this patch docker properly recovers from transient session failures.