Skip to content

Fix agent update err#1584

Closed
LK4D4 wants to merge 2 commits into
moby:masterfrom
LK4D4:fix_agent_update_err
Closed

Fix agent update err#1584
LK4D4 wants to merge 2 commits into
moby:masterfrom
LK4D4:fix_agent_update_err

Conversation

@LK4D4
Copy link
Copy Markdown
Contributor

@LK4D4 LK4D4 commented Sep 29, 2016

ping @mrjana
With this patch docker properly recovers from transient session failures.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Sep 29, 2016

Current coverage is 54.29% (diff: 92.30%)

Merging #1584 into master will increase coverage by 0.22%

@@             master      #1584   diff @@
==========================================
  Files            84         84          
  Lines         13847      13857    +10   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits           7487       7523    +36   
+ Misses         5356       5331    -25   
+ Partials       1004       1003     -1   

Sunburst

Powered by Codecov. Last update 8f8099a...7fbe71b

@thaJeztah
Copy link
Copy Markdown
Member

Do we need to consider this one for 1.12.2?

Comment thread agent/reporter.go Outdated
go func() {
sr.cond.Wait()
close(condCh)
}()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be in a goroutine?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it doesn't need to be.

@aaronlehmann aaronlehmann added this to the 1.12.2 milestone Sep 29, 2016
@LK4D4 LK4D4 force-pushed the fix_agent_update_err branch from bdc827e to b5bebe9 Compare September 29, 2016 13:30
In case of some problems (i.e. session initiate timeout) this loops
starts to iterate on insane speed and closing sessions like there is no
tomorrow and totally destroys agent. Adding some delay between retries
gives agent some time to recreate session and recover from problem.

Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
<-session.errs branch of event loop should correctly set backoff and
close session. Just closing session won't increase backof and might lead
to fast retry loops.

Signed-off-by: Alexander Morozov <lk4d4math@gmail.com>
@LK4D4 LK4D4 force-pushed the fix_agent_update_err branch from b5bebe9 to 7fbe71b Compare September 29, 2016 13:35
@mavenugo
Copy link
Copy Markdown
Contributor

@thaJeztah yes. this should go in for 1.12.2-rc2

@LK4D4
Copy link
Copy Markdown
Contributor Author

LK4D4 commented Sep 29, 2016

However, docker integration tests fail with this.

@LK4D4
Copy link
Copy Markdown
Contributor Author

LK4D4 commented Sep 29, 2016

Actually, tests failures are invalid tests behavior.

@LK4D4
Copy link
Copy Markdown
Contributor Author

LK4D4 commented Sep 29, 2016

@aaronlehmann PTAL

@aaronlehmann
Copy link
Copy Markdown
Collaborator

Hm, I'm not very familiar with the agent code, but I thought there was already session backoff logic that was supposed to handle this.

@aaronlehmann
Copy link
Copy Markdown
Collaborator

ping @stevvooe

@LK4D4
Copy link
Copy Markdown
Contributor Author

LK4D4 commented Sep 29, 2016

@aaronlehmann there is no session backoff if you just close session without sending error to session.errs. Also, reporter is just for loop which does not care about session or anything.

@LK4D4
Copy link
Copy Markdown
Contributor Author

LK4D4 commented Sep 29, 2016

This fix is way too complicated and probably wrong.

@LK4D4 LK4D4 closed this Sep 29, 2016
@LK4D4 LK4D4 deleted the fix_agent_update_err branch September 29, 2016 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants