-
Notifications
You must be signed in to change notification settings - Fork 886
Diagnostic client #2032
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Diagnostic client #2032
Conversation
ddebroy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Had a couple of minor comments. One q: can the remediation be triggered automatically once in a while as part of a "garbage collection sweep" or is that too dangerous?
| if err != nil { | ||
| logrus.WithError(err).Fatalf("The connection failed") | ||
| } | ||
| httpIsOk(resp.Body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can directly check resp.StatusCode for 200/2xx codes which indicate HTTP OK rather than parsing resp.body for OK: https://golangcode.com/get-the-http-response-status-code/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think can be a good enhancement, right now the server side did not really set any specific error code, but it would be good to implement a more proper api with error codes
diagnose/client/main.go
Outdated
| } | ||
| httpIsOk(resp.Body) | ||
|
|
||
| clsuterPeers := fetchNodePeers(*ipPtr, *portPtr, "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clsuster typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Makefile
Outdated
| @echo "🐳 $@" | ||
| go build -o "bin/dnet-$$GOOS-$$GOARCH" ./cmd/dnet | ||
| go build -o "bin/docker-proxy-$$GOOS-$$GOARCH" ./cmd/proxy | ||
| go build -o "bin/dignosticClient-$$GOOS-$$GOARCH" ./diagnose/client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why put it in /diagnose/ and not /cmd/ ? like the others ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep moved
diagnose/client/main.go
Outdated
| for _, k := range orphanKeys { | ||
| resp, err := http.Get(fmt.Sprintf(deleteEntry, ip, port, network, tableName, k)) | ||
| if err != nil { | ||
| logrus.WithError(err).Fatalf("Failed deleting entry k:%s", k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really want to fatal here ? since it will stop the program, it won't delete the remaining entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, changed in error
84493e3 to
989af94
Compare
Codecov Report
@@ Coverage Diff @@
## master #2032 +/- ##
=========================================
Coverage ? 40.46%
=========================================
Files ? 138
Lines ? 22172
Branches ? 0
=========================================
Hits ? 8971
Misses ? 11884
Partials ? 1317
Continue to review full report at Codecov.
|
|
@fcrisciani can you include a README / Markdown file in this PR? Misty started working on one in docker/docs#5558, and can be used as a starting point |
3ba06ad to
ef512ba
Compare
|
@thaJeztah @ddebroy @vieux can you guys give another pass? |
ddebroy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a couple of minor string nits
cmd/diagnostic/README.md
Outdated
| A message like the following will appear in the Docker host logs: | ||
|
|
||
| ```none | ||
| Starting the diagnose server listening on <port> for commands |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the string diagnose be diagnostics ?
cmd/diagnostic/README.md
Outdated
| A message like the following will appear in the Docker host logs: | ||
|
|
||
| ```none | ||
| Disabling the diagnose server |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the string diagnose be diagnostics ?
ef512ba to
ed0721b
Compare
ddebroy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@thaJeztah can you also give the final blessing? |
766b46b to
5b9614f
Compare
ddebroy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good with a couple of minor comments.
diagnostic/server.go
Outdated
| // IsDebugEnable returns true when the debug is enabled | ||
| func (s *Server) IsDebugEnable() bool { | ||
| // IsDiagnosticEnable returns true when the debug is enabled | ||
| func (s *Server) IsDiagnosticEnable() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: how about changing the name to IsDiagnosticEnable**d** like in controller
func (c *controller) IsDiagnosticEnabled() bool {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
networkdb/networkdb.go
Outdated
| return nil, err | ||
| } | ||
| if entry != nil && entry.deleting { | ||
| return nil, types.NotFoundErrorf("entry not found in table %s with network id %s and key %s", tname, nid, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Masking the existence into a "not found" message may be confusing if this message makes into logs but say the actual deletion does not trigger for a while. How about making the message something along the lines of: "entry found but in deleting state. returning not found" to keep things super clear for support engineers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing in: return nil, types.NotFoundErrorf("entry in table %s network id %s and key %s deleted and pending garbage collection", tname, nid, key)
thaJeztah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments 🤗
cmd/diagnostic/Dockerfile
Outdated
| @@ -0,0 +1,8 @@ | |||
| FROM docker:17.12-rc-dind | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-rc tag 17.12-dind is now available (https://hub.docker.com/_/docker/)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking why we needed the dind for this tool, but this is so that we can use it for older daemons, which do not have this functionality built-in, correct?
Should we have two versions of the image? one "minimal" (just the binary), and one dind?
cmd/diagnostic/Dockerfile
Outdated
|
|
||
| RUN apk add --no-cache curl | ||
|
|
||
| WORKDIR /tool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of installing in /tool, you could just install in /usr/local/bin
cmd/diagnostic/README.md
Outdated
| **Standalone network:** | ||
|
|
||
| ```bash | ||
| $ debugClient -c sd -v -net n8a8ie6tb3wr2e260vxj8ncy4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/debugClient/diagnosticClient/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
cmd/diagnostic/README.md
Outdated
| **Overlay network:** | ||
|
|
||
| ```bash | ||
| $ debugClient -port 2001 -c overlay -v -net n8a8ie6tb3wr2e260vxj8ncy4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/debugClient/diagnosticClient/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
cmd/diagnostic/Dockerfile
Outdated
| WORKDIR /tool | ||
|
|
||
| COPY daemon.json /etc/docker/daemon.json | ||
| COPY diagnosticClient /tool/diagnosticClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a follow-up; we should make this Dockerfile a multi-stage build, and actually build the client in it (I had a branch with that, but looks like I didn't push, and it's not on my laptop 😊
| Remember that table operations have ownership, so any `create entry` will be persistent till | ||
| the diagnostic container is part of the swarm. | ||
|
|
||
| 1. Make sure that the node where the diagnostic client will run is not part of the swarm, if so do `docker swarm leave -f` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need different steps for 17.12+ daemons (as they don't have to leave the swarm, just the diagnostic client to connect to them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the container version so you will run the dind version and you will need to leave the swarm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood; thinking if we should have instructions (also using a containerised version) to use for 17.12 daemons (just bind-mounting /var/run/docker.sock e.g. and connecting to the running daemon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the containerized version of it, as dockereng/network-diagnostic:onlyclient that can be used as it with --net host, no need for the docker.sock
cmd/diagnostic/README.md
Outdated
| 2. To run the container, use a command like the following: | ||
|
|
||
| ```bash | ||
| $ docker container run --name net-diagnostic -d --privileged --network host fcrisciani/network-diagnostic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put the image under an official organization? (dockereng/ or dockercore/ e.g.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
198046b to
2ccffde
Compare
|
Note to myself, after this gets merged the moby vendoring requires code change due to changes in method names |
Align it to the moby/moby external api Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
- the client allows to talk to the diagnostic server and decode the internal values of the overlay and service discovery - the tool also allows to remediate in case of orphans entries - added README Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
2ccffde to
be91c3e
Compare
thaJeztah
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but left two suggestions 👍
| `HUP` signal to the PID you found in the previous step. | ||
|
|
||
| ```bash | ||
| kill -HUP <pid-of-dockerd> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this could be killall -HUP dockerd
| `HUP` signal to the PID you found in the previous step. | ||
|
|
||
| ```bash | ||
| kill -HUP <pid-of-dockerd> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
vieux
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Uh oh!
There was an error while loading. Please reload this page.