Fix leak of watchMiss goroutine #1786
Related stack trace in 17.03:
drivers/overlay/ov_network.go
```go
for {
	msgs, err := nlSock.Receive()
	if err != nil {
		if nlSock.GetFd() == -1 {
			// ... (diff excerpt ends here)
```
This needs locking to avoid a data race involving nlSock.fd.
Correct, thanks.
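To make the exit path in the excerpt concrete, here is a minimal sketch of a watchMiss-style loop. It assumes a hypothetical `fakeSocket` (not the vishvananda/netlink type) whose `Receive` blocks on a channel until a message arrives or `Close` is called, and whose `fd` field is read and written under an internal mutex, which is the locking the review asks for. The loop returns only when `Receive` fails and `GetFd` reports -1.

```go
package main

import (
	"errors"
	"sync"
)

// errClosed is returned by Receive once the socket has been closed.
var errClosed = errors.New("socket closed")

// fakeSocket is a hypothetical stand-in for the netlink socket: Receive
// blocks until a message arrives or the socket is closed.
type fakeSocket struct {
	mu     sync.Mutex // guards fd, the field the review says is racy
	fd     int
	msgs   chan string
	closed chan struct{}
	once   sync.Once
}

func newFakeSocket(fd int) *fakeSocket {
	return &fakeSocket{fd: fd, msgs: make(chan string), closed: make(chan struct{})}
}

// Receive blocks until a message is delivered or Close is called.
func (s *fakeSocket) Receive() ([]string, error) {
	select {
	case m := <-s.msgs:
		return []string{m}, nil
	case <-s.closed:
		return nil, errClosed
	}
}

// GetFd reads fd under the lock so it cannot race with Close.
func (s *fakeSocket) GetFd() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.fd
}

// Close marks the fd invalid first, then unblocks any pending Receive.
func (s *fakeSocket) Close() {
	s.once.Do(func() {
		s.mu.Lock()
		s.fd = -1
		s.mu.Unlock()
		close(s.closed)
	})
}

// watchMiss mirrors the loop in the excerpt: it returns only after a
// Receive error with the fd already invalidated by Close.
func watchMiss(s *fakeSocket, handle func(string), done chan<- struct{}) {
	defer close(done)
	for {
		msgs, err := s.Receive()
		if err != nil {
			if s.GetFd() == -1 {
				return // socket was closed: stop watching
			}
			continue // any other error: keep watching (sketch policy)
		}
		for _, m := range msgs {
			handle(m)
		}
	}
}
```

Because `Close` sets `fd` to -1 before closing the channel, the goroutine that wakes up from `Receive` is guaranteed to observe -1 and exit instead of spinning.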
```go
// Close the netlink socket, this will also release the watchMiss goroutine that is using it
if n.nlSocket != nil {
	n.nlSocket.Close()
	// ... (diff excerpt ends here)
```
This needs locking to avoid a data race involving nlSock.fd. Close could race with Receive or GetFd.
The network lock is already held when destroySandbox is called. This appears to be by design; see the comment at the top of the function: https://github.com/docker/libnetwork/blob/master/drivers/overlay/ov_network.go#L267
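The argument here is that the network lock already serializes `Close` with `GetFd`. A minimal sketch of that arrangement, assuming hypothetical `network` and `rawSocket` types: `rawSocket` does no locking of its own, `destroySandbox` is called with the network lock held, and the watcher takes the same lock before calling `GetFd`, so the write of `fd` in `Close` and the read in `GetFd` never overlap.

```go
package main

import (
	"errors"
	"sync"
)

var errClosed = errors.New("socket closed")

// rawSocket is hypothetical: like the socket under review, it does no
// locking of its own around fd.
type rawSocket struct {
	fd     int
	closed chan struct{}
}

func newRawSocket(fd int) *rawSocket {
	return &rawSocket{fd: fd, closed: make(chan struct{})}
}

// Receive blocks until Close is called (this fake never delivers messages).
func (s *rawSocket) Receive() error {
	<-s.closed
	return errClosed
}

func (s *rawSocket) GetFd() int { return s.fd }

// Close writes fd without any lock; callers must serialize it with GetFd.
func (s *rawSocket) Close() {
	s.fd = -1
	close(s.closed)
}

// network is hypothetical; its embedded mutex plays the role of the
// network lock.
type network struct {
	sync.Mutex
	nlSocket *rawSocket
}

// destroySandbox must be called with n's lock held, mirroring the
// convention referenced in the review comment.
func (n *network) destroySandbox() {
	if n.nlSocket != nil {
		n.nlSocket.Close()
		n.nlSocket = nil // sketch choice: makes a repeated destroy a no-op
	}
}

// watchMiss reads fd only while holding the network lock, so the read
// cannot race with the write in Close, which also runs under that lock.
func watchMiss(n *network, s *rawSocket, done chan<- struct{}) {
	defer close(done)
	for {
		if err := s.Receive(); err != nil {
			n.Lock()
			fd := s.GetFd()
			n.Unlock()
			if fd == -1 {
				return
			}
		}
	}
}
```

This keeps the socket itself lock-free, but the safety now depends on every caller following the locking convention, which is the trade-off debated below.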
@aaronlehmann, can you give this another round of review? Once it looks good to you, I will update the backport.
Goroutine A is blocked in Receive on the netlink socket. Do you have some other scenario in mind? In this case, it is the network lock that guarantees the safety of the netlink socket.
Goroutine A calls Receive, which checks whether s.fd < 0. At this time, s.fd >= 0. This is really a bug in …
In this case it is still fine: the race to avoid is the one between Close and GetFd (apart from the initial check on the fd, Receive just returns whatever the syscall returns, so there is no issue there), and that is handled by the network mutex. In general, Linux sockets are not thread-safe, but a cleaner approach would indeed be to protect the fd with a mutex inside the netlink library.
I opened vishvananda/netlink#234 upstream.
I will wait for it to be merged and then update this commit.
The netlink socket used to monitor L2 misses was never closed, so the spawned watchMiss goroutine never returned. This caused a goroutine leak on every createNetwork/destroyNetwork cycle.
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
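The leak described in the commit message can be reproduced in miniature. This sketch assumes hypothetical `createNetwork`/`destroyNetwork` functions in which a channel stands in for the netlink socket: the watcher goroutine blocks on it the way watchMiss blocks in `Receive`. `leakedAfter` counts goroutines left over after a number of create/destroy cycles using `runtime.NumGoroutine`, and `leakedWithoutClose` skips the close to show the pre-fix behavior.

```go
package main

import (
	"runtime"
	"time"
)

// sketchNetwork is hypothetical: creating it spawns a watcher goroutine,
// the way creating an overlay network spawned watchMiss.
type sketchNetwork struct {
	socket chan struct{} // stand-in for the netlink socket; closing it unblocks the watcher
	done   chan struct{} // closed when the watcher goroutine returns
}

// createNetwork (hypothetical) starts a watcher that blocks until the
// stand-in socket is closed, like watchMiss blocking in Receive.
func createNetwork() *sketchNetwork {
	n := &sketchNetwork{socket: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(n.done)
		<-n.socket
	}()
	return n
}

// destroyNetwork (hypothetical) closes the stand-in socket, the step that
// was missing before the fix, and waits for the watcher to return.
func destroyNetwork(n *sketchNetwork) {
	close(n.socket)
	<-n.done
}

// settle polls until the goroutine count drops to want or a timeout
// expires, since an exiting goroutine may be counted briefly after close(done).
func settle(want int) int {
	deadline := time.Now().Add(time.Second)
	for runtime.NumGoroutine() > want && time.Now().Before(deadline) {
		time.Sleep(time.Millisecond)
	}
	return runtime.NumGoroutine()
}

// leakedAfter runs create/destroy cycles and returns how many goroutines
// remain above the starting count (0 when nothing leaks).
func leakedAfter(cycles int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < cycles; i++ {
		destroyNetwork(createNetwork())
	}
	return settle(before) - before
}

// leakedWithoutClose mimics the pre-fix behavior: each network is dropped
// without closing its socket, so every watcher stays blocked forever.
func leakedWithoutClose(cycles int) int {
	before := runtime.NumGoroutine()
	for i := 0; i < cycles; i++ {
		_ = createNetwork()
	}
	return runtime.NumGoroutine() - before
}
```

A check like `leakedAfter` run in a loop of network create/destroy operations is a cheap way to catch this class of regression, whereas `leakedWithoutClose` grows by one goroutine per cycle.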
Thanks @fcrisciani and @aaronlehmann, LGTM.
when repeated operate |
@ityangchen, this is the follow-up PR.