Skip to content

[4.6] libct/cg/sd: reconnect and retry on dbus connection error#49

Closed
kolyshkin wants to merge 6 commits intoprojectatomic:rhaos-4.6from
kolyshkin:4.6-dbus-reconn
Closed

[4.6] libct/cg/sd: reconnect and retry on dbus connection error#49
kolyshkin wants to merge 6 commits intoprojectatomic:rhaos-4.6from
kolyshkin:4.6-dbus-reconn

Conversation

@kolyshkin
Copy link
Copy Markdown
Collaborator

This is a semi-manual backport of upstream PR opencontainers/runc#2923 to 4.6 branch.
It's manual due to lots of recent changes missing in this branch, and thus requires a very careful review.

Original PR description follows.


In case the dbus daemon is ever restarted, the connection is no longer valid and every operation
fails. This is a minor concern for short-lived runc, but much more of a issue in case there is
a long-running daemon (e.g. cri-o) is using runc's libcontainer, as the connection is never
retried and the only remedy is to restart the daemon.

The solution to the above is to check the errors returned for dbus: connection closed by user
error, and try to re-connect on that. This is what PR #2862 does.

This is a carry of #2862, implementing the idea of retry-in-place (first described
at opencontainers/runc#2862 (comment) and opencontainers/runc#2862 (comment)) on top of what it does.

For more info, see commit messages as well as #2862.

Fixes:

wzshiming and others added 6 commits April 29, 2021 16:27
[@kolyshkin: documentation nits]

Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit cdbed6f)

[minor merge conflict in systemd/v1.go due to missing upstream commit
b006f4a, in systemd/common.go due to missing 73f22e7.]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Generalize isUnitExists as isDbusError, and use errors.As while at it
(which can handle wrapped errors as well).

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit bacfc2c)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
[@kolyshkin: doc nits, use dbus.ErrClosed and isDbusError]

Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 15fee98)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit 6122bc8)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As the caller of this function just logs the error, it does not make
sense to pass it. Instead, log it (once) and return -1.

This is a preparation for the second user.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
(cherry picked from commit eee425f)

[Minor merge conflict due to missing "return" removed by a hunk from
commit 978fa6e]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Instead of reconnecting to dbus after some failed operations, and
returning an error (so a caller has to retry), reconnect AND retry
in place for all such operations.

This should fix issues caused by a stale dbus connection after e.g.
a dbus daemon restart.

[This is a manual backport of upstream commit 47ef9a1]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin requested a review from haircommander April 29, 2021 23:35
@kolyshkin
Copy link
Copy Markdown
Collaborator Author

wrong repo 🤦🏻

@kolyshkin kolyshkin closed this Apr 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants