sandbox: Delete store artifacts if stopSandbox fails #1267
If stopSandbox fails due to qmp/qemu issues, at least clean up store artifacts before returning errors.
Fixes: kata-containers#1266
Signed-off-by: Nitesh Konkar <niteshkonkar@in.ibm.com>
|
/test |
  s.Logger().Info("Stopping VM")
- return s.hypervisor.stopSandbox()
+ if err := s.hypervisor.stopSandbox(); err != nil {
+     s.store.Delete()
It does not seem right to clean up the store when stopSandbox() fails. I mean, doing this from such a low-level function is not appropriate IMO.
|
@sboeuf : Okay. Then in the case of
|
I'm not sure we want to clean up the artifacts if a stop fails, but if we do, this should happen from
No, we cannot do that because the cleanupVM function is specific to the QEMU implementation, while the store is generic to the whole sandbox.
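The layering concern raised above can be illustrated with a minimal sketch. It assumes a low-level stop function that only reports the hypervisor result, and a higher-level caller that owns the decision to delete the generic store. Only `stopSandbox()` and `store.Delete()` come from the diff under review; the interfaces, `stopVM`, `stopAndCleanup`, and the fake types are hypothetical names used for illustration, not the actual virtcontainers code.

```go
package main

import (
	"errors"
	"fmt"
)

// hypervisor and store are hypothetical stand-ins for the
// hypervisor-specific and sandbox-generic pieces discussed in the thread.
type hypervisor interface {
	stopSandbox() error
}

type store interface {
	Delete() error
}

type sandbox struct {
	hypervisor hypervisor
	store      store
}

// stopVM stays low level: it only reports whether the hypervisor stopped.
func (s *sandbox) stopVM() error {
	return s.hypervisor.stopSandbox()
}

// stopAndCleanup is a hypothetical higher-level caller. It decides what to
// do with the generic store when the hypervisor-specific stop fails.
func (s *sandbox) stopAndCleanup() error {
	stopErr := s.stopVM()
	if stopErr == nil {
		return nil
	}
	if delErr := s.store.Delete(); delErr != nil {
		return fmt.Errorf("stop failed: %v; store cleanup failed: %v", stopErr, delErr)
	}
	// The original stop error is still returned to the caller.
	return stopErr
}

// failingHypervisor simulates a stop that fails, e.g. because of a QMP issue.
type failingHypervisor struct{}

func (failingHypervisor) stopSandbox() error {
	return errors.New("qmp: quit failed")
}

// memStore records whether Delete was called.
type memStore struct{ deleted bool }

func (m *memStore) Delete() error {
	m.deleted = true
	return nil
}

func main() {
	st := &memStore{}
	s := &sandbox{hypervisor: failingHypervisor{}, store: st}
	err := s.stopAndCleanup()
	fmt.Println("error:", err)
	fmt.Println("store deleted:", st.deleted)
}
```

With this split, the hypervisor implementation never touches the store, and whether a failed stop should trigger cleanup remains a policy choice made in one place.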
|
@sboeuf : Assuming I am looking at the right place, it would not error out if the artifacts
|
@nitkon If that's the case, it should be said explicitly that |
|
Two things ...
|
Well the wrong behavior (
No I didn't say that it was behaving this way. I said that the |
|
@sboeuf not sure if this helps, but just FYI, the use case I am trying to solve is where
Feb 21 11:15:22 kata1 kata-runtime[80116]: time="2019-02-21T11:15:22.69012585+05:30" level=info msg="Stopping Sandbox" arch=ppc64le command=kill container=eabfef03b97c301db7cc83f59429998045674e324ccd9caf5b9ab2c67f2cee43 name=kata-runtime pid=80116 sandbox=eabfef03b97c301db7cc83f59429998045674e324ccd9caf5b9ab2c67f2cee43 source=virtcontainers subsystem=qemu
|
@nitkon thanks for the pointers. As for the reason for the QMP_QUIT failure, I think in this case the agent errors out and exits, causing the VM to exit too, because of the systemd service. Take a look at this, and I think you should not run into this QMP_QUIT issue again.
|
@nitkon - Please can you run with full debug enabled and attach the proxy log (or agent log if you're using vsock). |
|
@jodh-intel @sboeuf : This does not occur when I run standalone. It happens only in the CI. I have attached the journalctl logs from when the VM panicked (GUEST_PANICKED). Journalctl logs: LeakyPods.log
|
@jodh-intel: Are the attached logs good enough, or should I provide anything more?
|
Ping @nitkon - any update? |
|
/retest |
|
@nitkon any updates? Thx! |
|
@nitkon nudge |
|
Ping @nitkon. |
|
I can confirm if the issue still persists once I successfully get CI running on one of the proxy test PRs. |
|
@nitkon any updates? Thx |
|
@nitkon - sorry if this PR is frustrating -- I know it's gone on for a while. I think we are mixing up
(1) is ongoing, it seems, per #1267 (comment), but (2) is the point of the PR. I agree that it may make sense to do this cleanup higher in the call stack. While stop is failing, do the CRI-level implementations still call delete? What about in the Docker case?
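The question about callers can be made concrete with a small sketch of a teardown sequence in which delete still runs after a failed stop and both errors are reported. This is an assumption about how a caller could behave, not a description of what any CRI implementation or Docker actually does. `teardown` and `errStop` are hypothetical names, and `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// errStop is a hypothetical sentinel error for a failed stop.
var errStop = errors.New("stop failed")

// teardown mimics a caller that issues stop and then delete.
// Delete runs even if stop failed, and both errors are reported.
func teardown(stop, del func() error) error {
	stopErr := stop()
	delErr := del()
	// errors.Join returns nil when every argument is nil.
	return errors.Join(stopErr, delErr)
}

func main() {
	deleted := false
	err := teardown(
		func() error { return errStop },
		func() error { deleted = true; return nil },
	)
	fmt.Println("deleted:", deleted)
	fmt.Println("stop error preserved:", errors.Is(err, errStop))
}
```

If callers already behave like this, store cleanup would not need to live inside the stop path at all.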
|
added "needs-help" -- let's gather some more info first -- @nitkon are you able to drive this? |
|
Hi @egernst, yes, I can drive this. However, I have been unable to reproduce this issue recently, and none of the Power CIs have failed due to this issue in recent times. cc @grahamwhaley
|
@grahamwhaley close this? |
|
@raravena80 - I think as long as @nitkon is OK closing this ... @nitkon , you want to close this? |
|
Ping @nitkon. |
|
Will re-open if it starts to re-occur. Closing it for now. |