Remove raise_on_failure from wait calls #93
@blackboxsw @lucasmoura @paride, this would be a breaking change, but I think it is necessary. Given the number of places … For the places relying on a …
@TheRealFalcon thanks for the heads-up, this works for the bootspeed jobs.
```diff
 try:
-    super()._wait_for_cloudinit(
-        raise_on_failure=True)
+    super()._wait_for_cloudinit()
```
I don't think this code will work anymore, right? Since _wait_for_cloudinit will not raise the OSError anymore, we will never properly wait for cloud-init status to finish.
Also, I am a little bit worried about this change. Even though we can adapt the uaclient code to explicitly check the cloud-init status return, any user who needs to launch a VM will have to know beforehand that they should wait for the lxd-agent to be ready. This means they will need to replicate this for loop logic in their own code, which I don't think is optimal.
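The first concern above can be made concrete with a minimal sketch. The names `wait_with_oserror_retry`, `FakeResult`, `failing_wait`, and `raising_wait` are hypothetical stand-ins, not pycloudlib code: a loop that retries only on `OSError` gets exactly one attempt once the wait call reports failure by returning a value instead of raising.

```python
from typing import Callable


class FakeResult:
    """Hypothetical stand-in for a command result that carries a failure flag."""

    def __init__(self, ok: bool) -> None:
        self.ok = ok


def wait_with_oserror_retry(wait_fn: Callable[[], FakeResult], attempts: int) -> int:
    """Call wait_fn, retrying only on OSError; return how many calls were made."""
    calls = 0
    for _ in range(attempts):
        calls += 1
        try:
            wait_fn()
        except OSError:
            continue  # only a raised OSError triggers another attempt
        break  # any returned value, even a failed result, ends the loop
    return calls


def failing_wait() -> FakeResult:
    # Reports failure by returning a result rather than raising.
    return FakeResult(ok=False)


def raising_wait() -> FakeResult:
    # A wait call that surfaces failure as an OSError instead.
    raise OSError("lxd-agent not ready")


print(wait_with_oserror_retry(failing_wait, attempts=5))  # → 1
print(wait_with_oserror_retry(raising_wait, attempts=5))  # → 5
```

Any caller that wants the retry behaviour has to inspect the returned result itself, which is the duplicated loop logic the comment worries about.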
I thought the OSError was coming from LXD itself before we called cloud-init status, which means we would continue to get the error even after this change. Is that not true? I tried testing locally, but I'm actually not seeing any of these errors for LXD VMs anymore. Am I just lucky, or did these issues get resolved and it's all a moot point anyway? 😄
Those are good questions. Let me reproduce that change and run some ua tests so I can see what happens in that scenario.
@TheRealFalcon I think I know what is happening on the LXD VMs. We are not hitting the lxd-agent issue because we already handle it in the _wait_for_execute method. There, it tries to run a whoami command, and if we hit the lxd-agent issue, it sleeps and tries the command again. Therefore, the lxd-agent has enough time to be set up.
However, regarding the OSError issue, the _wait_for_cloudinit method is not raising it. For example, if we comment out the _wait_for_execute call and try to launch an LXD VM, we will see that the _wait_for_cloudinit command fails because of the lxd-agent without raising an OSError, which means the retry loop we have for LXD will not work.
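The `whoami` readiness check described above can be sketched as follows. This is a hypothetical reconstruction, not pycloudlib's actual `_wait_for_execute`: it assumes only a command runner that returns an object with a boolean `ok` attribute, and the `timeout`/`interval` defaults are arbitrary. The `sleep` and `clock` parameters are injectable so the loop can be exercised without real waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeResult:
    """Hypothetical command result; only the ``ok`` flag is used here."""

    ok: bool


def wait_for_execute(
    execute: Callable[[str], FakeResult],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Retry a trivial command until the instance can run commands."""
    deadline = clock() + timeout
    while True:
        # whoami is cheap; it fails while the lxd-agent is not yet reachable.
        if execute("whoami").ok:
            return
        if clock() >= deadline:
            raise TimeoutError("instance never became ready to execute commands")
        sleep(interval)  # give the agent time to come up, then try again


# Usage: the agent becomes reachable on the third attempt.
outcomes = iter([False, False, True])
sleeps = []
wait_for_execute(lambda cmd: FakeResult(next(outcomes)), sleep=sleeps.append)
print(len(sleeps))  # → 2
```

Because the loop checks the returned result rather than catching an exception, it keeps working whether or not the underlying call raises.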
@lucasmoura Just to double check, since the …
@TheRealFalcon I think we just need a couple of changes before merging it:
Also, @blackboxsw do you mind reviewing this too? Just to make sure I am not missing anything. I have run a uaclient VM test with those changes and also without the …
@lucasmoura Agreed on point 1. On point 2,
I thought about that too, but also thought it might be a little unintuitive to receive a …
I don't see a problem with the … But I think @OddBloke's and @blackboxsw's opinions on the topic will be really useful in deciding that.
@lucasmoura I responded to the "return Result" aspect of your proposal but didn't mention anything about the "make it public" part. It looks like the outcome of the latter feeds into the former. If these other methods (start/restart/etc.) are already calling …
@TheRealFalcon I thought about cases where users want to guarantee that … Looking back at it, I think having a supported way to see the status of cloud-init would not be that useful. So I am okay with the …
66fa68c to 04d3fa4
The 'raise_on_failure' aspect of waiting for cloud-init to start is causing a fair amount of additional code and workarounds. It was introduced as an optional parameter so as not to break the existing API, but given the amount of heartburn it's causing, it seems best to remove it entirely. If one needs the original behavior, checking cloud-init status via an execute call should suffice.
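A minimal sketch of that replacement check, assuming a command runner (for example a bound `instance.execute`) whose result exposes `return_code`, `stdout`, and `stderr`; the `CommandResult`, `CloudInitFailed`, and `wait_for_cloudinit_or_raise` names are hypothetical. `cloud-init status --wait` blocks until cloud-init finishes, and because exit-code details vary between cloud-init releases, this sketch simply treats any non-zero exit as a failure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CommandResult:
    """Hypothetical result of running a command on the instance."""

    return_code: int
    stdout: str = ""
    stderr: str = ""


class CloudInitFailed(RuntimeError):
    """Raised when cloud-init finishes with a non-zero status exit code."""


def wait_for_cloudinit_or_raise(execute: Callable[[str], CommandResult]) -> str:
    """Block until cloud-init is done; raise instead of returning on failure."""
    # --wait blocks until cloud-init finishes; --long adds detail to the output.
    result = execute("cloud-init status --wait --long")
    if result.return_code != 0:
        raise CloudInitFailed(
            f"cloud-init status exited {result.return_code}: "
            f"{result.stdout}{result.stderr}"
        )
    return result.stdout


# Usage with a fake runner standing in for a real instance.
def fake_execute(command: str) -> CommandResult:
    return CommandResult(return_code=0, stdout="status: done\n")


print(wait_for_cloudinit_or_raise(fake_execute).strip())  # → status: done
```

Keeping the check as an explicit call leaves the decision about what counts as a failure with the caller rather than with a flag on the wait method.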
04d3fa4 to 87ee40b