Inhibit issues with concurrent execs #8098
9a6e0ba to
c554f91
Compare
| // we special case this for now as we _presume_ that it's either a conflict | ||
| // of reconfigure with guest operation, or a conflict with another guest operation. | ||
| // These cases do not return a TaskInProgress or concurrent modification. | ||
| return ConcurrentAccessError{errors.New("invalid state from start guest program")} |
Would there be any benefit to logging err or including it in the new error being constructed here? (E.g., to allow us to use log data to validate the presumption described in the comment above if something doesn't seem to be working quite as expected.)
lib/tether/config.go
Outdated
| IP *net.IPNet `vic:"0.1" scope:"read-only" key:"ip"` | ||
| // Actual IP address assigned | ||
| // TODO: should this skip decode? - if it's used to seed DHCP requests for semi-stable addressing then |
Should we file a ticket to track this TODO?
Will remove the TODO - with the benefit of time I can recall that the non-persistent keys will never be written from the API side while the VM is powered on. This would only become necessary if we were to change that AND had a reason to update this field from the API side (I cannot think of one).
| } | ||
| // IsInvalidStateError is an error certifier function for errors coming back from vsphere. It checks for an InvalidStateFault | ||
| func IsInvalidStateError(err error) bool { |
Is there code that should be updated to call this now that it exists? (If so, that should probably be handled in a separate PR.)
There is, for example lib/portlayer/storage/vsphere/toolbox_common.go. Opened #8099 for it.
| // IsInvalidStateError is an error certifier function for errors coming back from vsphere. It checks for an InvalidStateFault | ||
| func IsInvalidStateError(err error) bool { | ||
| if soap.IsVimFault(err) { | ||
| _, ok1 := soap.ToVimFault(err).(*types.InvalidState) |
In the other similar methods, we also check for the *types.____Fault here. Is there a reason we don't check for types.InvalidStateFault in this case? (If so, that would be good to document in a comment here so that future readers don't assume it's a mistake.)
iirc types.InvalidStateFault does not implement GetMethodFault() method so it won't compile. I confess I've not dug into why this one is not the same as the others. @dougm ?
| return ok1 || ok2 || soap.ToSoapFault(err).String == "vim.fault.InvalidPowerState" || | ||
| soap.ToSoapFault(err).String == "vim.fault.InvalidPowerState" | ||
| } | ||
| return false |
This whole method seems to be basically repeated many times. Would there be value in defining a helper that takes the four "variable" pieces (two interfaces, two strings) as arguments? (If so, that should probably be handled in a separate PR.)
The entire thing should be in govmomi. I think it would end up being either significantly rewritten to use reflection (viable but much harder to read) or would take four types as the pointers are separate types in this case.
I'd prefer just to move it all to govmomi and make it @dougm's problem ;) Opened #8099 for follow up.
6d40464 to
6950f83
Compare
|
1-09-Docker-Attach Attach with short input Stderr: Personna: Questions:
|
d5c63bd to
f6c176f
Compare
Makes use of extraconfig update for suppressing decode of fields into existing structures to prevent overwriting of in-memory state updates during a reload. This is necessary because there are no test-and-set guarantees between API and guest side updates with guestinfo. namespacedb would address this at an infrastructure level. Adds mapping of InvalidState that we can receive when multiple guest operations collide to a concurrent modification so that a retry can be attempted by the caller. Handling of guest operations does not trigger TaskInProgress or ConcurrentModification as we'd expected from the infrastructure. Updates the unit tests to use the structure without the suppression of decoding - the differentiation wasn't important previously but now the structure handling is asymmetric depending on whether it's tether or API so the correct package reference is now important. (cherry picked from commit 5271ea7)
There are outstanding issues to address with concurrent exec. This work is palliative rather than an actual fix. Removes checking for "started" in the status string - we reliably see this field not propagating to the property collector despite being logged as set in the tether. This _only_ applies to execs at this time as that is the only path calling task.State (via InspectTask). Adds locking around dispatch of execs, with a timeout, to serialize that initial dispatch path against a single container. If the timeout expires it reverts to current behaviour and relies on concurrent modification and retry. (cherry picked from commit f6c176f)
…e#8101) Makes use of extraconfig update for suppressing decode of fields into existing structures to prevent overwriting of in-memory state updates during a reload. This is necessary because there are no test-and-set guarantees between API and guest side updates with guestinfo. namespacedb would address this at an infrastructure level. Adds mapping of InvalidState that we can receive when multiple guest operations collide to a concurrent modification so that a retry can be attempted by the caller. Handling of guest operations does not trigger TaskInProgress or ConcurrentModification as we'd expected from the infrastructure. Updates the unit tests to use the structure without the suppression of decoding - the differentiation wasn't important previously but now the structure handling is asymmetric depending on whether it's tether or API so the correct package reference is now important. (cherry picked from commit 014952b)
There are outstanding issues to address with concurrent exec. This work is palliative rather than an actual fix. Removes checking for "started" in the status string - we reliably see this field not propagating to the property collector despite being logged as set in the tether. This _only_ applies to execs at this time as that is the only path calling task.State (via InspectTask). Adds locking around dispatch of execs, with a timeout, to serialize that initial dispatch path against a single container. If the timeout expires it reverts to current behaviour and relies on concurrent modification and retry. (cherry picked from commit c99f021)
…e#8101) Makes use of extraconfig update for suppressing decode of fields into existing structures to prevent overwriting of in-memory state updates during a reload. This is necessary because there are no test-and-set guarantees between API and guest side updates with guestinfo. namespacedb would address this at an infrastructure level. Adds mapping of InvalidState that we can receive when multiple guest operations collide to a concurrent modification so that a retry can be attempted by the caller. Handling of guest operations does not trigger TaskInProgress or ConcurrentModification as we'd expected from the infrastructure. Updates the unit tests to use the structure without the suppression of decoding - the differentiation wasn't important previously but now the structure handling is asymmetric depending on whether it's tether or API so the correct package reference is now important. (cherry picked from commit f907974)
This PR contains two sets of changes:
Towards #7410
Investigating CI failures: