Cherry-pick a variety of changes in 1.4.1 (concurrent exec) #8101
Merged
hickeng merged 4 commits into vmware:releases/1.4.1 on Jun 28, 2018
When you "rebase and merge" this change, the PR number won't be added to the commit summaries by GitHub. You may wish to do that locally.
zjs approved these changes on Jun 27, 2018
lcastellano approved these changes on Jun 27, 2018
The correct approach for this issue is to allow specification of sysctl options generally and then parse/apply them in the container. This change is made because: (a) it's extremely quick to do; (b) it impacts a common workload (Elasticsearch); (c) after consultation with the Photon kernel team, there's no known negative impact to changing the default given the cVM model. (cherry picked from commit 68eef36)
…e#8101) Adds a recursion property to skip encode or decode of a field. If skipping both, depth=0 should be used instead. Splits DefaultGuestInfoPrefix: it should have contained only the vSphere mandatory guestinfo prefix but also contained the default VIC "vice" namespace prefix. As part of this, extracts most string constants used in the package and turns them into actual constants, and moves those that should be modifiable to be variables. Future work should extend this to be a defined config structure that Encode and Decode can operate from. Updates extraconfig to have a CalculateKey function in addition to the CalculateKeys function that returns an array. This was done because by far the most common usage observed was CalculateKeys(...)[0], sometimes with length checking of the return but frequently without. The new method panics if the structure pattern for the field cannot be found. (cherry picked from commit 525fbcf)
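The reasoning behind adding CalculateKey alongside CalculateKeys can be illustrated with a minimal, self-contained sketch. The names calculateKeys, calculateKey, the map-based lookup, and the key string are hypothetical stand-ins and are not the extraconfig package's actual signatures or key layout; only the behaviour described in the commit message (return a single key, panic when the pattern cannot be found) is taken from the source.

```go
package main

import "fmt"

// keysByField is a hypothetical stand-in for resolving a field pattern
// to guestinfo keys. The key value is purely illustrative.
var keysByField = map[string][]string{
	"Config.ID": {"guestinfo.example.id"},
}

// calculateKeys returns every key matching the field pattern,
// or an empty slice if nothing matches.
func calculateKeys(field string) []string {
	return keysByField[field]
}

// calculateKey returns the single expected key and panics if the
// pattern is unknown, replacing unchecked calculateKeys(...)[0] calls.
func calculateKey(field string) string {
	keys := calculateKeys(field)
	if len(keys) == 0 {
		panic(fmt.Sprintf("no key found for field pattern %q", field))
	}
	return keys[0]
}

func main() {
	fmt.Println(calculateKey("Config.ID"))
}
```

An unchecked index on an empty slice would also panic, but with a generic index-out-of-range message; a dedicated helper makes the failure explicit and names the offending pattern.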
hickeng added a commit to hickeng/vic that referenced this pull request on Jun 27, 2018
…e#8101) Makes use of the extraconfig update for suppressing decode of fields into existing structures, to prevent in-memory state updates from being overwritten during a reload. This is necessary because there are no test-and-set guarantees between API and guest side updates with guestinfo; namespacedb would address this at an infrastructure level. Adds a mapping from InvalidState, which we can receive when multiple guest operations collide, to a concurrent modification so that the caller can attempt a retry. Handling of guest operations does not trigger TaskInProgress or ConcurrentModification as we'd expected from the infrastructure. Updates the unit tests to use the structure without the suppression of decoding. The differentiation wasn't important previously, but the structure handling is now asymmetric depending on whether it's tether or API, so the correct package reference is now important. (cherry picked from commit 014952b)
hickeng added a commit to hickeng/vic that referenced this pull request on Jun 27, 2018
There are outstanding issues to address with concurrent exec. This work is palliative rather than an actual fix. Removes checking for "started" in the status string - we reliably see this field not propagating to the property collector despite being logged as set in the tether. This _only_ applies to execs at this time as that is the only path calling task.State (via InspectTask). Adds locking around dispatch of execs, with a timeout, to serialize that initial dispatch path against a single container. If the timeout expires it reverts to current behaviour and relies on concurrent modification and retry. (cherry picked from commit c99f021)
…e#8101) Makes use of the extraconfig update for suppressing decode of fields into existing structures, to prevent in-memory state updates from being overwritten during a reload. This is necessary because there are no test-and-set guarantees between API and guest side updates with guestinfo; namespacedb would address this at an infrastructure level. Adds a mapping from InvalidState, which we can receive when multiple guest operations collide, to a concurrent modification so that the caller can attempt a retry. Handling of guest operations does not trigger TaskInProgress or ConcurrentModification as we'd expected from the infrastructure. Updates the unit tests to use the structure without the suppression of decoding. The differentiation wasn't important previously, but the structure handling is now asymmetric depending on whether it's tether or API, so the correct package reference is now important. (cherry picked from commit f907974)
There are outstanding issues to address with concurrent exec. This work is palliative rather than an actual fix. Removes checking for "started" in the status string - we reliably see this field not propagating to the property collector despite being logged as set in the tether. This _only_ applies to execs at this time as that is the only path calling task.State (via InspectTask). Adds locking around dispatch of execs, with a timeout, to serialize that initial dispatch path against a single container. If the timeout expires it reverts to current behaviour and relies on concurrent modification and retry. (cherry picked from commit c99f021)
hickeng added a commit that referenced this pull request on Jun 28, 2018
The correct approach for this issue is to allow specification of sysctl options generally and then parse/apply them in the container. This change is made because: (a) it's extremely quick to do; (b) it impacts a common workload (Elasticsearch); (c) after consultation with the Photon kernel team, there's no known negative impact to changing the default given the cVM model. (cherry picked from commit 68eef36)
hickeng added a commit that referenced this pull request on Jun 28, 2018
Adds a recursion property to skip encode or decode of a field. If skipping both, depth=0 should be used instead. Splits DefaultGuestInfoPrefix: it should have contained only the vSphere mandatory guestinfo prefix but also contained the default VIC "vice" namespace prefix. As part of this, extracts most string constants used in the package and turns them into actual constants, and moves those that should be modifiable to be variables. Future work should extend this to be a defined config structure that Encode and Decode can operate from. Updates extraconfig to have a CalculateKey function in addition to the CalculateKeys function that returns an array. This was done because by far the most common usage observed was CalculateKeys(...)[0], sometimes with length checking of the return but frequently without. The new method panics if the structure pattern for the field cannot be found. (cherry picked from commit 525fbcf)
hickeng added a commit that referenced this pull request on Jun 28, 2018
Makes use of the extraconfig update for suppressing decode of fields into existing structures, to prevent in-memory state updates from being overwritten during a reload. This is necessary because there are no test-and-set guarantees between API and guest side updates with guestinfo; namespacedb would address this at an infrastructure level. Adds a mapping from InvalidState, which we can receive when multiple guest operations collide, to a concurrent modification so that the caller can attempt a retry. Handling of guest operations does not trigger TaskInProgress or ConcurrentModification as we'd expected from the infrastructure. Updates the unit tests to use the structure without the suppression of decoding. The differentiation wasn't important previously, but the structure handling is now asymmetric depending on whether it's tether or API, so the correct package reference is now important. (cherry picked from commit f907974)
Already documented as a resolved issue in https://github.com/vmware/vic/releases/tag/v1.4.3, in the context of #7410.
Cherry picks:
d4faa97 is included here because it is destined for 1.4.1 as a purely temporary solution to allow zero-modification use of Elasticsearch. It is not directly linked with the rest of the commits but is batched to reduce turn-around time.
The remainder of the commits are mitigation for #7410. As can be seen in the console dump below, we get all of the data back, but it is not fast. This change serializes the dispatch of execs (but not their continuing execution) to avoid the concurrent modification path, which thrashes at this time.
There are two main items to note that still impact this workaround:
"connection reset by peer" from the client and "broken pipe" on the personality.
An addendum of note is that vSphere seems to get slower and slower at reconfigure as the number of execs increases. It's not clear if this is directly related to the number of extraconfig keys or whether it's related to the number of reconfigure operations. I am adding a note to #7410 to address this or open a secondary issue for follow up.