Skip to content

Cherry-pick a variety of changes in 1.4.1 (concurrent exec)#8101

Merged
hickeng merged 4 commits intovmware:releases/1.4.1from
hickeng:7410-1.4.1-pr
Jun 28, 2018
Merged

Cherry-pick a variety of changes in 1.4.1 (concurrent exec)#8101
hickeng merged 4 commits intovmware:releases/1.4.1from
hickeng:7410-1.4.1-pr

Conversation

@hickeng
Copy link
Contributor

@hickeng hickeng commented Jun 27, 2018

Cherry picks:

d4faa97 is included here as it's destined for 1.4.1 being purely a temporary solution to allow zero-modification use of elastic search. It is not directly linked with the rest of the commit but is batched to reduce turn-around time.

The remainder of the commits are mitigation for #7410. As can be seen in the console dump below we get all of the data back, but it is not fast. This is serializing the dispatch of execs (but not the continuing execution of them) to avoid the concurrent modification path with thrashes at this time.

$ c=15;date;id=`docker ps -q | head -n 1`; for i in `seq 1 $c`; do docker exec -t $id /bin/echo test$i &done;for i in `seq 1 $c`;do wait %$i 2>/dev/null;done;date
Wed Jun 27 21:28:47 UTC 2018
test1
test13
test8
test2
test7
test12
test11
test10
test3
test14
test6
test5
test9
test4
Wed Jun 27 21:29:45 UTC 2018

There are two main items to note that still impact this workaround:

  1. There is a timeout on the serialization lock - it's currently using the APItimeout which is set to just slightly longer than the vSphere PropertyCollector timeout to allow those errors to propagate and be handled. This currently works at at effective 3min. After this time we revert to the contentious competition path which will likely result in ConcurrentModification errors.
  2. There appears to be a connection timeout (http or tcp if doing protocol upgrade) that kicks in at around 150s that results in the client connection being dropped even if the server is happily holding it open to return output. It's unclear to me whether this is a client side issue or an issue in the docker routing middleware we use for the user facing API on the server side. This results in connection reset by peer from the client and broken pipe on the personality.

An addendum of note is that vSphere seems to get slower and slower at reconfigure as the number of execs increases. It's not clear if this is directly related to the number of extraconfig keys or whether it's related to the number of reconfigure operations. I am adding a note to #7410 to address this or open a secondary issue for follow up.

[skip unit]
[specific ci=6-15-Syslog]

@hickeng hickeng changed the title Concurrent exec cherry-picks Cherry-pick a variety of changes in 1.4.1 (concurrent exec) Jun 27, 2018
@zjs
Copy link
Member

zjs commented Jun 27, 2018

When you "rebase and merge" this change, the PR number won't be added to the commit summaries by GitHub. You may wish to do that locally.

hickeng added 2 commits June 27, 2018 16:43
The correct approach for this issue is to allow specification of sysctl
options generally and then parse/apply them in the container. This
change is made because:
a. it's extremely quick to do
b. it impacts a common workload (elastic search)
c. after consultation with Photon kernel team there's no known
negative impact to changing the default given cVM model.

(cherry picked from commit 68eef36)
…e#8101)

Adds a recursion property to skip encode or decode of a field. If
skipping both the depth=0 should be used.

Splits DefaultGuestInfoPrefix. It should have only contained the vSphere
mandatory guestinfo prefix but also contained the default VIC "vice"
namespace prefix. As part of this it extracts most string constants used
in the package and turns them into actual constants, and moves those
constants that should be modifiable to be variables. Future work should
extend this to be a defined config structure that Encode and Decode
can operate from.

Updates extraconfig to have a CalculateKey function in addition to the
CalculateKeys function that returns an array. This has been done because
of an observation that by far the most common usage was
CalculateKeys(...)[0], sometimes with length checking of the return but
frequently without. The new method panics if the structure pattern for
the field cannot be found.

(cherry picked from commit 525fbcf)
hickeng added a commit to hickeng/vic that referenced this pull request Jun 27, 2018
…e#8101)

Makes use of extraconfig update for suppressing decode of fields into
existing structures to prevent overwriting of in-memory state updates
during a reload. This is necessary because there's no test-and-set
guarantees between API and guest side updates with guestinfo.
namespacedb would address this at an infrastructure level.

Adds mapping of InvalidState that we can receive when multiple guest
operations collide to a concurrent modification so that a retry can be
attempted by the caller. Handling of guest operations does not trigger
TaskInProgress or ConcurrentModification as we'd expected from the
infrastructure.

Updates the unit tests to use the structure without the suppression of
decoding - the differentiation wasn't important previously but now the
structure handling is asymmetric depending on whether it's tether or API
so the correct pacakge reference is now important.

(cherry picked from commit 014952b)
hickeng added a commit to hickeng/vic that referenced this pull request Jun 27, 2018
There are outstanding issues to address with concurrent exec. This work is
palliative rather than an actual fix.

Removes checking for "started" in the status string - we reliably see this
field not propagating to the property collector despite being logged as
set in the tether. This _only_ applies to execs at this time as that is
the only path calling task.State (via InspectTask).

Adds locking around dispatch of execs, with a timeout, to serialize that
initial dispatch path against a single container. If the timeout expires
it reverts to current behaviour and relies on concurrent modification and
retry.

(cherry picked from commit c99f021)
hickeng added 2 commits June 27, 2018 17:40
…e#8101)

Makes use of extraconfig update for suppressing decode of fields into
existing structures to prevent overwriting of in-memory state updates
during a reload. This is necessary because there's no test-and-set
guarantees between API and guest side updates with guestinfo.
namespacedb would address this at an infrastructure level.

Adds mapping of InvalidState that we can receive when multiple guest
operations collide to a concurrent modification so that a retry can be
attempted by the caller. Handling of guest operations does not trigger
TaskInProgress or ConcurrentModification as we'd expected from the
infrastructure.

Updates the unit tests to use the structure without the suppression of
decoding - the differentiation wasn't important previously but now the
structure handling is asymmetric depending on whether it's tether or API
so the correct pacakge reference is now important.

(cherry picked from commit f907974)
There are outstanding issues to address with concurrent exec. This work is
palliative rather than an actual fix.

Removes checking for "started" in the status string - we reliably see this
field not propagating to the property collector despite being logged as
set in the tether. This _only_ applies to execs at this time as that is
the only path calling task.State (via InspectTask).

Adds locking around dispatch of execs, with a timeout, to serialize that
initial dispatch path against a single container. If the timeout expires
it reverts to current behaviour and relies on concurrent modification and
retry.

(cherry picked from commit c99f021)
@hickeng hickeng merged commit b3a6192 into vmware:releases/1.4.1 Jun 28, 2018
hickeng added a commit that referenced this pull request Jun 28, 2018
The correct approach for this issue is to allow specification of sysctl
options generally and then parse/apply them in the container. This
change is made because:
a. it's extremely quick to do
b. it impacts a common workload (elastic search)
c. after consultation with Photon kernel team there's no known
negative impact to changing the default given cVM model.

(cherry picked from commit 68eef36)
hickeng added a commit that referenced this pull request Jun 28, 2018
Adds a recursion property to skip encode or decode of a field. If
skipping both the depth=0 should be used.

Splits DefaultGuestInfoPrefix. It should have only contained the vSphere
mandatory guestinfo prefix but also contained the default VIC "vice"
namespace prefix. As part of this it extracts most string constants used
in the package and turns them into actual constants, and moves those
constants that should be modifiable to be variables. Future work should
extend this to be a defined config structure that Encode and Decode
can operate from.

Updates extraconfig to have a CalculateKey function in addition to the
CalculateKeys function that returns an array. This has been done because
of an observation that by far the most common usage was
CalculateKeys(...)[0], sometimes with length checking of the return but
frequently without. The new method panics if the structure pattern for
the field cannot be found.

(cherry picked from commit 525fbcf)
hickeng added a commit that referenced this pull request Jun 28, 2018
Makes use of extraconfig update for suppressing decode of fields into
existing structures to prevent overwriting of in-memory state updates
during a reload. This is necessary because there's no test-and-set
guarantees between API and guest side updates with guestinfo.
namespacedb would address this at an infrastructure level.

Adds mapping of InvalidState that we can receive when multiple guest
operations collide to a concurrent modification so that a retry can be
attempted by the caller. Handling of guest operations does not trigger
TaskInProgress or ConcurrentModification as we'd expected from the
infrastructure.

Updates the unit tests to use the structure without the suppression of
decoding - the differentiation wasn't important previously but now the
structure handling is asymmetric depending on whether it's tether or API
so the correct pacakge reference is now important.

(cherry picked from commit f907974)
@hickeng hickeng added the impact/doc/note Requires creation of or changes to an official release note label Jun 29, 2018
@stuclem
Copy link

stuclem commented Sep 17, 2018

Already documented as a resolved issue in https://github.com/vmware/vic/releases/tag/v1.4.3, in the context of #7410.

@stuclem stuclem removed the impact/doc/note Requires creation of or changes to an official release note label Sep 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants