Fix hostd resource exhaustion in high load CI by hickeng · Pull Request #6998 · vmware/vic

hickeng · 2017-12-21T01:24:01Z

It's still being validated by other builds, but this is the minimal set of changes that should stop CI from crashing hostd through memory exhaustion. If I'm correct, there are two ways of triggering this:
a. leaked VCHs from the Group23 and Group6 delete tests
b. the event page size increase

This is essentially a revert of the page size change, plus test modifications so that the tests clean up correctly.
This DOES NOT address how we handle event storms, which is what the page size change was targeting; that will need additional work, likely using a page cursor or similar for demand scrolling of the event history view.
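The page cursor idea mentioned above can be sketched roughly as follows: keep the page size small and let the consumer pull the next page only when it actually needs more history, instead of asking hostd for a very large page up front. This is a minimal in-memory sketch under stated assumptions; `Event`, `Cursor`, `NewCursor` and `Next` are hypothetical names, not the govmomi or vSphere API, and this is not how vic implements (or will implement) event handling.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for a vSphere event; only the fields
// needed for this illustration are included.
type Event struct {
	Key  int32
	Type string
}

// Cursor walks an event history in fixed-size pages. Illustrative sketch
// only, not the govmomi or vSphere API.
type Cursor struct {
	history  []Event
	pos      int
	pageSize int
}

// NewCursor returns a cursor positioned at the start of history.
// A non-positive page size is clamped to 1.
func NewCursor(history []Event, pageSize int) *Cursor {
	if pageSize <= 0 {
		pageSize = 1
	}
	return &Cursor{history: history, pageSize: pageSize}
}

// Next returns the next page of at most pageSize events and reports
// whether more events remain after it.
func (c *Cursor) Next() ([]Event, bool) {
	end := c.pos + c.pageSize
	if end > len(c.history) {
		end = len(c.history)
	}
	page := c.history[c.pos:end]
	c.pos = end
	return page, c.pos < len(c.history)
}

func main() {
	history := make([]Event, 0, 25)
	for i := int32(0); i < 25; i++ {
		history = append(history, Event{Key: i, Type: "VmPoweredOnEvent"})
	}

	// Pull pages on demand with a small page size.
	c := NewCursor(history, 10)
	for {
		page, more := c.Next()
		fmt.Printf("page of %d events, first key %d\n", len(page), page[0].Key)
		if !more {
			break
		}
	}
}
```

With 25 events and a page size of 10 this reads pages of 10, 10 and 5. The point of the design is that memory held per request stays bounded by the page size, while a consumer that falls behind during an event storm can still catch up by reading more pages.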

Fixes #6886

The delete tests for vic-machine and vic-machine-service leak VCHs. For the service, the tests deploy VCHs directly and never clean them up. For base vic-machine, the test renders the VCH invalid by moving the endpointVM in a way that makes deletion fail, and nothing cleans it up explicitly afterwards.

This reverts commit 7796336 because it's apparent that increasing the page size to this extent can cause hostd to both hit its resource limits and to drastically fragment its heap.

We were checking for the existence of containers before they were created, an artifact of moving the check block ahead of the create step for the volume existence check. We also had a test that installed with a named volume store that was not configured during the re-install, so the store was never deleted at the end.

mhagen-vmware

LGTM

hickeng added 2 commits December 20, 2017 17:14

Revert "Increase event page size to 1000 (vmware#6937)"

ca9cab6

This reverts commit 7796336 because it's apparent that increasing the page size to this extent can cause hostd to both hit its resource limits and to drastically fragment its heap.

vmwclabot added the cla-not-required label Dec 21, 2017

lcastellano approved these changes Dec 21, 2017

hickeng changed the title from "Fix hostd resource exhaustion in high load CI [full ci]" to "Fix hostd resource exhaustion in high load CI [specific ci=Group23-VIC-Machine-Service]" Dec 21, 2017

hickeng requested review from cgtexmex and zjs December 21, 2017 18:54

cgtexmex approved these changes Dec 21, 2017

mhagen-vmware approved these changes Dec 21, 2017

hickeng changed the title from "Fix hostd resource exhaustion in high load CI [specific ci=Group23-VIC-Machine-Service]" to "Fix hostd resource exhaustion in high load CI" Dec 21, 2017

hickeng merged commit 09bfe47 into vmware:master Dec 21, 2017

dougm added a commit that referenced this pull request Dec 21, 2017

Use event types to filter events

af75361

An alternative to increasing the collector page size. It will reduce the throughput to the event collector and hence reduce event misses. See issues #6937 and #6998
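The idea in this commit, filtering by event type so the collector carries less traffic, can be illustrated with a small in-memory sketch: only events whose type is in a wanted set are kept, so a page of a given size is filled with relevant events rather than noise. Assumptions: `Event` and `FilterByType` are hypothetical names, the filtering here happens in process rather than on the server, and the event type names in `main` are examples only, not the set this commit actually subscribes to. In the vSphere API the server-side equivalent is, as far as we know, the `eventTypeId` field of `EventFilterSpec`.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for a vSphere event.
type Event struct {
	Key  int32
	Type string
}

// FilterByType keeps only events whose Type appears in wanted,
// preserving their original order. It models type filtering in memory.
func FilterByType(events []Event, wanted []string) []Event {
	set := make(map[string]struct{}, len(wanted))
	for _, t := range wanted {
		set[t] = struct{}{}
	}
	var out []Event
	for _, e := range events {
		if _, ok := set[e.Type]; ok {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// Example stream; the type names are illustrative only.
	stream := []Event{
		{Key: 1, Type: "VmPoweredOnEvent"},
		{Key: 2, Type: "TaskEvent"},
		{Key: 3, Type: "VmPoweredOffEvent"},
		{Key: 4, Type: "UserLoginSessionEvent"},
		{Key: 5, Type: "VmRemovedEvent"},
	}
	wanted := []string{"VmPoweredOnEvent", "VmPoweredOffEvent", "VmRemovedEvent"}

	kept := FilterByType(stream, wanted)
	fmt.Printf("kept %d of %d events\n", len(kept), len(stream))
}
```

This prints `kept 3 of 5 events`. The trade-off versus a larger page size is that the filtered stream is smaller, so an unchanged page size covers a longer stretch of the relevant history without asking hostd to hold more events per page.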

dougm mentioned this pull request Dec 21, 2017

[specific ci=1-05-Docker-Start] Use event types to filter events #7000

Closed

dougm added a commit that referenced this pull request Dec 21, 2017

Use event types to filter events

1480708

This was referenced Dec 22, 2017

[full ci] Revert govc changes and update event page value #6970

Closed

[specific ci=Group3-Docker-Compose] Revert govc change - Just Test #6966

Closed

hickeng pushed a commit that referenced this pull request Dec 29, 2017

Use event types to filter events

f96e42f

hickeng merged 3 commits into vmware:master from hickeng:6886

5 participants