Bug 1945017: Increase system reserved memory from 1Gi to 1.8Gi to support single node clusters by omertuc · Pull Request #2504 · openshift/machine-config-operator

omertuc · 2021-03-31T14:23:08Z

When running E2E tests on single node clusters, the 1Gi reserved for
system memory is insufficient.

In PR #2501 I had 3 e2e test runs on AWS single node; the peak recorded
system memory usage during those runs was 1.40, 1.31 and 1.19 GiB
respectively. In this PR I also saw a run that peaked at 1.56 GiB.

The SystemMemoryExceedsReservation alert requires actual usage to stay
below 90% of the amount reserved, so the corresponding reservations
would need to be at least 1.44, 1.46, 1.32 and 1.74 GiB.

In short, the reserved memory should be increased to 1.8 GiB to
support single node (with some hopefully sufficient padding).

Possible future improvements:

1) Different threshold depending on whether the cluster is a single node
   cluster or not
2) Find a way to lower single node system memory usage
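
The sizing arithmetic in the description can be checked with a short sketch. It assumes only what is stated above: the alert fires once usage reaches 90% of the reservation, and the four peak figures are the ones quoted. The function and constant names are illustrative. Computed this way, the 1.40 GiB peak maps to roughly 1.56 GiB rather than the 1.44 GiB quoted, so one of those two figures may be a typo; the largest requirement (about 1.73 GiB, from the 1.56 GiB peak) is unaffected.

```python
# Minimal sketch of the reservation arithmetic; names are illustrative only.
ALERT_FRACTION = 0.9  # usage must stay below 90% of the reservation (per the description)


def min_reservation_gib(peak_gib: float, fraction: float = ALERT_FRACTION) -> float:
    """Reservation (GiB, rounded to 2 decimals) at which peak_gib equals fraction of it."""
    return round(peak_gib / fraction, 2)


def stays_below_alert(peak_gib: float, reserved_gib: float,
                      fraction: float = ALERT_FRACTION) -> bool:
    """True if the peak stays under the alert threshold for the given reservation."""
    return peak_gib < fraction * reserved_gib


peaks = [1.40, 1.31, 1.19, 1.56]  # GiB, as quoted in the description
required = [min_reservation_gib(p) for p in peaks]
print(required)

for reserved in (1.0, 1.6, 1.8):
    ok = all(stays_below_alert(p, reserved) for p in peaks)
    print(f"{reserved} GiB reserved covers all observed peaks: {ok}")
```

Under the same rule, a 1.6 GiB reservation puts the alert threshold at 1.44 GiB, which the 1.56 GiB run exceeds, while 1.8 GiB puts it at 1.62 GiB.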

openshift-ci-robot · 2021-03-31T14:27:50Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: omertuc
To complete the pull request process, please assign kikisdeliveryservice after the PR has been reviewed.
You can assign the PR to them by writing /assign @kikisdeliveryservice in a comment when ready.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2021-03-31T14:31:37Z

@omertuc: This pull request references Bugzilla bug 1945017, which is invalid:

expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

Bug 1945017: Increase system reserved memory from 1Gi to 1.6Gi to support single node clusters

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

omertuc · 2021-03-31T15:59:51Z

/test e2e-aws-workers-rhel7
/test okd-e2e-aws

kikisdeliveryservice · 2021-03-31T17:13:14Z

@rphillips any issues with this?

@omertuc please update the BZ as bot requested

omertuc · 2021-03-31T21:48:35Z

/bugzilla refresh
/test e2e-aws-serial
/test e2e-vsphere-upgrade
/test e2e-aws-workers-rhel7
/test okd-e2e-aws
/test ?

omertuc · 2021-03-31T21:49:39Z

/bugzilla refresh

omertuc · 2021-03-31T21:49:45Z

/test ?

openshift-ci-robot · 2021-03-31T21:50:15Z

@omertuc: An error was encountered querying GitHub for users with public email (wabouham@redhat.com) for bug 1945017 on the Bugzilla server at https://bugzilla.redhat.com. No known errors were detected, please see the full error message for details.

Full error message.


Post "http://ghproxy/graphql": dial tcp 172.30.229.2:80: i/o timeout

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

omertuc · 2021-03-31T21:50:57Z

/test e2e-aws-serial
/test e2e-vsphere-upgrade
/test e2e-aws-workers-rhel7
/test okd-e2e-aws

openshift-ci-robot · 2021-03-31T21:51:43Z

@omertuc: The following commands are available to trigger jobs:

/test cluster-bootimages
/test e2e-agnostic-upgrade
/test e2e-aws
/test e2e-aws-disruptive
/test e2e-aws-proxy
/test e2e-aws-serial
/test e2e-aws-single-node
/test e2e-aws-workers-rhel7
/test e2e-azure
/test e2e-gcp-op
/test e2e-metal-assisted
/test e2e-metal-ipi
/test e2e-metal-ipi-ovn-dualstack
/test e2e-openstack
/test e2e-ovirt
/test e2e-ovn-step-registry
/test e2e-vsphere
/test e2e-vsphere-upgrade
/test e2e-vsphere-upi
/test images
/test okd-e2e-aws
/test okd-e2e-gcp-op
/test okd-e2e-upgrade
/test okd-e2e-vsphere
/test okd-images
/test unit
/test verify

Use /test all to run the following jobs:

pull-ci-openshift-machine-config-operator-master-e2e-agnostic-upgrade
pull-ci-openshift-machine-config-operator-master-e2e-aws
pull-ci-openshift-machine-config-operator-master-e2e-aws-serial
pull-ci-openshift-machine-config-operator-master-e2e-aws-workers-rhel7
pull-ci-openshift-machine-config-operator-master-e2e-gcp-op
pull-ci-openshift-machine-config-operator-master-e2e-metal-ipi
pull-ci-openshift-machine-config-operator-master-e2e-ovn-step-registry
pull-ci-openshift-machine-config-operator-master-e2e-vsphere-upgrade
pull-ci-openshift-machine-config-operator-master-images
pull-ci-openshift-machine-config-operator-master-okd-e2e-aws
pull-ci-openshift-machine-config-operator-master-okd-images
pull-ci-openshift-machine-config-operator-master-unit
pull-ci-openshift-machine-config-operator-master-verify

omertuc · 2021-03-31T21:51:52Z

/test e2e-aws-single-node

openshift-ci-robot · 2021-04-01T07:43:28Z

@omertuc: This pull request references Bugzilla bug 1945017, which is invalid:

expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

omertuc · 2021-04-01T07:45:13Z

> @omertuc please update the BZ as bot requested

@kikisdeliveryservice It has to be triaged before I can do that; see the comment in the BZ.

omertuc · 2021-04-01T09:01:43Z

/test e2e-aws-workers-rhel7

omertuc · 2021-04-01T09:17:49Z

/test okd-e2e-aws

omertuc · 2021-04-01T09:45:48Z

/test e2e-aws
/test e2e-aws-serial
/test e2e-ovn-step-registry

openshift-ci · 2021-04-01T11:51:45Z

@omertuc: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-aws-single-node | `1cdb92755e860ce5142595896b5b509f8e16ed97` | link | `/test e2e-aws-single-node` |
| ci/prow/e2e-agnostic-upgrade | `6948ae0` | link | `/test e2e-agnostic-upgrade` |
| ci/prow/e2e-vsphere-upgrade | `6948ae0` | link | `/test e2e-vsphere-upgrade` |
| ci/prow/e2e-aws-workers-rhel7 | `6948ae0` | link | `/test e2e-aws-workers-rhel7` |
| ci/prow/e2e-aws-serial | `6948ae0` | link | `/test e2e-aws-serial` |
| ci/prow/okd-e2e-aws | `6948ae0` | link | `/test okd-e2e-aws` |

rphillips · 2021-04-01T14:39:40Z

The PR will somehow need to detect if SNO is enabled to set the reserve to 1.8GB
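
A rough sketch of what such detection could look like, not what this PR implements. It assumes the cluster's control plane topology is available as a string, for example from the `controlPlaneTopology` field of the cluster Infrastructure status, and that a single node cluster reports `SingleReplica` there; both the helper name and that lookup are assumptions made for illustration.

```python
# Hypothetical helper, for illustration only; this is not machine-config-operator code.
SINGLE_NODE_TOPOLOGY = "SingleReplica"  # assumed topology value for single node clusters

DEFAULT_SYSTEM_RESERVED_MEMORY = "1Gi"        # current default, per the PR title
SINGLE_NODE_SYSTEM_RESERVED_MEMORY = "1.8Gi"  # value proposed in this PR


def system_reserved_memory(control_plane_topology: str) -> str:
    """Pick the system reserved memory based on the reported control plane topology."""
    if control_plane_topology == SINGLE_NODE_TOPOLOGY:
        return SINGLE_NODE_SYSTEM_RESERVED_MEMORY
    return DEFAULT_SYSTEM_RESERVED_MEMORY


print(system_reserved_memory("SingleReplica"))
print(system_reserved_memory("HighlyAvailable"))
```

Keeping the default untouched for every other topology would limit the larger reservation to single node clusters.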

mrunalp · 2021-04-01T15:29:13Z

/hold
The preferred approach would be to bump up the system reserved just for the single node CI jobs.

omertuc · 2021-04-01T19:30:55Z

Closed in favor of openshift/release#17403

openshift-ci-robot · 2021-04-01T19:31:00Z

@omertuc: This pull request references Bugzilla bug 1945017. The bug has been updated to no longer refer to the pull request using the external bug tracker. All external bug links have been closed. The bug has been moved to the NEW state.

openshift-ci-robot requested review from kikisdeliveryservice and sinnykumari March 31, 2021 14:27

omertuc changed the title ~~Increase system reserved memory from 1Gi to 1.6Gi to support single node clusters~~ Bug 1945017 - Increase system reserved memory from 1Gi to 1.6Gi to support single node clusters Mar 31, 2021

omertuc changed the title ~~Bug 1945017 - Increase system reserved memory from 1Gi to 1.6Gi to support single node clusters~~ Bug 1945017: Increase system reserved memory from 1Gi to 1.6Gi to support single node clusters Mar 31, 2021

openshift-ci-robot added bugzilla/severity-unspecified Referenced Bugzilla bug's severity is unspecified for the PR. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Mar 31, 2021

kikisdeliveryservice assigned rphillips Mar 31, 2021

omertuc mentioned this pull request Mar 31, 2021

WIP - Try Single node with more system reserved memory #2501

Closed

omertuc changed the title ~~Bug 1945017: Increase system reserved memory from 1Gi to 1.6Gi to support single node clusters~~ Bug 1945017: Increase system reserved memory from 1Gi to 1.8Gi to support single node clusters Apr 1, 2021

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 1, 2021

omertuc closed this Apr 1, 2021
