
ci-operator/templates/openshift: m4.xlarge compute nodes #4027

Merged
openshift-merge-robot merged 1 commit into openshift:master from wking:larger-workers
Jun 11, 2019

Conversation

@wking
Member

@wking wking commented Jun 11, 2019

Prometheus started making memory requests with openshift/prometheus-operator@cda68a3f (Merge pull request openshift/prometheus-operator#30 from paulfantom/merge-release-0.30.1, 2019-06-04):

$ diff -u <(curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.2/591/artifacts/e2e-aws-serial/pods.json | jq '.items[] | select(.metadata.name | contains("prometheus")) | {name: .metadata.name, resources: [.spec.containers[].resources | select((. | length) > 0)]}') <(curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.2/592/artifacts/e2e-aws-serial/pods.json | jq '.items[] | select(.metadata.name | contains("prometheus")) | {name: .metadata.name, resources: [.spec.containers[].resources | select((. | length) > 0)]}')
--- /dev/fd/63    2019-06-04 14:10:31.908436038 -0700
+++ /dev/fd/62    2019-06-04 14:10:31.908436038 -0700
@@ -1,5 +1,5 @@
{
-  "name": "prometheus-adapter-5f78cc955d-2899k",
+  "name": "prometheus-adapter-64f4f64b7-pvmhn",
  "resources": [
    {
      "requests": {
@@ -10,7 +10,7 @@
  ]
}
{
-  "name": "prometheus-adapter-5f78cc955d-2rlnx",
+  "name": "prometheus-adapter-64f4f64b7-tgnld",
  "resources": [
    {
      "requests": {
@@ -22,14 +22,56 @@
}
{
  "name": "prometheus-k8s-0",
-  "resources": []
+  "resources": [
+    {
+      "limits": {
+        "cpu": "100m",
+        "memory": "25Mi"
+      },
+      "requests": {
+        "cpu": "100m",
+        "memory": "25Mi"
+      }
+    },
+    {
+      "limits": {
+        "cpu": "100m",
+        "memory": "25Mi"
+      },
+      "requests": {
+        "cpu": "100m",
+        "memory": "25Mi"
+      }
+    }
+  ]
}
{
  "name": "prometheus-k8s-1",
-  "resources": []
+  "resources": [
+    {
+      "limits": {
+        "cpu": "100m",
+        "memory": "25Mi"
+      },
+      "requests": {
+        "cpu": "100m",
+        "memory": "25Mi"
+      }
+    },
+    {
+      "limits": {
+        "cpu": "100m",
+        "memory": "25Mi"
+      },
+      "requests": {
+        "cpu": "100m",
+        "memory": "25Mi"
+      }
+    }
+  ]
}
{
-  "name": "prometheus-operator-68f7b6bd55-hmqtj",
+  "name": "prometheus-operator-d8745bf44-l9khn",
  "resources": [
    {
      "requests": {

With that change, our nodes no longer satisfied the assumptions that the SchedulerPreemption tests make about the scheduled load on test nodes (i.e. less than 40% of capacity is scheduled). openshift/origin@13b6d0e4a7 (test/e2e: scheduling: disable preemption tests, 2019-06-04, openshift/origin#23029) disabled the tests, but this change takes the alternative temporary workaround of bumping our node capacity to re-satisfy the existing tests' assumptions.
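The 40% assumption can be checked directly from the same pods.json data, in the jq style used above. This is a minimal sketch, not the tests' actual implementation: the inlined pod data, the application of the 40% figure to CPU only, and the 4000m allocatable figure for an m4.xlarge are all illustrative assumptions.

```shell
# Hypothetical sketch of the SchedulerPreemption assumption: total CPU
# requested on a node should stay under 40% of allocatable capacity.
# Pod data is inlined here; against a live cluster you would feed in
# 'kubectl get pods --all-namespaces -o json' instead.
cat <<'EOF' > /tmp/pods.json
{"items": [{"spec": {"containers": [
  {"resources": {"requests": {"cpu": "100m"}}},
  {"resources": {"requests": {"cpu": "250m"}}},
  {"resources": {}}
]}}]}
EOF
# Sum the millicore requests; containers without requests count as 0m.
requested_m=$(jq '[.items[].spec.containers[].resources.requests.cpu? // "0m"
                   | sub("m$"; "") | tonumber] | add' /tmp/pods.json)
allocatable_m=4000  # assumption: m4.xlarge has 4 vCPUs, ~4000m before reservations
ceiling_m=$((allocatable_m * 40 / 100))
echo "requested ${requested_m}m of ${allocatable_m}m (preemption-test ceiling: ${ceiling_m}m)"
```

With the sample data this reports 350m requested against a 1600m ceiling; the real tests also account for memory and for system-reserved capacity.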

We have sufficient capacity for doubling our xlarge consumption:

$ export AWS_PROFILE=ci
$ aws --region us-east-1 support describe-trusted-advisor-checks --language en --query "checks[? category == 'service_limits'].{id: @.id, name: @.name}" --output text | grep 'EC2 On-Demand Instances'
0Xc6LMYG8P   EC2 On-Demand Instances
$ AWS_PROFILE=ci aws --region us-east-1 support describe-trusted-advisor-check-result --check-id 0Xc6LMYG8P --query "join(\`\\n\`, result.flaggedResources[].join(\`\\t\`, [@.metadata[4] || '0', @.metadata[3], @.region || '-', '0Xc6LMYG8P', @.metadata[2]]))" --output text
91  3000  us-east-1  0Xc6LMYG8P  On-Demand instances - m4.large
97  3000  us-east-1  0Xc6LMYG8P  On-Demand instances - m4.xlarge
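The headroom claim reduces to simple arithmetic; a minimal sketch, with the in-use and limit numbers copied from the Trusted Advisor output above and "doubling" taken as the scenario named in the text:

```shell
# Rough headroom check: 97 m4.xlarge instances in use against a
# 3000-instance on-demand limit, so even doubling consumption stays
# far below the quota.
limit=3000
in_use=97
projected=$((in_use * 2))
echo "projected m4.xlarge usage: ${projected} of ${limit}"
test "$projected" -lt "$limit" && echo "headroom OK"
```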

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jun 11, 2019
@openshift-ci-robot
Contributor

@wking: The following tests failed, say /retest to rerun them all:

Test name                                                     Commit   Rerun command
ci/rehearse/openshift/installer/master/e2e-aws-upi            7eb64eb  /test pj-rehearse
ci/rehearse/openshift/installer/master/e2e-aws-scaleup-rhel7  7eb64eb  /test pj-rehearse
ci/rehearse/openshift/installer/master/e2e-vsphere            7eb64eb  /test pj-rehearse
ci/prow/pj-rehearse                                           7eb64eb  /test pj-rehearse

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@vrutkovs
Contributor

Ansible part looks fine
/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2019
@ravisantoshgudimetla
Contributor

/lgtm

From scheduling side

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 11, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ravisantoshgudimetla, vrutkovs, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 544ea94 into openshift:master Jun 11, 2019
@openshift-ci-robot
Contributor

@wking: Updated the following 10 configmaps:

  • prow-job-cluster-launch-installer-upi-e2e configmap in namespace ci using the following files:
    • key cluster-launch-installer-upi-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-upi-e2e.yaml
  • prow-job-cluster-launch-installer-upi-e2e configmap in namespace ci-stg using the following files:
    • key cluster-launch-installer-upi-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-upi-e2e.yaml
  • prow-job-cluster-launch-installer-console configmap in namespace ci-stg using the following files:
    • key cluster-launch-installer-console.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-console.yaml
  • prow-job-cluster-launch-installer-e2e configmap in namespace ci using the following files:
    • key cluster-launch-installer-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
  • prow-job-cluster-launch-installer-e2e configmap in namespace ci-stg using the following files:
    • key cluster-launch-installer-e2e.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-e2e.yaml
  • prow-job-cluster-scaleup-e2e-40 configmap in namespace ci using the following files:
    • key cluster-scaleup-e2e-40.yaml using file ci-operator/templates/openshift/openshift-ansible/cluster-scaleup-e2e-40.yaml
  • prow-job-cluster-scaleup-e2e-40 configmap in namespace ci-stg using the following files:
    • key cluster-scaleup-e2e-40.yaml using file ci-operator/templates/openshift/openshift-ansible/cluster-scaleup-e2e-40.yaml
  • prow-job-cluster-launch-installer-console configmap in namespace ci using the following files:
    • key cluster-launch-installer-console.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-console.yaml
  • prow-job-cluster-launch-installer-src configmap in namespace ci using the following files:
    • key cluster-launch-installer-src.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-src.yaml
  • prow-job-cluster-launch-installer-src configmap in namespace ci-stg using the following files:
    • key cluster-launch-installer-src.yaml using file ci-operator/templates/openshift/installer/cluster-launch-installer-src.yaml

@wking wking deleted the larger-workers branch August 10, 2019 04:07