
test/e2e: scheduling: disable preemption tests #23029

Merged
openshift-merge-robot merged 1 commit into openshift:master from sjenning:disable-sched-preemption-serial-test
Jun 5, 2019

Conversation

@sjenning (Contributor) commented Jun 4, 2019

After Prometheus started making reasonable memory requests, the assumption that the SchedulerPreemption tests make about the scheduled load on test nodes (i.e. that less than 40% of capacity is scheduled) no longer holds.

Example e2e failure:
https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.2/598

Jun  4 15:35:22.846: INFO: At 2019-06-04 15:35:10 +0000 UTC - event for pod0-sched-preemption-low-priority: {default-scheduler } Scheduled: Successfully assigned sched-preemption-2604/pod0-sched-preemption-low-priority to ip-10-0-138-81.ec2.internal
Jun  4 15:35:22.846: INFO: At 2019-06-04 15:35:10 +0000 UTC - event for pod0-sched-preemption-low-priority: {default-scheduler } Preempted: by sched-preemption-2604/pod1-sched-preemption-medium-priority on node ip-10-0-138-81.ec2.internal
Jun  4 15:35:22.846: INFO: At 2019-06-04 15:35:10 +0000 UTC - event for pod1-sched-preemption-medium-priority: {default-scheduler } FailedScheduling: 0/6 nodes are available: 1 Insufficient memory, 3 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.
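
A quick way to confirm the scheduled load on the node named in the failure above (a sketch, not part of this PR; it assumes a cluster-admin kubeconfig against the affected cluster) is the "Allocated resources" section of kubectl describe node, which sums the requests of every pod scheduled there:

  $ # requests/limits summary for the node from the failure log;
  $ # the cpu/memory "Requests" percentages are what the SchedulerPreemption
  $ # tests assume stay below 40% of allocatable capacity
  $ kubectl describe node ip-10-0-138-81.ec2.internal | grep -A 8 'Allocated resources'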

Flaking since openshift/prometheus-operator#30, which allowed the resource requests for the Prometheus statefulset to flow through.
https://testgrid.k8s.io/redhat-openshift-release-blocking#redhat-release-openshift-origin-installer-e2e-aws-serial-4.2&sort-by-flakiness=

BZ to track re-enablement:
https://bugzilla.redhat.com/show_bug.cgi?id=1717198

@smarterclayton @wking @ravisantoshgudimetla

openshift-ci-robot added the size/XS (Denotes a PR that changes 0-9 lines, ignoring generated files) and approved (Indicates a PR has been approved by an approver from all required OWNERS files) labels on Jun 4, 2019
@wking (Member) commented Jun 4, 2019

/lgtm

openshift-ci-robot added the lgtm (Indicates that a PR is ready to be merged) label on Jun 4, 2019
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sjenning, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot (Contributor)

/retest

Please review the full test history for this PR and help us cut down flakes.

@wking (Member) commented Jun 5, 2019

/retest

Now that openshift/cluster-kube-apiserver-operator#495 has landed.

openshift-merge-robot merged commit 4d45bb3 into openshift:master on Jun 5, 2019
wking added a commit to wking/openshift-release that referenced this pull request Jun 11, 2019
Prometheus started making memory requests with
openshift/prometheus-operator@cda68a3f (Merge pull request
openshift/prometheus-operator#30 from paulfantom/merge-release-0.30.1,
2019-06-04):

  $ diff -u <(curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.2/591/artifacts/e2e-aws-serial/pods.json | jq '.items[] | select(.metadata.name | contains("prometheus")) | {name: .metadata.name, resources: [.spec.containers[].resources | select((. | length) > 0)]}') <(curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-serial-4.2/592/artifacts/e2e-aws-serial/pods.json | jq '.items[] | select(.metadata.name | contains("prometheus")) | {name: .metadata.name, resources: [.spec.containers[].resources | select((. | length) > 0)]}')
  --- /dev/fd/63    2019-06-04 14:10:31.908436038 -0700
  +++ /dev/fd/62    2019-06-04 14:10:31.908436038 -0700
  @@ -1,5 +1,5 @@
  {
  -  "name": "prometheus-adapter-5f78cc955d-2899k",
  +  "name": "prometheus-adapter-64f4f64b7-pvmhn",
    "resources": [
      {
        "requests": {
  @@ -10,7 +10,7 @@
    ]
  }
  {
  -  "name": "prometheus-adapter-5f78cc955d-2rlnx",
  +  "name": "prometheus-adapter-64f4f64b7-tgnld",
    "resources": [
      {
        "requests": {
  @@ -22,14 +22,56 @@
  }
  {
    "name": "prometheus-k8s-0",
  -  "resources": []
  +  "resources": [
  +    {
  +      "limits": {
  +        "cpu": "100m",
  +        "memory": "25Mi"
  +      },
  +      "requests": {
  +        "cpu": "100m",
  +        "memory": "25Mi"
  +      }
  +    },
  +    {
  +      "limits": {
  +        "cpu": "100m",
  +        "memory": "25Mi"
  +      },
  +      "requests": {
  +        "cpu": "100m",
  +        "memory": "25Mi"
  +      }
  +    }
  +  ]
  }
  {
    "name": "prometheus-k8s-1",
  -  "resources": []
  +  "resources": [
  +    {
  +      "limits": {
  +        "cpu": "100m",
  +        "memory": "25Mi"
  +      },
  +      "requests": {
  +        "cpu": "100m",
  +        "memory": "25Mi"
  +      }
  +    },
  +    {
  +      "limits": {
  +        "cpu": "100m",
  +        "memory": "25Mi"
  +      },
  +      "requests": {
  +        "cpu": "100m",
  +        "memory": "25Mi"
  +      }
  +    }
  +  ]
  }
  {
  -  "name": "prometheus-operator-68f7b6bd55-hmqtj",
  +  "name": "prometheus-operator-d8745bf44-l9khn",
    "resources": [
      {
        "requests": {

With that change, our nodes no longer satisfied the assumption that
the SchedulerPreemption tests make about the scheduled load on test
nodes (i.e. that less than 40% of capacity is scheduled).
openshift/origin@13b6d0e4a7 (test/e2e: scheduling: disable preemption
tests, 2019-06-04, openshift/origin#23029) disabled the test, but this
change takes the alternative temporary workaround of bumping our
node capacity to re-satisfy the existing test's assumptions.

We have sufficient capacity for doubling our xlarge consumption:

  $ export AWS_PROFILE=ci
  $ aws --region us-east-1 support describe-trusted-advisor-checks --language en --query "checks[? category == 'service_limits'].{id: @.id, name: @.name}" --output text | grep 'EC2 On-Demand Instances'
  0Xc6LMYG8P   EC2 On-Demand Instances
  $ AWS_PROFILE=ci aws --region us-east-1 support describe-trusted-advisor-check-result --check-id 0Xc6LMYG8P --query "join(\`\\n\`, result.flaggedResources[].join(\`\\t\`, [@.metadata[4] || '0', @.metadata[3], @.region || '-', '0Xc6LMYG8P', @.metadata[2]]))" --output text
  91  3000  us-east-1  0Xc6LMYG8P  On-Demand instances - m4.large
  97  3000  us-east-1  0Xc6LMYG8P  On-Demand instances - m4.xlarge
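
To spot-check that the bumped node capacity actually re-satisfies the test's under-40% assumption, something along these lines should do (a sketch, assuming a kubeconfig for a cluster built on the new instance types; the beta.kubernetes.io/instance-type label is the one nodes carried at this Kubernetes level):

  $ # instance type and allocatable cpu/memory per node
  $ kubectl get nodes -o json | jq -r '.items[] | [.metadata.name, .metadata.labels["beta.kubernetes.io/instance-type"], .status.allocatable.cpu, .status.allocatable.memory] | @tsv'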
