Allow resources in Revision by MissingRoberto · Pull Request #2117 · knative/serving

MissingRoberto · 2018-10-01T16:24:50Z

[fixes #2099]

Proposed Changes

Allow resources defined by the user
If resources is empty, configure the user-container with the recommended values

MissingRoberto · 2018-10-01T16:35:11Z

/assign @mattmoor

MissingRoberto · 2018-10-01T16:41:21Z

Let me know if it's necessary to run ./hack/update-codegen.sh please

mattmoor · 2018-10-01T17:48:03Z

/assign @evankanderson

Would it be possible to add an e2e test that sets a memory limit and OOMs?

knative-prow-robot · 2018-10-02T10:05:30Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jszroberto
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: evankanderson

If they are not already assigned, you can assign the PR to them by writing /assign @evankanderson in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

MissingRoberto · 2018-10-02T11:38:19Z

/hold

evankanderson

Thanks for updating the spec and tests at the same time.

I don't know if you want to extend helloworld to add an endpoint for the memory allocation, but it would be nice to actually test that the supplied limits are enforced.

pkg/reconciler/v1alpha1/revision/resources/deploy.go

evankanderson · 2018-10-02T17:27:45Z

test/e2e/autoscale_test.go

 	}
 }
+
+func TestAutoscaleExceedsLimits(t *testing.T) {


I don't think we need to test in the autoscaler; the resource limits supplied are per-pod, so autoscaling will just make more pods.

At some point, we'll need to put in a circuit-breaker that prevents scaling up if existing pods are crash-looping. @josephburnett to track.

To be honest, my intention was to test the ScaleUpAndDown behavior when resources are freed or allocated, that is, whether pods would be created or destroyed. But that's probably not the responsibility of the autoscaler itself.

evankanderson · 2018-10-02T17:28:15Z

test/e2e/resources_test.go

+
+func isQuotaReached() func(d *v1beta1.Deployment) (bool, error) {
+	return func(d *v1beta1.Deployment) (bool, error) {
+		// TODO Remove this line


Is this TODONE?

evankanderson · 2018-10-02T17:32:26Z

test/e2e/resources_test.go

+	}
+}
+
+func TestQuotaExceeded(t *testing.T) {


I don't quite understand how this works. If I wanted to test this for memory, I would:

Write a small server (Go, C++ or Python) which uses a POST request parameter to allocate and fill X MB of RAM (you may need to actually touch each memory page to get Linux to allocate the bytes), then frees the memory and reports memory stats in the HTTP response.

Set a quota limit of 500MB. Send requests for 100, 200, 800, and make sure that the first two succeed, and the third fails.

Here is an example of consuming memory: https://github.com/knative/docs/blob/6d21cd89cbc8d5ab1c0d11f53343ed494d0980dc/serving/samples/autoscale-go/autoscale.go#L76

It makes sense

MissingRoberto · 2018-10-04T13:08:05Z

@evankanderson thank you for the review of WIP.

I am pushing wip commits because I am not able to get e2e tests to run on minikube.

Let's NOT merge this yet, because I found something unexpected.

When I set up LimitRanges or QuotaLimits, running the services fails with the following error (even if I specify defaults): failed quota: quota-lite: must specify limits.memory,requests.memory.

kubectl doesn't have this problem.

MissingRoberto · 2018-10-08T12:36:37Z

/test pull-knative-serving-integration-tests

evankanderson · 2018-10-09T00:19:29Z

Is this working now, or still WIP? (Ping me with a @ mention when you want another review.)

MissingRoberto · 2018-10-09T08:56:11Z

/test pull-knative-serving-integration-tests

MissingRoberto · 2018-10-11T09:22:49Z

/test pull-knative-serving-integration-tests

MissingRoberto · 2018-10-11T12:05:10Z

/test pull-knative-serving-integration-tests

MissingRoberto · 2018-10-11T14:11:45Z

/test pull-knative-serving-integration-tests

MissingRoberto · 2018-10-16T11:20:48Z

/unhold

@evankanderson Finally I got it working. Can you review it, please?

MissingRoberto · 2018-10-16T11:22:47Z

/hold cancel

greghaynes · 2018-10-16T15:56:00Z

pkg/apis/serving/v1alpha1/revision_validation_test.go

 			Lifecycle: &corev1.Lifecycle{},
 		},
-		want: apis.ErrDisallowedFields("name", "resources", "ports", "volumeMounts", "lifecycle"),
+		want: apis.ErrDisallowedFields("name", "ports", "volumeMounts", "lifecycle"),


nit: we can remove resources from the resource definition too in this case

greghaynes · 2018-10-16T16:01:49Z

pkg/reconciler/v1alpha1/revision/resources/deploy.go

-	userContainer.Resources = userResources
+
+	if equality.Semantic.DeepEqual(userContainer.Resources, corev1.ResourceRequirements{}) {
+		userContainer.Resources = userResources


I wonder if we should do a deep merge on CPU here? IIUC if I specify only a memory resource here I will get an implicit undefinition of CPU resources.

That makes a lot of sense. I thought about it as well.
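
To illustrate the deep-merge idea raised here, the following is a sketch under stated assumptions, not the code in this PR. `ResourceList` and `ResourceRequirements` are simplified stand-ins for the Kubernetes `corev1` types of the same names (quantities are kept as plain strings), `mergeDefaults` is a hypothetical helper, and the `400m` CPU default is a placeholder value. Each resource key the user leaves unset is filled from the defaults, so specifying only memory no longer drops the CPU request.

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for corev1.ResourceList.
type ResourceList map[string]string

// ResourceRequirements is a simplified stand-in for corev1.ResourceRequirements.
type ResourceRequirements struct {
	Limits   ResourceList
	Requests ResourceList
}

// mergeList returns a new list holding every default, overridden by any
// value the user supplied for the same key. It returns nil if both are empty.
func mergeList(user, defaults ResourceList) ResourceList {
	if len(user) == 0 && len(defaults) == 0 {
		return nil
	}
	out := ResourceList{}
	for k, v := range defaults {
		out[k] = v
	}
	for k, v := range user {
		out[k] = v
	}
	return out
}

// mergeDefaults (hypothetical) merges limits and requests key by key
// instead of replacing the whole struct only when it is empty.
func mergeDefaults(user, defaults ResourceRequirements) ResourceRequirements {
	return ResourceRequirements{
		Limits:   mergeList(user.Limits, defaults.Limits),
		Requests: mergeList(user.Requests, defaults.Requests),
	}
}

func main() {
	// Placeholder default: only a CPU request.
	defaults := ResourceRequirements{Requests: ResourceList{"cpu": "400m"}}
	// The user specifies memory only.
	user := ResourceRequirements{
		Limits:   ResourceList{"memory": "350Mi"},
		Requests: ResourceList{"memory": "350Mi"},
	}

	merged := mergeDefaults(user, defaults)
	fmt.Println(merged.Requests["cpu"], merged.Requests["memory"], merged.Limits["memory"])
}
```

One design point a real implementation would have to handle: Kubernetes rejects a container whose request exceeds its limit, so a defaulted request may need to be capped when the user supplies a smaller limit for the same resource.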

MissingRoberto · 2018-10-17T10:29:11Z

/test pull-knative-serving-integration-tests

MissingRoberto · 2018-10-18T06:48:33Z

@evankanderson @greghaynes can you review again, please?

mattmoor · 2018-10-22T20:28:36Z

@jszroberto Can you add unit test coverage of resource limits for applyDefaultResources? Why do you have to manually copy these?

mattmoor · 2018-10-22T20:28:45Z

Also, needs a rebase.

mattmoor · 2018-10-31T19:14:27Z

@jszroberto Any chance for a rebase, so we can close this one out?

[knative/serving##2099]

knative-metrics-robot · 2018-11-02T12:37:13Z

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
test/test_images/bloatingcow/bloatingcow.go	Do not exist	0.0%

mattmoor · 2018-11-02T14:04:23Z

@jszroberto Looks like it's the new test that's failing. Generally when adding new e2e, we should try to run them 10x or so to make sure we're not introducing new flakes. You should be able to do this with -count=10 -run=TestCustomResourcesLimits on your go test command.

mattmoor · 2018-11-06T16:49:02Z

I ran this 10x locally without fail, so...

/test pull-knative-serving-integration-tests

knative-prow-robot · 2018-11-06T17:17:57Z

@jszroberto: The following test failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
pull-knative-serving-integration-tests	`17da6c1`	link	`/test pull-knative-serving-integration-tests`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

mattmoor · 2018-11-06T22:54:33Z

@adrcunha it's not clear to me why this is failing in Prow, but not for me locally (I ran it 10x).

bbrowning · 2018-11-09T12:22:02Z

@mattmoor "If a Container allocates more memory than its limit, the Container becomes a candidate for termination. If the Container continues to consume memory beyond its limit, the Container is terminated." - https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#exceed-a-container-s-memory-limit

The test assumes the container is immediately terminated as soon as it exceeds the memory limit. But, in reality, it takes a bit of time for Kubernetes to OOMKill the pod.

mattmoor · 2018-11-09T14:46:07Z

@bbrowning Interesting that this passed 10x for me then... I guess I got lucky. Let me look at the way we check this to see if I can suggest anything.

mattmoor · 2018-11-09T14:47:35Z

test/test_images/bloatingcow/bloatingcow.go

+
+	b := make([]byte, mb*1024*1024)
+	b[0] = 1
+	b[len(b)-1] = 1


Perhaps initialize the whole thing for good measure?

I'm worried about the allocation happening virtually and then getting filled in through page faults as the pages are actually consumed. I doubt first and last is good enough for all sizes.
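
One way to address this concern, sketched here as an assumption rather than as the fix that landed: write at least one byte in every page of the buffer so that each page is faulted in. `touchEveryPage` is a hypothetical helper; the page size comes from `os.Getpagesize`.

```go
package main

import (
	"fmt"
	"os"
)

// touchEveryPage writes one byte per pageSize stride of b, plus the last
// byte in case b is not page-aligned, and returns the number of stride
// writes performed.
func touchEveryPage(b []byte, pageSize int) int {
	if len(b) == 0 || pageSize <= 0 {
		return 0
	}
	writes := 0
	for i := 0; i < len(b); i += pageSize {
		b[i] = 1
		writes++
	}
	b[len(b)-1] = 1
	return writes
}

func main() {
	b := make([]byte, 8*1024*1024)
	n := touchEveryPage(b, os.Getpagesize())
	fmt.Printf("wrote to %d strides of %d bytes\n", n, os.Getpagesize())
}
```

Filling the whole slice, as suggested above, is simpler and achieves the same thing at the cost of more writes.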

mattmoor · 2018-11-14T14:39:12Z

@jszroberto Curious if you've had time to explore this at all?

MissingRoberto · 2018-11-15T15:49:37Z

@mattmoor no, I didn't. I am still out of office.

mattmoor · 2018-11-28T02:00:13Z

@jszroberto Any chance you are back? I'd love to get this in.

mattmoor · 2018-11-30T04:27:30Z

I have a version of this in the linked PR that has passed at least one round of integration testing. I'll run it a few more times, but if it is consistent, I'll move ahead unless @jszroberto comes back and wants to incorporate the extra changes in my PR.

ellistarn · 2018-11-30T04:35:53Z

Awesome! Matt, this still won't enable specific GPU resources, correct? Node selector is GKE's hook for this, but my understanding is that it won't pass through a Knative spec.


Originally based on #2117 Fixes: #2099

mattmoor · 2018-11-30T14:38:45Z

I just landed my PR based on this. thanks @jszroberto for implementing this!

knative-prow-robot requested review from jonjohnsonjr and tcnghia October 1, 2018 16:24

knative-prow-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 1, 2018

knative-prow-robot assigned mattmoor Oct 1, 2018

knative-prow-robot assigned evankanderson Oct 1, 2018

knative-prow-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 2, 2018

knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 2, 2018

evankanderson reviewed Oct 2, 2018

josephburnett mentioned this pull request Oct 3, 2018

Stop scaling up if pods are crashlooping #2145

Closed

MissingRoberto changed the title ~~Allow resources in Revision~~ WIP: Allow resources in Revision Oct 4, 2018

knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 4, 2018

MissingRoberto changed the title ~~WIP: Allow resources in Revision~~ Allow resources in Revision Oct 16, 2018

knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 16, 2018

knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 16, 2018

greghaynes reviewed Oct 16, 2018

Allow resources in Revision

17da6c1

[knative/serving##2099]

mattmoor reviewed Nov 9, 2018

mattmoor mentioned this pull request Nov 20, 2018

Clients are not able to know the values of unspecified attributes #2521

Closed

mattmoor modified the milestone: Serving 0.3 Nov 20, 2018

evankanderson mentioned this pull request Nov 26, 2018

Cannot set "resources" on a container revision spec #2099

Closed

mattmoor mentioned this pull request Nov 30, 2018

Try to fixup #2117 #2586

Merged

mattmoor added a commit that referenced this pull request Nov 30, 2018

Allow resources in Revision (#2586)

7b07d51

Originally based on #2117 Fixes: #2099

mattmoor closed this Nov 30, 2018
