/area monitoring
What version of Knative?
0.18.2, but this likely affects every later version as well.
Expected Behavior
Creating and deleting a number of ksvcs while keeping the final count constant (e.g. zero) should not cause any significant difference in allocated heap memory before and after the CRUD operations.
Actual Behavior
When running reproducer.sh (attached as reproducer.sh.txt), a soak test that repeatedly creates and deletes ksvcs while keeping the total number of ksvcs constant, the autoscaler gets OOMKilled after 4 hours.

Steps to Reproduce the Problem
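Run reproducer.sh (see attached file) for more than 4 hours. Comparing two autoscaler heap profiles taken during the run shows where the growth comes from: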
```
[root@ocp-dynamic-6653 ~]# go tool pprof --base "knative-serving.autoscaler-699dff8cff-hk27b.heap.2021-01-10_09:24:04-05:00.pb.gz" "knative-serving.autoscaler-699dff8cff-hk27b.heap.2021-01-10_10:47:41-05:00.pb.gz"
File: autoscaler
Type: inuse_space
Time: Jan 10, 2021 at 9:24am (EST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 47.90MB, 85.65% of 55.93MB total
Dropped 5 nodes (cum <= 0.28MB)
Showing top 10 nodes out of 130
      flat  flat%   sum%        cum   cum%
   26.89MB 48.08% 48.08%    26.89MB 48.08%  go.opencensus.io/stats/view.NewMeter
       6MB 10.73% 58.81%        6MB 10.73%  go.opencensus.io/stats/view.(*collector).addSample
    3.50MB  6.26% 65.07%        9MB 16.10%  go.opencensus.io/stats/view.(*worker).tryRegisterView
       3MB  5.36% 70.44%        3MB  5.36%  go.opencensus.io/stats/view.viewToMetricDescriptor
    2.50MB  4.47% 74.91%     2.50MB  4.47%  knative.dev/pkg/metrics.copyViews
    1.51MB  2.69% 77.60%     1.51MB  2.69%  k8s.io/apimachinery/pkg/apis/meta/v1.(*FieldsV1).UnmarshalJSON
    1.50MB  2.68% 80.28%     1.50MB  2.68%  go.opencensus.io/stats/view.(*worker).getMeasureRef (inline)
       1MB  1.79% 82.07%        1MB  1.79%  go.opencensus.io/stats/view.(*worker).RegisterExporter
       1MB  1.79% 83.86%        2MB  3.58%  go.opencensus.io/stats/view.(*collector).collectedRows
       1MB  1.79% 85.65%        4MB  7.15%  go.opencensus.io/stats/view.newViewInternal (inline)
```
The attached autoscaler.heap.zip contains the profiles used for this heap comparison.
The growth appears to happen because `NewMeter` is called every time we record a metric, and we never delete any meter from the map where it is stored. It looks like a bug on the knative.dev/pkg/metrics side.
Note that this affects other components as well, since we use metrics everywhere. We also noticed an issue with the activator using a similar test that issues HTTP requests.
Credit goes to @maschmid for discovering this.