Autoscaler memory usage increases even with total number of ksvcs/revisions kept constant #10516

@skonto

/area monitoring

What version of Knative?

0.18.2, but this should affect any later version as well.

Expected Behavior

Creating and deleting a number of ksvcs while keeping the final number constant (e.g. zero) should not make any significant difference
in the heap memory allocated before and after the CRUD operations.

Actual Behavior

Running the soak test reproducer.sh (see attached file reproducer.sh.txt), which repeatedly creates and deletes ksvcs while keeping the total number of ksvcs constant, causes the autoscaler to get OOMKilled after 4 hours.

[Screenshot: autoscaler_oomkilled]

Steps to Reproduce the Problem

Run reproducer.sh (see attached file) for more than 4 hours.

Heap profile diff (go tool pprof --base) between two snapshots of the autoscaler:

[root@ocp-dynamic-6653 ~]# go tool pprof --base "knative-serving.autoscaler-699dff8cff-hk27b.heap.2021-01-10_09:24:04-05:00.pb.gz" "knative-serving.autoscaler-699dff8cff-hk27b.heap.2021-01-10_10:47:41-05:00.pb.gz"                        
File: autoscaler
Type: inuse_space
Time: Jan 10, 2021 at 9:24am (EST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 47.90MB, 85.65% of 55.93MB total
Dropped 5 nodes (cum <= 0.28MB)
Showing top 10 nodes out of 130
      flat  flat%   sum%        cum   cum%
   26.89MB 48.08% 48.08%    26.89MB 48.08%  go.opencensus.io/stats/view.NewMeter
       6MB 10.73% 58.81%        6MB 10.73%  go.opencensus.io/stats/view.(*collector).addSample
    3.50MB  6.26% 65.07%        9MB 16.10%  go.opencensus.io/stats/view.(*worker).tryRegisterView
       3MB  5.36% 70.44%        3MB  5.36%  go.opencensus.io/stats/view.viewToMetricDescriptor
    2.50MB  4.47% 74.91%     2.50MB  4.47%  knative.dev/pkg/metrics.copyViews
    1.51MB  2.69% 77.60%     1.51MB  2.69%  k8s.io/apimachinery/pkg/apis/meta/v1.(*FieldsV1).UnmarshalJSON
    1.50MB  2.68% 80.28%     1.50MB  2.68%  go.opencensus.io/stats/view.(*worker).getMeasureRef (inline)
       1MB  1.79% 82.07%        1MB  1.79%  go.opencensus.io/stats/view.(*worker).RegisterExporter
       1MB  1.79% 83.86%        2MB  3.58%  go.opencensus.io/stats/view.(*collector).collectedRows
       1MB  1.79% 85.65%        4MB  7.15%  go.opencensus.io/stats/view.newViewInternal (inline)

autoscaler.heap.zip contains the data for heap comparison.
This seems to happen because NewMeter is called every time we record a metric and we never delete any meter from the map that stores them. It looks like a bug on the knative.dev/pkg/metrics side.
Note that this also affects other components, since we use metrics everywhere. We also noticed an issue with the activator when running a similar test that issues HTTP requests.
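
To illustrate the suspected pattern, here is a small self-contained model in Go. It is a sketch under assumptions, not the knative.dev/pkg/metrics code: fakeMeter, meterRegistry, release and churn are hypothetical names, and the meter is a plain struct rather than an OpenCensus meter. It shows how a map of per-resource meters grows with every resource ever seen unless entries are dropped when the resource goes away.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeMeter is a hypothetical stand-in for a per-resource metrics meter.
// The buffer only gives each instance a visible memory cost.
type fakeMeter struct {
	buf []byte
}

// meterRegistry models a map of meters keyed by resource (e.g. a revision).
type meterRegistry struct {
	mu     sync.Mutex
	meters map[string]*fakeMeter
}

func newMeterRegistry() *meterRegistry {
	return &meterRegistry{meters: make(map[string]*fakeMeter)}
}

// meterFor returns the meter for key, creating it on first use.
// Without a matching delete, every key ever seen stays in the map.
func (r *meterRegistry) meterFor(key string) *fakeMeter {
	r.mu.Lock()
	defer r.mu.Unlock()
	m, ok := r.meters[key]
	if !ok {
		m = &fakeMeter{buf: make([]byte, 1024)}
		r.meters[key] = m
	}
	return m
}

// release drops the meter for key (assumed cleanup step, hypothetical).
func (r *meterRegistry) release(key string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.meters, key)
}

// size reports how many meters are currently retained.
func (r *meterRegistry) size() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.meters)
}

// churn simulates creating and deleting n short-lived resources,
// optionally releasing each meter when its resource is deleted.
func churn(r *meterRegistry, n int, cleanup bool) {
	for i := 0; i < n; i++ {
		key := fmt.Sprintf("ns/rev-%d", i)
		r.meterFor(key) // a metric is recorded for the resource
		if cleanup {
			r.release(key) // the resource is deleted: drop its meter
		}
	}
}

func main() {
	leaky := newMeterRegistry()
	churn(leaky, 1000, false)
	fmt.Println("without cleanup:", leaky.size()) // prints "without cleanup: 1000"

	fixed := newMeterRegistry()
	churn(fixed, 1000, true)
	fmt.Println("with cleanup:", fixed.size()) // prints "with cleanup: 0"
}
```

In this model the number of live resources is zero after both runs, yet the registry without cleanup still holds 1000 meters, which matches the shape of the growth seen in the heap diff.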

Credit goes to @maschmid for discovering this.

Labels: kind/bug
