Metrics cardinality is too high

/area monitoring

When using Prometheus, a standard [principle](https://www.robustperception.io/cardinality-is-key) is to keep metric cardinality low, and this is also a key concept in [monitoring](https://cloud.google.com/monitoring/api/v3/metric-model#cardinality) in general. Low cardinality is a key design principle in the [latest standards](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#span) too.
Although Prometheus made memory configuration more [flexible](https://prometheus.io/docs/prometheus/1.8/storage/#memory-usage) in the past, current versions force you to limit the number of ingested time series in order to tune memory implicitly. The memory consumption that Knative metrics impose is not low. One way to estimate the memory is with this [calculator](https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion).
Right now we have a lot of metrics that use the revision name, config name, pod name, and namespace name as labels.
To mention a few examples:
- activator (request_latencies) and autoscaler (reconciler) time series have a complexity of `#histogram_buckets * #revisions * #namespaces`.
- webhook emits similar histogram metrics, and its cardinality depends on the number of kinds and namespaces.

To understand the scale: if we use 30 buckets (aggregated from several histograms), 100 services, and 50 namespaces, this means 150K time series from one pod.
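
The arithmetic behind that figure can be sketched as follows. This is only an illustration of the product formula above; the function name `histogram_series` is hypothetical and not part of any Knative or Prometheus API.

```python
def histogram_series(buckets: int, revisions: int, namespaces: int) -> int:
    """Upper bound on the bucket time series one pod can emit.

    Each distinct (bucket, revision, namespace) combination is its own series,
    so the counts multiply.
    """
    return buckets * revisions * namespaces


# Numbers taken from the example in this issue.
per_pod = histogram_series(buckets=30, revisions=100, namespaces=50)
print(f"{per_pod:,} time series from one pod")  # 150,000 time series from one pod
```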
We have several pods, and this does not even include Eventing, where cardinality is high due to event_type, filter_type, etc.
According to the calculator above, 1M time series needs around 4GB of memory under specific assumptions. Given the number of pods we use, we can easily reach that number. We already face this downstream.
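
As a rough back-of-the-envelope sketch of that estimate: the code below assumes memory scales linearly with the number of series, using the 1M series to 4GB ratio quoted above as the only data point. That linearity is a simplifying assumption, and the helper names are hypothetical.

```python
import math

# Reference point quoted in this issue: ~1M series needs ~4GB
# (under the calculator's specific assumptions).
REFERENCE_SERIES = 1_000_000
REFERENCE_BYTES = 4 * 10**9


def estimated_memory_gb(series: int) -> float:
    """Linear extrapolation from the reference point (an assumption)."""
    return series * REFERENCE_BYTES / REFERENCE_SERIES / 10**9


def pods_to_reach(target_series: int, series_per_pod: int) -> int:
    """Number of pods, each emitting series_per_pod, needed to reach the target."""
    return math.ceil(target_series / series_per_pod)


print(estimated_memory_gb(150_000))       # 0.6 (GB for one pod's 150K series)
print(pods_to_reach(1_000_000, 150_000))  # 7
```

Under these assumptions, about seven pods emitting 150K series each are enough to reach the 1M mark.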
Here is a sample status report for the top series on Prometheus when using 100 services:
![image](https://user-images.githubusercontent.com/7945591/115758791-33926f00-a3a8-11eb-971d-81d1d4badf33.png)

Also note that we haven't taken into consideration the scenario where a pod name changes due to a restart (which can happen easily). A Prometheus instance is not meant to serve only Knative, so in general we should tune our metrics API. I propose we limit our labels to the namespace level rather than the revision level.
Logging, not metrics, should be used to understand the behavior of individual services. We also need to reconsider histograms for the webhook and controller cases, since buckets make [cardinality explode](https://www.robustperception.io/how-does-a-prometheus-histogram-work).
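
To illustrate the effect of the proposal, here is a minimal sketch that compares the series upper bound with and without a per-revision label. The label names and counts are illustrative assumptions based on the example numbers in this issue, not a dump of the actual Knative label set.

```python
from math import prod


def series_upper_bound(label_values: dict[str, int]) -> int:
    """Upper bound on series: product of distinct values per label."""
    return prod(label_values.values())


# "le" is the Prometheus histogram bucket label; counts are from the example above.
per_revision = {"le": 30, "revision_name": 100, "namespace_name": 50}
# Proposal: drop the revision-level label and keep only the namespace level.
per_namespace = {k: v for k, v in per_revision.items() if k != "revision_name"}

before = series_upper_bound(per_revision)
after = series_upper_bound(per_namespace)
print(before, after, before // after)  # 150000 1500 100
```

In this example, dropping the revision label reduces the bound by a factor equal to the number of revisions.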

## What version of Knative?
All versions

## Expected Behavior

Metrics should have low cardinality.

## Actual Behavior

An excessive number of time series is created.

## Steps to Reproduce the Problem

Create a moderate number of namespaces and ksvcs.

/cc @evankanderson @mattmoor @markusthoemmes 

Metrics cardinality is too high #11248
