Logging improvements #1620
google-prow-robot merged 7 commits into knative:master from mdemirhan:loggingimprovements
Conversation
mdemirhan
commented
Jul 18, 2018
- Change log levels dynamically for activator, webhook and autoscaler
- Remove the use of runtime.Handle and instead use logger.Errorf since that is pretty much all runtime.Handle does today.
- Disable glog in activator and autoscaler to prevent k8s logs from appearing in these services. We already log errors, and the extra, unstructured k8s logs are not helpful.
- Simplify the monitoring installation by getting rid of the dev profile. Instead, we now watch for logs in Knative components by default, but the logging level is set to fatal. To enable Knative logging, we just need to edit the config-logging configmap and change those values to info. The updates will be picked up immediately without the need to restart any components.
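To illustrate the dynamic log level idea from the list above, here is a minimal, stdlib-only sketch. It is not the code in this PR: the `Level` type, `AtomicLevel`, `UpdateFromConfig`, and the set of level names are all hypothetical, and the real components may rely on their logging library's own level type. It only shows how a threshold read from a configmap-style `map[string]string` could be swapped at runtime without restarting the process.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Level is a hypothetical severity type for this sketch; higher values are more severe.
type Level int32

const (
	Debug Level = iota
	Info
	Warn
	Error
	Fatal
)

// levelNames maps configmap strings to levels (an illustrative set, not the project's list).
var levelNames = map[string]Level{
	"debug": Debug, "info": Info, "warn": Warn, "error": Error, "fatal": Fatal,
}

// AtomicLevel holds the current threshold and can be changed while loggers are in use.
type AtomicLevel struct{ v int32 }

// Set stores a new threshold atomically.
func (a *AtomicLevel) Set(l Level) { atomic.StoreInt32(&a.v, int32(l)) }

// Enabled reports whether a message at level l passes the current threshold.
func (a *AtomicLevel) Enabled(l Level) bool { return int32(l) >= atomic.LoadInt32(&a.v) }

// UpdateFromConfig applies data[key] to the level. The level is left
// unchanged when the key is missing or the value is not recognized.
func UpdateFromConfig(a *AtomicLevel, data map[string]string, key string) error {
	name, ok := data[key]
	if !ok {
		return fmt.Errorf("key %q not found", key)
	}
	l, ok := levelNames[name]
	if !ok {
		return fmt.Errorf("unknown log level %q", name)
	}
	a.Set(l)
	return nil
}

func main() {
	var lvl AtomicLevel
	lvl.Set(Fatal)
	fmt.Println(lvl.Enabled(Info)) // false

	// Simulate a configmap update being observed by a watcher.
	_ = UpdateFromConfig(&lvl, map[string]string{"loglevel.activator": "info"}, "loglevel.activator")
	fmt.Println(lvl.Enabled(Info)) // true
}
```

In a real component the call to `UpdateFromConfig` would sit inside whatever callback fires when the configmap changes; that wiring is omitted here.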
|
Holding it until the branch is open for checkins again. |
| ```shell | ||
| ko apply -f config/ | ||
|
|
||
| # Set the logging threshold to info for all Knative components and reapply the config |
| @id fluentd-containers.log | ||
| @type tail | ||
| path /var/log/containers/*user-container-*.log,/var/log/containers/*build-step-*.log,/var/log/containers/*controller-*.log,/var/log/containers/*webhook-*.log,/var/log/containers/*autoscaler-*.log,/var/log/containers/*queue-proxy-*.log,/var/log/containers/*activator-*.log | ||
| path /var/log/containers/*user-container-*.log,/var/log/containers/*build-step-*.log,/var/log/containers/controller-*controller-*.log,/var/log/containers/webhook-*webhook-*.log,/var/log/containers/*autoscaler-*autoscaler-*.log,/var/log/containers/*queue-proxy-*.log,/var/log/containers/activator-*activator-*.log |
There was a problem hiding this comment.
Extra controller-, webhook-, autoscaler-, activator-? The file names don't look right to me.
There was a problem hiding this comment.
The changes are intentional. With the changes, we will collect logs from only specific pods - for example, we won't collect logs from all containers named controller, but only containers in a pod that starts with controller.
There was a problem hiding this comment.
Could you also do this change to the configmap for stackdriver? They are not consistent currently.
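The effect of the narrower path patterns discussed in this thread can be sketched with Go's `path/filepath.Match`, whose `*` behaves like a shell glob within a single path element. The file names below are made up, and they assume the log files are named `<pod>_<namespace>_<container>-<id>.log`; fluentd's own glob expansion is not being reproduced here, only the general matching idea.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matches reports whether name matches the glob pattern.
// An invalid pattern is treated as a non-match in this sketch.
func matches(pattern, name string) bool {
	ok, err := filepath.Match(pattern, name)
	return err == nil && ok
}

func main() {
	// Hypothetical file names, assuming a <pod>_<namespace>_<container>-<id>.log layout.
	files := []string{
		"controller-5d9f7-abcde_knative-serving_controller-0a1b2c.log", // pod name starts with "controller-"
		"myapp-7c8d9-fghij_default_controller-3d4e5f.log",              // only the container is named "controller"
	}
	oldPattern := "*controller-*.log"
	newPattern := "controller-*controller-*.log"

	for _, f := range files {
		fmt.Printf("%s old=%v new=%v\n", f, matches(oldPattern, f), matches(newPattern, f))
	}
}
```

Under these assumptions the old pattern matches both files, while the new one matches only the file whose pod name begins with `controller-`.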
|
|
||
| 1. First, go through [OpenCensus Go Documentation](https://godoc.org/go.opencensus.io). | ||
| 2. Add the following to your application startup: | ||
| 1.First, go through [OpenCensus Go Documentation](https://godoc.org/go.opencensus.io). |
There was a problem hiding this comment.
Will markdown regard this as ordered list without space after the number?
There was a problem hiding this comment.
Good catch. Fixed it.
|
/hold cancel |
mattmoor
left a comment
There was a problem hiding this comment.
I like the reduction in code / config :)
| mux.HandleFunc("/", handler) | ||
| mux.Handle("/metrics", exporter) | ||
| http.ListenAndServe(":"+servingAutoscalerPort, mux) | ||
| close(stopCh) |
There was a problem hiding this comment.
use defer close(stopCh) right after the make?
| loglevel.autoscaler: "fatal" | ||
| loglevel.queueproxy: "fatal" | ||
| loglevel.webhook: "fatal" | ||
| loglevel.activator: "fatal" |
There was a problem hiding this comment.
This seems like a high default, are we sure we don't want error to start?
There was a problem hiding this comment.
I think before we change this default (at all) we should make it really clear this is changing in slack, or folks are going to get confused
There was a problem hiding this comment.
Will change this back to info level.
Conflicts: docs/telemetry.md, hack/release.sh
|
/hold |
|
The following is the coverage report on pkg/.
|
|
/retest |
|
/hold cancel |
|
@mattmoor this is ready for another look. I changed the default level to info. |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mattmoor, mdemirhan. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
What are the possible log levels? |