Skip to content

Logging improvements#1620

Merged
google-prow-robot merged 7 commits intoknative:masterfrom
mdemirhan:loggingimprovements
Jul 23, 2018
Merged

Logging improvements#1620
google-prow-robot merged 7 commits intoknative:masterfrom
mdemirhan:loggingimprovements

Conversation

@mdemirhan
Copy link
Copy Markdown
Contributor

  • Change log levels dynamically for activator, webhook and autoscaler
  • Remove the use of runtime.Handle and instead use logger.Errorf since that is pretty much all runtime.Handle does today.
  • Disable glog in activator and autoscaler to prevent k8s logs appearing in these services. We already log errors and extra & unstructured k8s logs are not helpful.
  • Simplify the monitoring installation by getting rid of dev profile. Instead, we now by default watch for logs in knative components, but the logging level is set to fatal. To enable Knative logging, we just need to edit config-logging configmap and change those values to info. The updates will be picked up immediately without the need to restart any components.

* Remove the use of runtime.Handle and instead use logger.Errorf since that is pretty much all runtime.Handle does today.
* Disable glog in activator and autoscaler to prevent k8s logs appearing in these services. We already log errors and extra & unstructured k8s logs are not helpful.
* Simplify the monitoring installation by getting rid of dev profile. Instead, we now by default watch for logs in knative components, but the logging level is set to fatal. To enable Knative logging, we just need to edit config-logging configmap and change those values to info. The updates will be picked up immediately without the need to restart any components.
@mdemirhan mdemirhan requested review from mattmoor and yanweiguo July 18, 2018 19:59
@google-prow-robot google-prow-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 18, 2018
@mdemirhan mdemirhan added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 18, 2018
@mdemirhan
Copy link
Copy Markdown
Contributor Author

Holding it until the branch is open for checkins again.

Comment thread DEVELOPMENT.md Outdated
```shell
ko apply -f config/

# Set the logging threhold to info for all Knative components and reapply the config
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: threshold

@id fluentd-containers.log
@type tail
path /var/log/containers/*user-container-*.log,/var/log/containers/*build-step-*.log,/var/log/containers/*controller-*.log,/var/log/containers/*webhook-*.log,/var/log/containers/*autoscaler-*.log,/var/log/containers/*queue-proxy-*.log,/var/log/containers/*activator-*.log
path /var/log/containers/*user-container-*.log,/var/log/containers/*build-step-*.log,/var/log/containers/controller-*controller-*.log,/var/log/containers/webhook-*webhook-*.log,/var/log/containers/*autoscaler-*autoscaler-*.log,/var/log/containers/*queue-proxy-*.log,/var/log/containers/activator-*activator-*.log
Copy link
Copy Markdown
Contributor

@yanweiguo yanweiguo Jul 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra controller-, webhook-, autoscaler-, activator-? The file names don't look right to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes are intentional. With the changes, we will collect logs from only specific pods - for example, we won't collect logs from all containers named controller, but only containers in a pod that starts with controller.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also do this change to the configmap for stackdriver? They are not consistent currently.

Comment thread docs/telemetry.md Outdated

1. First, go through [OpenCensus Go Documentation](https://godoc.org/go.opencensus.io).
2. Add the following to your application startup:
1.First, go through [OpenCensus Go Documentation](https://godoc.org/go.opencensus.io).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will markdown regard this as ordered list without space after the number?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed it.

@mdemirhan
Copy link
Copy Markdown
Contributor Author

/hold cancel

@google-prow-robot google-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 20, 2018
Copy link
Copy Markdown
Member

@mattmoor mattmoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the reduction in code / config :)

Comment thread cmd/autoscaler/main.go Outdated
mux.HandleFunc("/", handler)
mux.Handle("/metrics", exporter)
http.ListenAndServe(":"+servingAutoscalerPort, mux)
close(stopCh)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use defer close(stopCh) right after the make?

Comment thread config/config-logging.yaml Outdated
loglevel.autoscaler: "fatal"
loglevel.queueproxy: "fatal"
loglevel.webhook: "fatal"
loglevel.activator: "fatal"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a high default, are we sure we don't want error to start?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think before we change this default (at all) we should make it really clear this is changing in slack, or folks are going to get confused

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change this back to info level.

Mustafa Demirhan added 2 commits July 23, 2018 12:51
@mdemirhan
Copy link
Copy Markdown
Contributor Author

/hold
Holding to test the latest merge.

@google-prow-robot google-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 23, 2018
@knative-metrics-robot
Copy link
Copy Markdown

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to run the coverage report again

File Old Coverage New Coverage Delta
pkg/controller/route/route.go 96.6% 96.5% -0.0
pkg/logging/config.go 95.6% 96.3% 0.7

@mdemirhan
Copy link
Copy Markdown
Contributor Author

/retest

@mdemirhan
Copy link
Copy Markdown
Contributor Author

/hold cancel

@google-prow-robot google-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 23, 2018
@mdemirhan
Copy link
Copy Markdown
Contributor Author

@mattmoor this is ready for another look. I changed the default level to info.

Copy link
Copy Markdown
Member

@mattmoor mattmoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@google-prow-robot google-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 23, 2018
@google-prow-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattmoor, mdemirhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-prow-robot google-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 23, 2018
@google-prow-robot google-prow-robot merged commit 5441a18 into knative:master Jul 23, 2018
@mdemirhan mdemirhan deleted the loggingimprovements branch July 23, 2018 22:43
@AceHack
Copy link
Copy Markdown
Contributor

AceHack commented Sep 22, 2020

What are the possible log levels?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants