Description
While troubleshooting the various services that make up Cortex, one thing I've found is that the logs are not easy to parse. Different parts of the services seem to log in different formats, which makes it hard to write a single, reasonable log filter that a given log-ingestion method can parse.
As an example, I looked at the distributor and found at least three different formats in its logs (sanitized):
ts=2018-05-31T18:19:27.515467232Z caller=log.go:112 level=error org_id=0 msg="push error" err="rpc error: code = Code(400) desc = sample with repeated timestamp but different value for series container_fs_reads_total{container_name=\"example\", example_cluster=\"Example\", a_cluster=\"Example\", id=\"/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod526740fc_3e0e_11e8_b916_02000a838eab.slice/docker-b83abfed1d197426a06778e7dcd6c0470ac60e77d5f76173793fd0025929d039.scope\", image=\"docker.io/example/example@sha256:261537e58647e604701c97765973e2c4d0627953f382883f83d2d64144803cdc\", instance=\"10.11.143.157:8443\", job=\"kubernetes-apiservers\", name=\"k8s_example_example-1-lqqtq_exampleproject_526740fc-3e0e-11e8-b916-02000a838eab_0\", namespace=\"example\", pod_name=\"example-1-lqqtq\"}; last value: 1089, incoming value: 1"
ts=2018-05-30T21:12:56.673829487Z caller=log.go:112 level=error msg="error getting path" key=collectors/ring err="Unexpected response code: 500"
time="2018-05-30T19:58:18Z" level=warning msg="Is websocket request: false\nPOST /api/prom/push HTTP/1.1\r\nHost: distributor\r\nConnection: close\r\nAccept-Encoding: gzip\r\nConnection: close\r\nContent-Encoding: snappy\r\nContent-Length: 36513\r\nContent-Type: application/x-protobuf\r\nUser-Agent: Go-http-client/1.1\r\nX-Prometheus-Remote-Write-Version: 0.1.0\r\nX-Scope-Orgid: 0\r\n\r\n"
In the first two examples, fields such as org_id and key are added only when relevant, which is probably a non-issue. My main concern is the difference between either of the first two examples and the last one: the last appears to come from a different logging mechanism altogether.
This makes it difficult for log-ingestion systems to parse the logs reliably and to provide a consistent experience when viewing them while troubleshooting issues.
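To illustrate the extra work this pushes onto consumers, here is a minimal Go sketch of what a downstream parser has to do today. It assumes the first two lines are go-kit-style logfmt (`ts=`, `level=error`) and the third looks like logrus text-formatter output (`time=`, `level=warning`); that attribution is a guess from the line shapes, not confirmed. The names `parseLogfmt`, `Record`, and `normalize` are hypothetical, and the parser only handles the subset of logfmt seen in the examples (bare and double-quoted values with Go-style escapes).

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLogfmt splits a logfmt-style line into key/value pairs.
// Quoted values may contain spaces and Go-style escapes (\" \n \r).
func parseLogfmt(line string) (map[string]string, error) {
	fields := make(map[string]string)
	i := 0
	for i < len(line) {
		// Skip separating spaces.
		for i < len(line) && line[i] == ' ' {
			i++
		}
		if i >= len(line) {
			break
		}
		// Read the key up to '=' or a space.
		start := i
		for i < len(line) && line[i] != '=' && line[i] != ' ' {
			i++
		}
		key := line[start:i]
		if i >= len(line) || line[i] != '=' {
			fields[key] = "" // bare key with no value
			continue
		}
		i++ // skip '='
		if i < len(line) && line[i] == '"' {
			// Quoted value: find the closing quote, stepping over escapes.
			j := i + 1
			for j < len(line) && line[j] != '"' {
				if line[j] == '\\' {
					j++
				}
				j++
			}
			if j >= len(line) {
				return nil, fmt.Errorf("unterminated quote for key %q", key)
			}
			val, err := strconv.Unquote(line[i : j+1])
			if err != nil {
				return nil, err
			}
			fields[key] = val
			i = j + 1
		} else {
			// Unquoted value runs to the next space.
			start = i
			for i < len(line) && line[i] != ' ' {
				i++
			}
			fields[key] = line[start:i]
		}
	}
	return fields, nil
}

// Record is a hypothetical common shape a log pipeline might want.
type Record struct {
	Time  string
	Level string
	Msg   string
}

// normalize papers over the two formats: "ts" vs "time" for the
// timestamp, and "warning" (logrus) vs "warn" (go-kit) for the level.
func normalize(fields map[string]string) Record {
	r := Record{Msg: fields["msg"], Level: fields["level"]}
	if ts, ok := fields["ts"]; ok {
		r.Time = ts
	} else {
		r.Time = fields["time"]
	}
	if r.Level == "warning" {
		r.Level = "warn"
	}
	return r
}

func main() {
	// Two of the sanitized lines above; the third is trimmed for brevity.
	lines := []string{
		`ts=2018-05-30T21:12:56.673829487Z caller=log.go:112 level=error msg="error getting path" key=collectors/ring err="Unexpected response code: 500"`,
		`time="2018-05-30T19:58:18Z" level=warning msg="Is websocket request: false\nPOST /api/prom/push HTTP/1.1\r\n"`,
	}
	for _, l := range lines {
		f, err := parseLogfmt(l)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		r := normalize(f)
		fmt.Printf("%s %s %q\n", r.Time, r.Level, r.Msg)
	}
}
```

Every consumer ends up carrying special cases like the two in `normalize`; a single log format across the services would make them unnecessary.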