Difficulty Parsing Logs Consistently #835

@jtslear

Description

While troubleshooting various services that are part of Cortex, I've found that we are unable to easily parse the logs. Different parts of the services appear to log in different ways, which makes it hard to create a reasonable logging filter that works with whatever method is used to ingest logs.

As an example, I looked at the distributor and found at least three ways this service writes logs (sanitized):

ts=2018-05-31T18:19:27.515467232Z caller=log.go:112 level=error org_id=0 msg="push error" err="rpc error: code = Code(400) desc = sample with repeated timestamp but different value for series container_fs_reads_total{container_name=\"example\", example_cluster=\"Example\", a_cluster=\"Example\", id=\"/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod526740fc_3e0e_11e8_b916_02000a838eab.slice/docker-b83abfed1d197426a06778e7dcd6c0470ac60e77d5f76173793fd0025929d039.scope\", image=\"docker.io/example/example@sha256:261537e58647e604701c97765973e2c4d0627953f382883f83d2d64144803cdc\", instance=\"10.11.143.157:8443\", job=\"kubernetes-apiservers\", name=\"k8s_example_example-1-lqqtq_exampleproject_526740fc-3e0e-11e8-b916-02000a838eab_0\", namespace=\"example\", pod_name=\"example-1-lqqtq\"}; last value: 1089, incoming value: 1"
ts=2018-05-30T21:12:56.673829487Z caller=log.go:112 level=error msg="error getting path" key=collectors/ring err="Unexpected response code: 500"
time="2018-05-30T19:58:18Z" level=warning msg="Is websocket request: false\nPOST /api/prom/push HTTP/1.1\r\nHost: distributor\r\nConnection: close\r\nAccept-Encoding: gzip\r\nConnection: close\r\nContent-Encoding: snappy\r\nContent-Length: 36513\r\nContent-Type: application/x-protobuf\r\nUser-Agent: Go-http-client/1.1\r\nX-Prometheus-Remote-Write-Version: 0.1.0\r\nX-Scope-Orgid: 0\r\n\r\n"

In the first two examples, the various fields are inserted only when needed (this is probably a non-issue). My main concern is the difference between the first two examples and the last one, where the timestamp key is time rather than ts and the value is quoted. It appears to come from an entirely different logging mechanism.

This makes it difficult for log-processing systems to parse the logs reliably, and so to provide a consistent experience when viewing logs to troubleshoot issues.
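To illustrate the kind of workaround a consumer currently needs, here is a rough sketch of a parser that accepts both shapes shown above and maps them onto one timestamp key. It assumes every line is a sequence of key=value pairs with optional double-quoted values; the function names `parse_logfmt` and `normalize` are made up for this example and are not part of Cortex or any logging library.

```python
import re

# One key=value pair: the value is either a double-quoted string
# (backslash escapes allowed) or a run of non-space characters.
_PAIR = re.compile(r'([^\s=]+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_logfmt(line):
    """Split a key=value log line into a dict of strings (hypothetical helper)."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
            # Drop the surrounding quotes and unescape \" and \\ only;
            # sequences such as \n or \r are left as written in the log.
            raw = re.sub(r'\\(["\\])', r'\1', raw[1:-1])
        fields[key] = raw
    return fields


def normalize(line):
    """Parse a line and store the timestamp under 'ts' for both formats."""
    fields = parse_logfmt(line)
    # Assumption: lines like the first two examples carry ts=, while lines
    # like the third carry time= instead.
    if "ts" not in fields and "time" in fields:
        fields["ts"] = fields.pop("time")
    return fields
```

With this, `normalize(...)` on any of the three sample lines yields a dict with `ts`, `level`, and `msg` keys, but each log consumer would have to carry such a shim itself, which is what a single consistent format would avoid.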
