Expose log messages as Prometheus metrics. #572

Merged
marccarre merged 1 commit into master from log-msg-as-prom-metric
Oct 5, 2017

@marccarre
Contributor

Steps followed:

  • add promrus to expose metrics on log messages (the "legacy" version is used so that we keep importing Sirupsen/logrus rather than sirupsen/logrus for now, as switching leads to too many conflicts):
    dep ensure github.com/weaveworks/promrus@v1.0.0-legacy && dep ensure && dep prune
    git status | grep BUILD.bazel | cut -d' ' -f 5 | xargs git checkout HEAD
  • add the ability to add a hook to prometheus/common/log; for now this is done directly in vendor, but the change will eventually have to be made upstream and Cortex upgraded to pick it up,
  • add the hook in every main.go.
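
To illustrate the last two steps, here is a minimal, self-contained sketch of how a logging hook can count messages per level. It uses only the standard library: `Level`, `Hook`, `Logger`, `AddHook` and `countingHook` are hypothetical stand-ins, not the actual promrus, logrus or prometheus/common/log types, and a plain map takes the place of a Prometheus counter vector.

```go
package main

import (
	"fmt"
	"sync"
)

// Level is a stand-in for a logging library's severity type.
type Level string

const (
	Debug   Level = "debug"
	Info    Level = "info"
	Warning Level = "warning"
	Error   Level = "error"
)

// Hook is a hypothetical hook interface: Fire is called once per log entry.
type Hook interface {
	Fire(level Level, msg string) error
}

// Logger is a stand-in logger that forwards every entry to its hooks.
type Logger struct {
	hooks []Hook
}

// AddHook registers a hook, which is what each main.go would do at startup.
func (l *Logger) AddHook(h Hook) { l.hooks = append(l.hooks, h) }

// Log prints the entry and then notifies all registered hooks.
func (l *Logger) Log(level Level, msg string) {
	fmt.Printf("level=%s msg=%q\n", level, msg)
	for _, h := range l.hooks {
		_ = h.Fire(level, msg) // hook errors are ignored in this sketch
	}
}

// countingHook counts entries per level; the map stands in for a counter vector.
type countingHook struct {
	mu     sync.Mutex
	counts map[Level]int
}

// newCountingHook pre-populates every level so that levels without
// messages still report 0.
func newCountingHook() *countingHook {
	h := &countingHook{counts: make(map[Level]int)}
	for _, lvl := range []Level{Debug, Info, Warning, Error} {
		h.counts[lvl] = 0
	}
	return h
}

// Fire increments the counter for the entry's level.
func (h *countingHook) Fire(level Level, _ string) error {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.counts[level]++
	return nil
}

// Count returns the number of entries seen so far at the given level.
func (h *countingHook) Count(level Level) int {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.counts[level]
}

func main() {
	hook := newCountingHook()
	logger := &Logger{}
	logger.AddHook(hook)

	logger.Log(Debug, "Adding 0 configurations")
	logger.Log(Warning, "Error discovering services for mesh")

	for _, lvl := range []Level{Debug, Error, Info, Warning} {
		fmt.Printf("log_messages{level=%q} %d\n", lvl, hook.Count(lvl))
	}
}
```

Pre-populating the levels is a design choice in this sketch: it makes a level with no messages show up as 0 rather than be absent, which is easier to alert on.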

Manual testing locally:

$ kubectl get po -n cortex -o wide
NAME                            READY     STATUS    RESTARTS   AGE       IP            NODE
alertmanager-1820770292-bjv09   1/1       Running   0          20s       172.17.0.33   minikube
configs-3938330905-h2jz7        1/1       Running   0          20s       172.17.0.34   minikube
lite-601389945-8rvf3            1/1       Running   0          20s       172.17.0.36   minikube
localstack-1753213834-z6w15     1/1       Running   0          19s       172.17.0.37   minikube

alertmanager:

$ kubectl logs -n cortex alertmanager-1820770292-bjv09 ; minikube ssh "curl -fsS 172.17.0.33/metrics" | grep log_messages

time="2017-10-05T16:05:53Z" level=warning msg="Error discovering services for mesh : lookup _mesh._tcp. on 10.0.0.10:53: no such host" source="peers.go:68" 
time="2017-10-05T16:05:57Z" level=warning msg="MultitenantAlertmanager: configs server poll failed: Get http://configs.cortex.svc.cluster.local:80/private/api/prom/configs/alertmanager: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" source="multitenant.go:340" 
time="2017-10-05T16:05:57Z" level=warning msg="MultitenantAlertmanager: error fetching all configurations, backing off: Get http://configs.cortex.svc.cluster.local:80/private/api/prom/configs/alertmanager: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)" source="multitenant.go:312" 
time="2017-10-05T16:05:57Z" level=debug msg="MultitenantAlertmanager: found 0 configurations in initial load" source="multitenant.go:309" 
time="2017-10-05T16:05:57Z" level=debug msg="Adding 0 configurations" source="multitenant.go:351" 
time="2017-10-05T16:06:12Z" level=debug msg="Adding 0 configurations" source="multitenant.go:351" 
time="2017-10-05T16:06:27Z" level=debug msg="Adding 0 configurations" source="multitenant.go:351" 
time="2017-10-05T16:06:42Z" level=debug msg="Adding 0 configurations" source="multitenant.go:351" 
time="2017-10-05T16:06:53Z" level=warning msg="Error discovering services for mesh : lookup _mesh._tcp. on 10.0.0.10:53: no such host" source="peers.go:68" 
[...]
time="2017-10-05T16:22:58Z" level=warning msg="Error discovering services for mesh : lookup _mesh._tcp. on 10.0.0.10:53: no such host" source="peers.go:68" 
time="2017-10-05T16:23:12Z" level=debug msg="Adding 0 configurations" source="multitenant.go:351" 
time="2017-10-05T16:23:27Z" level=debug msg="Adding 0 configurations" source="multitenant.go:351" 

# HELP log_messages Total number of log messages.
# TYPE log_messages counter
log_messages{level="debug"} 72
log_messages{level="error"} 0
log_messages{level="info"} 0
log_messages{level="warning"} 20

configs:

$ kubectl logs -n cortex configs-3938330905-h2jz7 ; minikube ssh "curl -fsS 172.17.0.34/metrics" | grep log_messages

time="2017-10-05T16:05:52Z" level=info msg="Running Database Migrations..." source="postgres.go:47" 

# HELP log_messages Total number of log messages.
# TYPE log_messages counter
log_messages{level="debug"} 0
log_messages{level="error"} 0
log_messages{level="info"} 1
log_messages{level="warning"} 0

lite:

$ kubectl logs -n cortex lite-601389945-8rvf3 ; minikube ssh "curl -fsS 172.17.0.36/metrics" | grep log_messages

[...]
time="2017-10-05T16:06:23Z" level=error msg="Error syncing tables: RequestError: send request failed
caused by: Post http://localstack.cortex.svc.cluster.local:4569/: dial tcp 10.0.0.30:4569: i/o timeout" source="table_manager.go:139" 
time="2017-10-05T16:06:23Z" level=debug msg="Get ring (0)" source="consul_client_mock.go:97" 
[...]
time="2017-10-05T16:24:58Z" level=debug msg="Adding 0 configurations" source="scheduler.go:179" 
time="2017-10-05T16:25:03Z" level=debug msg="Get ring (0)" source="consul_client_mock.go:97" 
time="2017-10-05T16:25:03Z" level=debug msg="Get ring (231) = "\xa8\x1e\xe8\n0\n\x14lite-601389945-8rvf3\x12\x18\n\x10172.17.0."..." source="consul_client_mock.go:119" 
time="2017-10-05T16:25:03Z" level=debug msg="CAS ring (231) <- "\xa8\x1e\xe8\n0\n\x14lite-601389945-8rvf3\x12\x18\n\x10172.17.0."..." source="consul_client_mock.go:70" 
time="2017-10-05T16:25:03Z" level=debug msg="Get ring (232) = "\xa8\x1e\xe8\n0\n\x14lite-601389945-8rvf3\x12\x18\n\x10172.17.0."..." source="consul_client_mock.go:119" 
time="2017-10-05T16:25:03Z" level=debug msg="Get ring (232)" source="consul_client_mock.go:97" 

# HELP log_messages Total number of log messages.
# TYPE log_messages counter
log_messages{level="debug"} 2092
log_messages{level="error"} 9
log_messages{level="info"} 43
log_messages{level="warning"} 2

Similar metrics are to be expected once deployed in dev and prod.
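
The manual `curl ... | grep log_messages` check above could also be scripted. The following is a rough sketch, assuming the `/metrics` body has already been fetched as a string and that each `log_messages` sample carries a single `level` label, as in the output above. `parseLogMessages` is a hypothetical helper written for this illustration; a real check would likely use a proper exposition-format parser instead of string matching.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseLogMessages extracts log_messages samples from Prometheus text
// output and returns them keyed by level. It assumes one "level" label
// per sample and skips every other line.
func parseLogMessages(body string) (map[string]float64, error) {
	const prefix = `log_messages{level="`
	counts := make(map[string]float64)
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, prefix) {
			continue // skips "# HELP", "# TYPE" and unrelated metrics
		}
		rest := line[len(prefix):]
		end := strings.Index(rest, `"}`)
		if end < 0 {
			return nil, fmt.Errorf("malformed sample: %q", line)
		}
		value, err := strconv.ParseFloat(strings.TrimSpace(rest[end+2:]), 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %v", line, err)
		}
		counts[rest[:end]] = value
	}
	return counts, scanner.Err()
}

func main() {
	// Sample copied from the alertmanager output above.
	sample := `# HELP log_messages Total number of log messages.
# TYPE log_messages counter
log_messages{level="debug"} 72
log_messages{level="error"} 0
log_messages{level="info"} 0
log_messages{level="warning"} 20
`
	counts, err := parseLogMessages(sample)
	if err != nil {
		panic(err)
	}
	for _, lvl := range []string{"debug", "error", "info", "warning"} {
		fmt.Printf("%s: %v\n", lvl, counts[lvl])
	}
}
```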

@marccarre requested a review from jml on October 5, 2017 16:31
hook, err := promrus.NewPrometheusHook()
if err != nil {
	log.Fatalf("Error initializing promrus: %v", err)
}
Contributor

Maybe add a MustNewPrometheusHook variant later?

Contributor Author

Good idea. Let's keep it DRY. I didn't know that MustNew* was actually so idiomatic, thanks for pointing it out.

Contributor Author

Done in #573.
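
For readers unfamiliar with the `MustNew*` idiom discussed in this thread, here is a minimal sketch of the pattern. The names `hook`, `newHook` and `mustNewHook` are hypothetical stand-ins and do not reproduce what was actually implemented in #573. The standard library follows the same convention with `regexp.MustCompile` and `template.Must`.

```go
package main

import (
	"errors"
	"fmt"
)

// hook is a hypothetical stand-in for the hook type.
type hook struct{ name string }

// newHook stands in for a constructor that returns (value, error).
// The fail parameter only exists to force the error path in this demo.
func newHook(fail bool) (*hook, error) {
	if fail {
		return nil, errors.New("metric already registered")
	}
	return &hook{name: "log_messages"}, nil
}

// mustNewHook wraps newHook and panics on error, so callers that cannot
// sensibly recover (such as a main function) need no error handling.
func mustNewHook(fail bool) *hook {
	h, err := newHook(fail)
	if err != nil {
		panic(fmt.Sprintf("Error initializing hook: %v", err))
	}
	return h
}

func main() {
	// One line at the call site instead of the four-line error check.
	h := mustNewHook(false)
	fmt.Println(h.name)
}
```

The trade-off is that a `Must*` function removes the caller's ability to handle the error, so it is best kept for start-up code where failing fast is the desired behaviour.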
