Track Kafka client side metrics via Kamon #4481
Conversation
@chetanmeh IMHO, we can collect these metrics outside of OpenWhisk. Similarly, I am not sure we would collect some metrics in CouchDB via OpenWhisk even if we want to see some CouchDB metrics as well.
Codecov Report
@@ Coverage Diff @@
## master #4481 +/- ##
==========================================
- Coverage 83.99% 80.45% -3.55%
==========================================
Files 170 171 +1
Lines 7940 7980 +40
Branches 536 532 -4
==========================================
- Hits 6669 6420 -249
- Misses 1271 1560 +289
@style95 Any pointers on that? So far I did not find an easy way to get a handle on client-side metrics in OpenWhisk. The only approach I saw was using some form of JMX exporter, which is tricky to set up. Hence this approach. Note that this PR only enables client-side metrics, not broker-side ones.
@chetanmeh I meant no offense. I just want to discuss this with you. I have not looked into it deeply yet, but there is also a similar interesting approach.
@chetanmeh I personally think that this extension is very useful. We already have integration with
@style95 Agreed that JMX exporters can be used. However, I find them tricky to set up, and each monitoring system has its own way of exporting (Datadog, Prometheus, etc.). In most cases it involves running another agent, which needs a custom Docker build, and for Prometheus it results in two HTTP servers running and exposing the metrics. As OpenWhisk uses Kamon, which abstracts away all such integrations, I was looking for a way to route such metrics to Kamon and then send them to various monitoring systems in a uniform way. Further note that the proposed
Can someone mention this PR on the dev list?
Did not get any feedback on the dev thread. Would it be OK to merge this and have this feature disabled by default?
@chetanmeh LGTM - is this redundant, though (when turned on), with the other Kafka metric that's emitted by the scheduled actor?
This exposes more metrics in addition to lag, so it complements what we have in the scheduled actor. Going forward, we may even think of removing the logic in the scheduled actor and using an approach like the one taken here:

```scala
val server = ManagementFactory.getPlatformMBeanServer
val name = new ObjectName(s"kafka.consumer:type=consumer-fetch-manager-metrics,client-id=$id")
def consumerLag: Long =
  server.getAttribute(name, "records-lag-max").asInstanceOf[Double].toLong.max(0)
```
i like it 👍
Adds a configurable MetricsReporter to route Kafka metrics to Kamon once enabled. The set of metric names to capture needs to be explicitly configured.
Tracks Kafka client metrics via Kamon for monitoring
Description
Currently, Kafka metrics are not tracked via Kamon, so we do not gain any insight into the Kafka interactions. Out of the box, Kafka tracks quite a few metrics on the client side; these metrics are exposed via JMX.
Kafka also supports a custom MetricsReporter to listen to such metrics. This PR makes use of the same reporter support to publish the metrics to Kamon (based on the approach taken in kamon-metrics-reporter).
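As a quick way to see what Kafka exposes over JMX, a sketch like the following (using the standard `kafka.consumer` MBean domain) lists any consumer fetch-manager MBeans registered in the current JVM. It prints nothing unless a Kafka consumer is actually running in the same process; the class name here is illustrative, not part of this PR.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Lists Kafka consumer metric MBeans registered in this JVM.
// The "kafka.consumer" JMX domain is only populated while a
// Kafka consumer is running in the same process.
public class ListKafkaConsumerMetrics {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName pattern =
            new ObjectName("kafka.consumer:type=consumer-fetch-manager-metrics,*");
        server.queryNames(pattern, null).forEach(System.out::println);
    }
}
```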
Usage
`KamonMetricsReporter` needs to be enabled via config and provided with a set of metric names to track. Once enabled, those metrics are pushed to Kamon. For the above config, the following metrics can be seen in Prometheus.
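The PR's actual configuration snippet is not reproduced here; a hypothetical HOCON sketch of such a whitelist config (key names are assumptions, not the actual ones from this PR) might look like:

```
# Hypothetical sketch; the real key names in this PR may differ.
whisk {
  kafka {
    metrics {
      report-to-kamon = true
      # explicit whitelist of Kafka metric names to forward to Kamon
      names = ["records-lag-max", "records-consumed-total"]
    }
  }
}
```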
Implementation
This PR takes a whitelist approach and does not publish all metrics by default, as there are more than 300 metrics tracked by Kafka across producer and consumer.
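A hypothetical sketch of the whitelist filtering described above (the preference for `total` over `rate` metrics is explained in the next paragraph); the class and method names are illustrative, not taken from this PR:

```java
import java.util.Set;

// Only explicitly configured metric names are reported, and "*-rate"
// variants are skipped in favour of their "*-total" counterparts.
public class MetricNameFilter {
    private final Set<String> configuredNames; // assumed to come from config

    public MetricNameFilter(Set<String> configuredNames) {
        this.configuredNames = configuredNames;
    }

    public boolean shouldTrack(String metricName) {
        return configuredNames.contains(metricName) && !metricName.endsWith("-rate");
    }

    public static void main(String[] args) {
        MetricNameFilter filter = new MetricNameFilter(
            Set.of("records-consumed-total", "records-lag-max", "bytes-consumed-rate"));
        System.out.println(filter.shouldTrack("records-consumed-total")); // true: whitelisted
        System.out.println(filter.shouldTrack("bytes-consumed-rate"));    // false: rate variant skipped
        System.out.println(filter.shouldTrack("fetch-latency-avg"));      // false: not whitelisted
    }
}
```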
For counters, Kafka records two types of metrics: `total` and `rate`. See KIP-187 for details. So we should ignore metrics ending with `rate` and prefer `total` metrics for Kamon tracking.
Related issue and scope
My changes affect the following components
Types of changes
Checklist: