Conversation
|
/cc @rtreffer @matthiasr I know you two have run into conntrack issues before. I'm not sure we want features in the node_exporter that require |
|
We had our own fair share of conntrack / DNS interaction, too. We ended up patching the UDP timeout in the kernel. We are alerting on Are there cases where the conntrack table would be low but inserts still fail? Given the kernel config Overall: monitoring the conntrack table is very useful as an overflow is a severe issue and hard to detect. |
|
@rtreffer Curious about your kernel patches/fixes. Maybe drop a note in kubernetes/kubernetes#56903? I also ran into this DNS issue and have no fix; none of the workarounds helped. So yeah, I find these metrics important, but I'm also not sure about adding stuff that needs elevated capabilities. Given this isn't the first time we'd need them, maybe we should relax the 'no capabilities' requirement to 'no root', as long as nothing enabled by default requires capabilities? |
|
That being said, currently I'm using this textfile collector for the insert fails which works fine too: |
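For readers who want to try the textfile approach, here is a minimal sketch of how such a script could look. It is not the script referred to above. It assumes `conntrack -S` prints one line per CPU made of `key=value` pairs (for example `cpu=0 found=0 invalid=0 insert_failed=0`), and the function names and the `node_conntrack_stat_` metric prefix are hypothetical.

```python
import re
from collections import defaultdict


def sum_conntrack_stats(output: str) -> dict:
    """Sum the per-CPU counters from `conntrack -S`-style output.

    Assumption: every line consists of key=value pairs with decimal values,
    and the `cpu` key only identifies the row.
    """
    totals = defaultdict(int)
    for line in output.splitlines():
        for key, value in re.findall(r"(\w+)=(\d+)", line):
            if key != "cpu":
                totals[key] += int(value)
    return dict(totals)


def render_textfile(totals: dict, prefix: str = "node_conntrack_stat_") -> str:
    """Render the totals in the Prometheus text exposition format.

    The metric prefix is a hypothetical name chosen for this sketch.
    """
    lines = []
    for name in sorted(totals):
        metric = f"{prefix}{name}"
        lines.append(f"# TYPE {metric} counter")
        lines.append(f"{metric} {totals[name]}")
    return "\n".join(lines) + "\n"
```

A cron job could feed the output of `conntrack -S` (which itself needs the required privileges) into `sum_conntrack_stats`, write the rendered text to a temporary file in the textfile collector directory, and rename it into place so the exporter never reads a partial file.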
|
@rtreffer The "insert failed" state isn't actually related to the connection tracking table being full, but rather to a race condition within the kernel. More info in the linked Kubernetes issue. @discordianfish Your textfile collector does basically the same thing, yes. |
|
Yea, while I like the idea of this collector, I don't think we want to have the exporter use |
|
@SuperQ Yeah, I'm wondering about the reason for requiring capabilities to gather this. Not sure what the kernel policy for these things is. |
|
It is possible to use netlink in general without CAP_NET_ADMIN (the wifi collector does so, for example), so what is probably needed is for the kernel to start handling the different interfaces that the netfilter bits expose differently, depending on whether they are mutating/sensitive or not. |
|
@rtreffer Also, weren't there some kernel patches for conntrack table size sysctl settings that are not inherited? Or did that get fixed in Docker now? |
|
So yeah, to include this we'd need the kernel to provide the info to unprivileged processes. Is that something someone is up for bringing up and tracking? If not, I think we should close the PR for now. |
|
(library creator here) The conntrack nfnetlink family requires As much as I'd be honored to have my lib being used by the exporter, I would opt to get the stats out of procfs instead, because it will just be available to everybody without requiring root. I predict this would be the response on LKML in case anyone would bother to ask there. :) |
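The procfs route suggested above can be sketched as follows. This is an illustration, not code from the exporter or the library. It assumes `/proc/net/stat/nf_conntrack` consists of a header line of column names followed by one line of hexadecimal values per CPU, and that the `entries` column is a global value repeated on every line rather than a per-CPU counter. The function names are hypothetical.

```python
def parse_nf_conntrack_stat(text: str) -> dict:
    """Sum per-CPU conntrack counters from /proc/net/stat/nf_conntrack content.

    Assumptions: the first non-empty line names the columns, and every
    following line holds one hexadecimal value per column for a single CPU.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        return {}
    header, cpu_rows = rows[0], rows[1:]
    totals = dict.fromkeys(header, 0)
    for row in cpu_rows:
        for name, field in zip(header, row):
            value = int(field, 16)
            if name == "entries":
                # Assumed to be the global table size, repeated on each CPU line.
                totals[name] = value
            else:
                totals[name] += value
    return totals


def read_nf_conntrack_stat(path: str = "/proc/net/stat/nf_conntrack") -> dict:
    """Read and parse the stat file; the file exists only if conntrack is loaded."""
    with open(path, encoding="ascii") as handle:
        return parse_nf_conntrack_stat(handle.read())
```

On a host with the conntrack module loaded, `read_nf_conntrack_stat().get("insert_failed")` would give the summed counter without any extra capabilities, assuming the kernel exposes that column.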
|
Well then, seems like there isn't much to do then. At least I got to learn a bit about netlink in the process :) |
This PR adds conntrack kernel statistics to the conntrack collector, similar to the data obtained with conntrack -S. Of extra interest is the insert_failed counter, which, for example, allows monitoring for situations such as kubernetes#56903.
This does require CAP_NET_ADMIN, however, so I've put these metrics behind a flag, --collector.conntrack.kernel-stats.