Skip to content

Comments

Conntrack kernel stats#1093

Closed
carlpett wants to merge 2 commits intoprometheus:masterfrom
carlpett:conntrack-kernel-stats
Closed

Conntrack kernel stats#1093
carlpett wants to merge 2 commits intoprometheus:masterfrom
carlpett:conntrack-kernel-stats

Conversation

@carlpett
Copy link
Member

@carlpett carlpett commented Sep 30, 2018

This PR adds conntrack kernel statistics to the conntrack collector, similar to the data obtained with conntrack -S. Of extra interest is the insert_failed counter, which for example allows monitoring for some situations such as kubernetes#56903.

This does require CAP_NET_ADMIN, however, so I've put these metrics behind a flag, --collector.conntrack.kernel-stats.

@SuperQ
Copy link
Member

SuperQ commented Oct 1, 2018

/cc @rtreffer @matthiasr I know you two have run into conntrack issues before.

I'm not sure we want features in the node_exporter that require CAP_NET_ADMIN. We have a policy to keep this exporter unprivileged. We need to consider this carefully, since it starts to make the exporter a privilege escalation target.

@rtreffer
Copy link
Contributor

rtreffer commented Oct 4, 2018

We had our own fair share of conntrack / DNS interaction, too. We ended up patching the UDP timeout in the kernel.

We are alerting on 100*node_nf_conntrack_entries / node_nf_conntrack_entries_limit > 90 (paging) and 100*node_nf_conntrack_entries / node_nf_conntrack_entries_limit > 50 (warning).

Are there cases where the conntrack table would be low but inserts still fail?

Given the kernel config CONFIG_NF_CONNTRACK_PROCFS there is also a /proc/net/stat/nf_conntrack containing the same data, world readable. However that option is usually disabled (and deprecated iirc - reading /proc/net/nf_conntrack which is controlled by the same setting is a bad idea due to a big lock).

Overall: monitoring the conntrack table is very useful as an overflow is a severe issue and hard to detect.

@discordianfish
Copy link
Member

@rtreffer Curious about your kernel patch/fixes.. Maybe drop a note in kubernetes/kubernetes#56903 ?

I also ran into this DNS issue without a fix, none of the workarounds helped. So yeah I find these metrics important but also not sure about adding stuff that needs elevated capabilities. Given this isn't the first time we'd need them, maybe we should revisit the 'no capabilities' requirement to 'no root' and all defaults need not to require capabilities?

@discordianfish
Copy link
Member

That being said, currently I'm using this textfile collector for the insert fails which works fine too:

#!/bin/bash
set -euo pipefail

conntrack -S | while read -ra parts; do
  IFS='=' read -ra cpu <<< "${parts[0]}"
  for m in "${parts[@]:1}"; do
    IFS='=' read -ra kv <<< "$m"
    echo "conntrack_${kv[0]}_total{cpu=\"${cpu[1]}\"} ${kv[1]}"
  done
done

@carlpett
Copy link
Member Author

carlpett commented Oct 4, 2018

@rtreffer The "insert failed" state isn't actually related to the connection tracking table being full, but rather a race condition within the kernel. More info in the linked kubernetes info.

@discordianfish Your textfile collector does basically the same thing, yes.

@SuperQ
Copy link
Member

SuperQ commented Oct 4, 2018

Yea, while I like the idea of this collector, I don't think we want to have the exporter use CAP_NET_ADMIN. Perhaps we can find someone in the Linux kernel team that can help us find a way to get this data without root.

@discordianfish
Copy link
Member

@SuperQ Yeah wondering about the reason for requiring capabilities for gathering this. Not sure what the kernel policy for these things are.

@carlpett
Copy link
Member Author

carlpett commented Oct 4, 2018

It is possible to use netlink in general without CAP_NET_ADMIN (the wifi collector does so, for example), so probably what is needed is that the kernel has to start handling the different interfaces that the netfilter bits exposes differently, depending on if they are mutating/sensitive or not.
No idea how big/small this change is, though.

@SuperQ
Copy link
Member

SuperQ commented Oct 4, 2018

@rtreffer Also, wasn't there some kernel patches for conntrack table size sysctl settings that are not inherited? Or did that get fixed in Docker now?

@discordianfish
Copy link
Member

So yeah, to include this we'd need the kernel to provide the info to unprivileged processes. Is that someone is up to bringing up and tracking it? If not, I think we should close the PR for now.

@ti-mo
Copy link

ti-mo commented Nov 14, 2018

(library creator here) The conntrack nfnetlink family requires NET_ADMIN if you want to as much as dial into it. Since that family can be used to manipulate the conntrack table, conntrack stats would need to be split off into a different family for this to be reasonably secure.

As much as I'd be honored to have my lib being used by the exporter, I would opt to get the stats out of procfs instead, because it will just be available to everybody without requiring root. I predict this would be the response on LKML in case anyone would bother to ask there. :)

@carlpett
Copy link
Member Author

Well then, seems like there isn't much to do then. At least I got to learn a bit about netlink in the process :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants