Log which collector had an error when pulling #7913

Merged
juliogreff merged 2 commits into master from juliogreff/tagger-pull-log
Apr 16, 2021

@juliogreff
Contributor

What does this PR do?

Previously, *Tagger.pull() logged only that an error had happened,
without specifying which collector the error came from. The source isn't
always obvious from the error itself (for instance, both the kube metadata
and kubelet collectors can error out when getting the pod list from the
kubelet), so we now prefix the log message with the collector name.

Describe your test plan

Blackhole traffic to the kubelet as shown below. Note that this only works if #7605 is merged; otherwise the tagger will hang and no error will be shown.

$ docker run -it --rm --privileged --pid=host alpine:edge nsenter -t 1 -m -u -n -i sh
# ps | grep "agent run"
2926852 root      0:57 agent run
2931843 root      0:00 grep agent run
# AGENT_PID=2926852
# ps | grep "/usr/bin/kubelet"
43659 root      2h24 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock --fail-swap-on=false --node-ip=172.18.0.2 --provider-id=kind://docker/kind/kind-control-plane --fail-swap-on=false --cgroup-root=/kubelet
2944246 root      0:00 grep /usr/bin/kubelet
# KUBELET_IP=172.18.0.2
# nsenter -n -t $AGENT_PID iptables -A OUTPUT -d $KUBELET_IP -p tcp --dport 10250 -j DROP

Check that the logs no longer look like this:

2021-04-16 12:16:23 UTC | CORE | WARN | (pkg/tagger/local/tagger.go:217 in pull) | couldn't fetch "podlist": error performing kubelet query https://172.18.0.2:10250/pods: Get "https://172.18.0.2:10250/pods": context canceled

But instead look like this:

2021-04-16 12:16:23 UTC | CORE | WARN | (pkg/tagger/local/tagger.go:217 in pull) | error pulling from kubelet: couldn't fetch "podlist": error performing kubelet query https://172.18.0.2:10250/pods: Get "https://172.18.0.2:10250/pods": context canceled

@juliogreff juliogreff added this to the 7.28.0 milestone Apr 16, 2021
@juliogreff juliogreff requested a review from a team as a code owner April 16, 2021 12:19
Comment thread pkg/tagger/local/tagger.go Outdated
Co-authored-by: Ahmed Mezghani <38987709+ahmed-mez@users.noreply.github.com>
@juliogreff juliogreff merged commit 1c63e23 into master Apr 16, 2021
@juliogreff juliogreff deleted the juliogreff/tagger-pull-log branch April 16, 2021 13:14