fix(gohai): fall back to numeric UID when username lookup fails #49557
Closed
daniel-taf wants to merge 2 commits into main from
Containerized processes often run as UIDs that don't exist in the host's /etc/passwd (e.g. a UID created inside a container image). When gopsutil's Username() failed for these processes, newProcessInfo returned an error, causing the process to be silently dropped from the gohai resource check payload. As a result, even the largest memory consumer on a host could be completely invisible to the "Processes memory usage" widget if it ran as an unmapped UID, while Live Processes (which uses a separate pipeline) showed it correctly.

Fall back to the numeric UID string instead of failing, so these processes are included in the top-20 list sent to the widget pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 92949a5 to 944e8f7.
Extract username resolution into a testable resolveUsername function with a usernameProvider interface, and add table-driven tests covering:

- successful username lookup
- fallback to numeric UID when username lookup fails
- fallback to empty string when both username and UID lookups fail

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
daniel-taf (Contributor, Author): Recreating with a shorter branch name; the image tag exceeded the 63-character K8s limit.
What does this PR do?
Falls back to the numeric UID string when `gopsutil.Process.Username()` fails in the gohai resource check, instead of silently dropping the process.

Motivation
The "Processes memory usage" widget on the host infrastructure page was missing processes, including top memory consumers, because `newProcessInfo` in `pkg/gohai/processes/gops/process_info.go` treated a `Username()` failure as fatal and silently skipped the process.

Containerized processes commonly run as UIDs created inside their container image (e.g. `RUN useradd -u 501 app`). These UIDs don't exist in the host's `/etc/passwd`, which is mounted into the Datadog agent container. When `gopsutil` tries to resolve the UID to a username via `user.LookupId()`, the lookup fails, and the entire process is dropped from the gohai payload before the top-20 sort even happens.

Impact: On a staging K8s node with 132 GB RAM, the #1 memory consumer (`adrian-cache`, ~128 GB RSS, UID 501) was completely invisible to the widget. The widget showed only ~2.78% memory usage (from processes running as `root`/`nobody`) while the host was at ~86% utilization. Live Processes (separate pipeline) showed correct data.

Fixes PRMS-3140.
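To make the ordering problem concrete, here is a simplified, self-contained model of the collection step, assuming the top-20 ranking is by RSS. `rawProc`, `procInfo`, `collect`, `topByRSS`, and `sampleHost` are hypothetical names, not the agent's actual types, and the two smaller processes are made-up placeholders; only the `adrian-cache` / UID 501 / ~128 GB figures come from the description above.

```go
package main

import (
	"errors"
	"sort"
	"strconv"
)

// rawProc is a hypothetical stand-in for what the collector sees per PID.
type rawProc struct {
	name    string
	uid     uint32
	rssKB   uint64
	user    string
	userErr error // non-nil when the UID has no /etc/passwd entry
}

// procInfo is a hypothetical, trimmed-down payload entry.
type procInfo struct {
	Name  string
	User  string
	RSSKB uint64
}

// collect builds payload entries. With fallback=false it models the old
// behavior (a username error drops the process); with fallback=true it
// substitutes the numeric UID string.
func collect(raw []rawProc, fallback bool) []procInfo {
	out := make([]procInfo, 0, len(raw))
	for _, r := range raw {
		user := r.user
		if r.userErr != nil {
			if !fallback {
				continue // dropped here, before any ranking takes place
			}
			user = strconv.FormatUint(uint64(r.uid), 10)
		}
		out = append(out, procInfo{Name: r.name, User: user, RSSKB: r.rssKB})
	}
	return out
}

// topByRSS sorts entries by RSS, largest first, and keeps at most n of them.
func topByRSS(procs []procInfo, n int) []procInfo {
	sort.Slice(procs, func(i, j int) bool { return procs[i].RSSKB > procs[j].RSSKB })
	if len(procs) > n {
		procs = procs[:n]
	}
	return procs
}

var errUnknownUID = errors.New("unknown userid")

// sampleHost loosely models the incident: one huge process under an unmapped UID.
var sampleHost = []rawProc{
	{name: "adrian-cache", uid: 501, rssKB: 128 << 20, userErr: errUnknownUID}, // ~128 GiB
	{name: "example-root-daemon", uid: 0, rssKB: 300 << 10, user: "root"},      // ~300 MiB
	{name: "example-nobody-proc", uid: 65534, rssKB: 1 << 10, user: "nobody"},  // ~1 MiB
}
```

With `fallback` set to false, `adrian-cache` never reaches `topByRSS`, so no choice of N can surface it; with the fallback enabled it ranks first under the user string `"501"`.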
Describe how you validated your changes
- `i-08b1214944310a57a` (stripe cluster): `adrian-cache` runs as UID 501, which is absent from the host's `/etc/passwd`
- `agent diagnose show-metadata gohai` output excludes `adrian-cache` (128 GB RSS) from the top-20 process list
- `go test ./pkg/gohai/processes/...`
- `go vet` clean

Additional Notes
The username field flows through to the widget's `color_by: "user"` in the treemap: tiles will be colored by the numeric UID string (e.g. `"501"`) instead of a resolved name. This is cosmetically imperfect but functionally correct; the critical fields (`rss`, `pct_mem`, `family`) are unaffected.

🤖 Generated with Claude Code