fix(gohai): fall back to numeric UID when username lookup fails #49557

Closed
daniel-taf wants to merge 2 commits into main from daniel.tafoya/gohai-username-fallback

@daniel-taf
Contributor

What does this PR do?

Falls back to the numeric UID string when gopsutil.Process.Username() fails in the gohai resource check, instead of silently dropping the process.

Motivation

The "Processes memory usage" widget on the host infrastructure page was missing processes — including top memory consumers — because newProcessInfo in pkg/gohai/processes/gops/process_info.go treated a Username() failure as fatal and silently skipped the process.

Containerized processes commonly run as UIDs created inside their container image (e.g. RUN useradd -u 501 app). These UIDs don't exist in the host's /etc/passwd, which is mounted into the Datadog agent container. When gopsutil tries to resolve the UID to a username via user.LookupId(), it fails, and the entire process is dropped from the gohai payload before the top-20 sort even happens.

Impact: On a staging K8s node with 132 GB RAM, the #1 memory consumer (adrian-cache, ~128 GB RSS, UID 501) was completely invisible to the widget. The widget showed only ~2.78% memory usage (from processes running as root/nobody) while the host was at ~86% utilization. Live Processes (separate pipeline) showed correct data.
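
The effect described above can be reproduced with a small sketch. It is not the gohai code: the proc type, the in-memory passwd table, and the topByRSS helper are all hypothetical names introduced for illustration. It only shows why filtering on a failed lookup before the top-N sort hides the largest process.

```go
package main

import (
	"errors"
	"sort"
	"strconv"
)

// proc is a hypothetical, simplified stand-in for a process record.
type proc struct {
	Name string
	UID  int
	RSS  uint64 // resident set size in bytes
}

// lookupName is a hypothetical resolver backed by an in-memory table that
// plays the role of the host's /etc/passwd.
func lookupName(passwd map[int]string, uid int) (string, error) {
	if name, ok := passwd[uid]; ok {
		return name, nil
	}
	return "", errors.New("unknown uid " + strconv.Itoa(uid))
}

// topByRSS returns the names of the n largest processes by RSS.
// When dropOnLookupError is true, a process whose UID cannot be resolved is
// skipped before sorting, which mirrors the behavior the PR describes.
func topByRSS(procs []proc, passwd map[int]string, n int, dropOnLookupError bool) []string {
	kept := make([]proc, 0, len(procs))
	for _, p := range procs {
		if _, err := lookupName(passwd, p.UID); err != nil && dropOnLookupError {
			continue // the process never reaches the sort
		}
		kept = append(kept, p)
	}
	sort.Slice(kept, func(i, j int) bool { return kept[i].RSS > kept[j].RSS })
	if len(kept) > n {
		kept = kept[:n]
	}
	names := make([]string, len(kept))
	for i, p := range kept {
		names[i] = p.Name
	}
	return names
}
```

With a passwd table that only knows UID 0, calling topByRSS with dropOnLookupError set to true omits a UID 501 process no matter how large its RSS is; with it set to false, the same process sorts to the top.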

Fixes PRMS-3140.

Describe how you validated your changes

  • Confirmed on staging host i-08b1214944310a57a (stripe cluster) that adrian-cache runs as UID 501, which is absent from the host's /etc/passwd
  • Verified agent diagnose show-metadata gohai output excludes adrian-cache (128 GB RSS) from the top-20 process list
  • Existing unit tests pass (go test ./pkg/gohai/processes/...)
  • go vet clean

Additional Notes

The username field flows through to the widget's color_by: "user" in the treemap — tiles will be colored by the numeric UID string (e.g. "501") instead of a resolved name. This is cosmetically imperfect but functionally correct; the critical fields (rss, pct_mem, family) are unaffected.

🤖 Generated with Claude Code

dd-octo-sts bot added the internal (Identify a non-fork PR) and team/agent-configuration labels Apr 17, 2026
github-actions bot added the short review (PR is simple enough to be reviewed quickly) label Apr 17, 2026

Containerized processes often run as UIDs that don't exist in the
host's /etc/passwd (e.g. a UID created inside a container image).
When gopsutil's Username() fails for these processes, newProcessInfo
returned an error causing the process to be silently dropped from
the gohai resource check payload.

This meant even the largest memory consumer on a host could be
completely invisible to the "Processes memory usage" widget if it
ran as an unmapped UID, while Live Processes (which uses a separate
pipeline) showed it correctly.

Fall back to the numeric UID string instead of failing, so these
processes are included in the top-20 list sent to the widget pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@daniel-taf daniel-taf force-pushed the daniel.tafoya/gohai-username-fallback branch from 92949a5 to 944e8f7 Compare April 17, 2026 20:47
Extract username resolution into a testable resolveUsername function
with a usernameProvider interface, and add table-driven tests covering:
- successful username lookup
- fallback to numeric UID when username lookup fails
- fallback to empty string when both username and UID lookups fail

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
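
The diff itself is not shown on this page, so the following is only a minimal sketch of what the commit message describes. The names resolveUsername and usernameProvider come from the commit message; the interface's method set, the choice of the first UID entry, the uint32 element type (which differs between gopsutil major versions), and the fakeProcess test double are assumptions.

```go
package main

import (
	"errors"
	"strconv"
)

// usernameProvider abstracts the two lookups the fallback needs.
// Assumption: this method set is modeled on gopsutil's process API and may
// not match the interface in the actual change.
type usernameProvider interface {
	Username() (string, error)
	Uids() ([]uint32, error)
}

// resolveUsername returns the resolved username when available, otherwise
// the numeric UID as a string, otherwise an empty string.
func resolveUsername(p usernameProvider) string {
	if name, err := p.Username(); err == nil {
		return name
	}
	uids, err := p.Uids()
	if err != nil || len(uids) == 0 {
		return "" // both lookups failed; keep the process with no user label
	}
	// Assumption: the first entry is used as the UID to report.
	return strconv.FormatUint(uint64(uids[0]), 10)
}

// fakeProcess is a hypothetical test double that returns canned results.
type fakeProcess struct {
	name    string
	nameErr error
	uids    []uint32
	uidsErr error
}

func (f fakeProcess) Username() (string, error) { return f.name, f.nameErr }
func (f fakeProcess) Uids() ([]uint32, error)   { return f.uids, f.uidsErr }

// errLookup simulates a failed passwd or /proc lookup.
var errLookup = errors.New("lookup failed")
```

The three cases listed in the commit message map onto fakeProcess{name: "root"}, fakeProcess{nameErr: errLookup, uids: []uint32{501}}, and fakeProcess{nameErr: errLookup, uidsErr: errLookup}. Keeping the fallback behind an interface lets each case run without a real process or a real /etc/passwd.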
Contributor Author

Recreating with shorter branch name — image tag exceeded 63-char K8s limit.

@daniel-taf daniel-taf closed this Apr 17, 2026