Improving Disk Metrics: Distinguishing Real Disks from Pseudo-Filesystems#48766

Merged
gh-worker-dd-mergequeue-cf854d[bot] merged 11 commits into main from jose/add-disk-physical-tag-and-metrics
Apr 6, 2026

Conversation

@jose-manuel-almaza
Contributor

@jose-manuel-almaza jose-manuel-almaza commented Apr 1, 2026

What does this PR do?

  • Adds a new tag, is_physical_storage, to every system.disk.* metric when the tag_by_physical_storage configuration option (default: false) is enabled
  • Emits a new set of metrics (system.disk.physical_total, system.disk.physical_used, system.disk.physical_free, system.disk.physical_utilized, and system.disk.physical_in_use) when the collect_physical_metrics configuration option (default: false) is enabled

Both options default to false, preserving full backward compatibility: when they are disabled, the check makes no extra syscalls and its behavior is unchanged.
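A minimal Go sketch of how the two opt-in options could gate behavior for a single partition. The names diskOptions, tagsFor, and extraMetricsFor are hypothetical and not taken from disk.go; only the option names, the tag name, and the five metric names come from this PR. The sketch covers only the physical/non-physical cases. With zero-value options (both false), no tag is added and no extra metrics are selected, which is the backward-compatible default described above.

```go
package main

import "strconv"

// diskOptions holds the two opt-in settings (field names are hypothetical).
// The zero value matches the defaults: both options disabled.
type diskOptions struct {
	TagByPhysicalStorage   bool // tag_by_physical_storage
	CollectPhysicalMetrics bool // collect_physical_metrics
}

// physicalMetricNames are the extra metrics this PR adds.
var physicalMetricNames = []string{
	"system.disk.physical_total",
	"system.disk.physical_used",
	"system.disk.physical_free",
	"system.disk.physical_utilized",
	"system.disk.physical_in_use",
}

// tagsFor returns the tags for one system.disk.* sample. The
// is_physical_storage tag is appended only when the option is enabled.
func tagsFor(base []string, opts diskOptions, isPhysical bool) []string {
	tags := append([]string(nil), base...) // copy: never mutate the caller's slice
	if opts.TagByPhysicalStorage {
		tags = append(tags, "is_physical_storage:"+strconv.FormatBool(isPhysical))
	}
	return tags
}

// extraMetricsFor returns the physical_* metric names to emit for one
// partition: none unless the option is on and the partition is physical.
func extraMetricsFor(opts diskOptions, isPhysical bool) []string {
	if !opts.CollectPhysicalMetrics || !isPhysical {
		return nil
	}
	return physicalMetricNames
}
```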

Motivation

#5921

Describe how you validated your changes

  • 15 new unit tests covering:
    • Default config (no tag, no physical metrics)
    • Explicit disable of each option
    • Enable tag_by_physical_storage: physical and non-physical partitions tagged correctly
    • Enable collect_physical_metrics: physical metrics emitted only for real devices
    • include_all_devices: false: only physical partitions reported
    • All-partitions call (all=true) complete failure: graceful degradation; only physical partitions reported
    • All-partitions call (all=true) partial failure: non-physical partitions still classified from available results
    • Physical-partitions call (all=false) complete failure: check returns error
    • Physical-partitions call (all=false) partial failure: classification skipped to avoid misclassifying real disks as non-physical
    • Both scans yield zero partitions: check returns error instead of silently reporting success
    • Bind mounts of physical devices: classified as physical (not silently dropped)
    • Zero physical partitions (container scenario): non-physical partitions still reported
    • Non-Linux platforms (Windows, macOS): classification silently disabled at config time
  • All existing tests pass unchanged

Possible Drawbacks / Trade-offs

  • When either feature is enabled, the check makes two partition enumeration syscalls instead of one (one for physical devices only, one for all devices). This is unavoidable: classification needs both results.
  • When the physical-only scan returns a partial result with an error, classification is skipped for that check interval to avoid incorrect tags. Full classification resumes on the next successful scan.
  • Linux only: gopsutil ignores the all parameter on Windows and macOS, so classification is automatically disabled on those platforms with a log warning.
  • When both features are disabled (default), behavior is identical to before this change.
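A sketch of the config-time platform gate, assuming a hypothetical physicalStorageOptions struct and gateByPlatform helper (not the real names in the check). Because gopsutil ignores the all parameter on Windows and macOS, both scans would return the same partitions there and every partition would look physical, so the sketch turns both options off and hands back a warning message for the caller to log.

```go
package main

import (
	"fmt"
	"runtime"
)

// physicalStorageOptions is a hypothetical holder for the two settings.
type physicalStorageOptions struct {
	TagByPhysicalStorage   bool
	CollectPhysicalMetrics bool
}

// gateByPlatform keeps the options on Linux and disables both elsewhere,
// returning a warning for the caller to log when it turned something off.
func gateByPlatform(opts physicalStorageOptions, goos string) (physicalStorageOptions, string) {
	if goos == "linux" || (!opts.TagByPhysicalStorage && !opts.CollectPhysicalMetrics) {
		return opts, ""
	}
	warning := fmt.Sprintf(
		"tag_by_physical_storage and collect_physical_metrics are only supported on Linux; disabling them on %s",
		goos)
	return physicalStorageOptions{}, warning
}

// gateForCurrentPlatform applies the gate to the OS the binary runs on.
func gateForCurrentPlatform(opts physicalStorageOptions) (physicalStorageOptions, string) {
	return gateByPlatform(opts, runtime.GOOS)
}
```

Passing goos as a parameter, rather than reading runtime.GOOS inside the gate, keeps the Windows and macOS branches testable from a Linux CI runner.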

@jose-manuel-almaza jose-manuel-almaza requested a review from a team as a code owner April 1, 2026 16:32
@dd-octo-sts dd-octo-sts Bot added the internal Identify a non-fork PR label Apr 1, 2026
@github-actions github-actions Bot added the medium review PR review might take time label Apr 1, 2026
@jose-manuel-almaza jose-manuel-almaza added the qa/done QA done before merge and regressions are covered by tests label Apr 1, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 77db8723d3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pkg/collector/corechecks/system/disk/diskv2/disk.go Outdated
Comment thread pkg/collector/corechecks/system/disk/diskv2/disk.go Outdated
jose-manuel-almaza added a commit that referenced this pull request Apr 1, 2026
…l disk classification

Address two review comments on PR #48766:
- Classify bind mounts by device name so partitions sharing a physical
  device (different mountpoint) are tagged as physical instead of being
  silently dropped.
- Handle partial success from the all-partitions syscall: when some
  partitions are returned alongside an error, classify the available
  results instead of discarding them.

Add three tests covering the new error/edge-case paths.
@jose-manuel-almaza
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 88f8818eb5

Comment thread pkg/collector/corechecks/system/disk/diskv2/disk.go
@jose-manuel-almaza
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3d26066249

Comment thread pkg/collector/corechecks/system/disk/diskv2/disk.go Outdated
@jose-manuel-almaza
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 64a81d68ce

Comment thread pkg/collector/corechecks/system/disk/diskv2/disk.go Outdated
Comment thread pkg/collector/corechecks/system/disk/diskv2/disk.go
@github-actions github-actions Bot added long review PR is complex, plan time to review it and removed medium review PR review might take time labels Apr 1, 2026
@jose-manuel-almaza
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 64a81d68ce

Comment thread pkg/collector/corechecks/system/disk/diskv2/disk.go Outdated
Comment thread pkg/collector/corechecks/system/disk/diskv2/disk.go
@jose-manuel-almaza jose-manuel-almaza force-pushed the jose/add-disk-physical-tag-and-metrics branch from d597365 to 8117521 Compare April 1, 2026 17:46
@jose-manuel-almaza
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8117521ccd

Comment thread pkg/collector/corechecks/system/disk/diskv2/disk.go
@jose-manuel-almaza
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dba9fb8e69

Comment thread pkg/collector/corechecks/system/disk/diskv2/disk.go Outdated
@jose-manuel-almaza
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2f4017d09a

Comment thread pkg/collector/corechecks/system/disk/diskv2/disk.go
@jose-manuel-almaza
Contributor Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

@jose-manuel-almaza jose-manuel-almaza added the ask-review Ask required teams to review this PR label Apr 1, 2026
@agent-platform-auto-pr
Contributor

agent-platform-auto-pr Bot commented Apr 1, 2026

Files inventory check summary

File checks results against ancestor 2996c913:

Results for datadog-agent_7.79.0~devel.git.452.d14baab.pipeline.106150563-1_amd64.deb:

No change detected

@agent-platform-auto-pr
Contributor

agent-platform-auto-pr Bot commented Apr 1, 2026

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor 2996c91
📊 Static Quality Gates Dashboard
🔗 SQG Job

Successful checks

Info

| Quality gate | Change | Size in MiB (prev → curr → max) |
|---|---|---|
| agent_deb_amd64 | +4.0 KiB (0.00% increase) | 753.022 → 753.026 → 753.380 |
| agent_deb_amd64_fips | +8.0 KiB (0.00% increase) | 709.957 → 709.965 → 713.900 |
| agent_heroku_amd64 | +12.0 KiB (0.00% increase) | 313.356 → 313.368 → 320.580 |
| agent_msi | +7.5 KiB (0.00% increase) | 604.912 → 604.919 → 651.440 |
| agent_rpm_amd64 | +4.0 KiB (0.00% increase) | 753.006 → 753.010 → 753.350 |
| agent_rpm_amd64_fips | +8.0 KiB (0.00% increase) | 709.941 → 709.949 → 713.880 |
| agent_rpm_arm64 | +8.0 KiB (0.00% increase) | 731.423 → 731.431 → 735.290 |
| agent_rpm_arm64_fips | +8.0 KiB (0.00% increase) | 691.387 → 691.395 → 696.840 |
| agent_suse_amd64 | +4.0 KiB (0.00% increase) | 753.006 → 753.010 → 753.350 |
| agent_suse_amd64_fips | +8.0 KiB (0.00% increase) | 709.941 → 709.949 → 713.880 |
| agent_suse_arm64 | +8.0 KiB (0.00% increase) | 731.423 → 731.431 → 735.290 |
| agent_suse_arm64_fips | +8.0 KiB (0.00% increase) | 691.387 → 691.395 → 696.840 |
| docker_agent_amd64 | +4.0 KiB (0.00% increase) | 813.326 → 813.329 → 815.700 |
| docker_agent_arm64 | +8.0 KiB (0.00% increase) | 816.513 → 816.521 → 821.970 |
| docker_agent_jmx_amd64 | +4.0 KiB (0.00% increase) | 1004.241 → 1004.245 → 1006.580 |
| docker_agent_jmx_arm64 | +8.0 KiB (0.00% increase) | 996.207 → 996.215 → 1001.570 |
| iot_agent_deb_amd64 | +5.38 KiB (0.01% increase) | 43.239 → 43.244 → 44.290 |
| iot_agent_deb_arm64 | +9.38 KiB (0.02% increase) | 40.286 → 40.295 → 41.920 |
| iot_agent_deb_armhf | +5.38 KiB (0.01% increase) | 41.037 → 41.043 → 42.100 |
| iot_agent_rpm_amd64 | +5.38 KiB (0.01% increase) | 43.239 → 43.244 → 44.290 |
| iot_agent_suse_amd64 | +5.38 KiB (0.01% increase) | 43.239 → 43.244 → 44.290 |
10 successful checks with minimal change (< 2 KiB)
| Quality gate | Current Size |
|---|---|
| docker_cluster_agent_amd64 | 203.961 MiB |
| docker_cluster_agent_arm64 | 218.419 MiB |
| docker_cws_instrumentation_amd64 | 7.142 MiB |
| docker_cws_instrumentation_arm64 | 6.689 MiB |
| docker_dogstatsd_amd64 | 39.234 MiB |
| docker_dogstatsd_arm64 | 37.445 MiB |
| dogstatsd_deb_amd64 | 29.886 MiB |
| dogstatsd_deb_arm64 | 28.034 MiB |
| dogstatsd_rpm_amd64 | 29.886 MiB |
| dogstatsd_suse_amd64 | 29.886 MiB |
On-wire sizes (compressed)
| Quality gate | Change | Size in MiB (prev → curr → max) |
|---|---|---|
| agent_deb_amd64 | neutral | 174.771 MiB → 178.360 |
| agent_deb_amd64_fips | +11.91 KiB (0.01% increase) | 165.361 → 165.373 → 172.790 |
| agent_heroku_amd64 | +7.98 KiB (0.01% increase) | 75.014 → 75.021 → 79.970 |
| agent_msi | -4.0 KiB (0.00% reduction) | 138.422 → 138.418 → 146.220 |
| agent_rpm_amd64 | +32.81 KiB (0.02% increase) | 177.578 → 177.610 → 181.830 |
| agent_rpm_amd64_fips | +29.16 KiB (0.02% increase) | 167.637 → 167.666 → 173.370 |
| agent_rpm_arm64 | -28.01 KiB (0.02% reduction) | 159.546 → 159.518 → 163.060 |
| agent_rpm_arm64_fips | +18.8 KiB (0.01% increase) | 151.398 → 151.417 → 156.170 |
| agent_suse_amd64 | +32.81 KiB (0.02% increase) | 177.578 → 177.610 → 181.830 |
| agent_suse_amd64_fips | +29.16 KiB (0.02% increase) | 167.637 → 167.666 → 173.370 |
| agent_suse_arm64 | -28.01 KiB (0.02% reduction) | 159.546 → 159.518 → 163.060 |
| agent_suse_arm64_fips | +18.8 KiB (0.01% increase) | 151.398 → 151.417 → 156.170 |
| docker_agent_amd64 | neutral | 268.183 MiB → 272.480 |
| docker_agent_arm64 | +10.51 KiB (0.00% increase) | 255.381 → 255.391 → 261.060 |
| docker_agent_jmx_amd64 | -4.14 KiB (0.00% reduction) | 336.838 → 336.834 → 341.100 |
| docker_agent_jmx_arm64 | +15.45 KiB (0.00% increase) | 320.012 → 320.027 → 325.620 |
| docker_cluster_agent_amd64 | neutral | 71.374 MiB → 72.920 |
| docker_cluster_agent_arm64 | neutral | 66.999 MiB → 68.220 |
| docker_cws_instrumentation_amd64 | neutral | 2.999 MiB → 3.330 |
| docker_cws_instrumentation_arm64 | neutral | 2.729 MiB → 3.090 |
| docker_dogstatsd_amd64 | neutral | 15.174 MiB → 15.820 |
| docker_dogstatsd_arm64 | neutral | 14.487 MiB → 14.830 |
| dogstatsd_deb_amd64 | neutral | 7.893 MiB → 8.790 |
| dogstatsd_deb_arm64 | neutral | 6.778 MiB → 7.710 |
| dogstatsd_rpm_amd64 | +2.18 KiB (0.03% increase) | 7.904 → 7.906 → 8.800 |
| dogstatsd_suse_amd64 | +2.18 KiB (0.03% increase) | 7.904 → 7.906 → 8.800 |
| iot_agent_deb_amd64 | +4.14 KiB (0.04% increase) | 11.389 → 11.393 → 13.040 |
| iot_agent_deb_arm64 | +2.68 KiB (0.03% increase) | 9.712 → 9.715 → 11.450 |
| iot_agent_deb_armhf | +3.95 KiB (0.04% increase) | 9.931 → 9.935 → 11.620 |
| iot_agent_rpm_amd64 | +4.97 KiB (0.04% increase) | 11.410 → 11.415 → 13.060 |
| iot_agent_suse_amd64 | +4.97 KiB (0.04% increase) | 11.410 → 11.415 → 13.060 |

@cit-pr-commenter-54b7da

cit-pr-commenter-54b7da Bot commented Apr 1, 2026

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 4bac1366-ac8d-41d0-a070-7ef722e431b4

Baseline: 0d4f770
Comparison: 3e0a1a6
Diff

Optimization Goals: ✅ No significant changes detected

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| | docker_containers_cpu | % cpu utilization | -2.03 | [-5.10, +1.03] | 1 | Logs |

Fine details of change detection per experiment

| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| | quality_gate_logs | % cpu utilization | +2.57 | [+0.89, +4.25] | 1 | Logs, bounds checks, dashboard |
| | otlp_ingest_logs | memory utilization | +1.08 | [+0.97, +1.20] | 1 | Logs |
| | quality_gate_metrics_logs | memory utilization | +0.72 | [+0.49, +0.96] | 1 | Logs, bounds checks, dashboard |
| | quality_gate_idle_all_features | memory utilization | +0.38 | [+0.35, +0.42] | 1 | Logs, bounds checks, dashboard |
| | ddot_metrics_sum_delta | memory utilization | +0.22 | [+0.05, +0.38] | 1 | Logs |
| | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | +0.14 | [-0.09, +0.36] | 1 | Logs |
| | ddot_metrics | memory utilization | +0.12 | [-0.07, +0.31] | 1 | Logs |
| | file_tree | memory utilization | +0.04 | [-0.02, +0.10] | 1 | Logs |
| | ddot_metrics_sum_cumulative | memory utilization | +0.02 | [-0.13, +0.17] | 1 | Logs |
| | tcp_dd_logs_filter_exclude | ingress throughput | +0.01 | [-0.10, +0.12] | 1 | Logs |
| | uds_dogstatsd_to_api_v3 | ingress throughput | -0.01 | [-0.22, +0.19] | 1 | Logs |
| | uds_dogstatsd_to_api | ingress throughput | -0.02 | [-0.22, +0.18] | 1 | Logs |
| | quality_gate_idle | memory utilization | -0.02 | [-0.07, +0.03] | 1 | Logs, bounds checks, dashboard |
| | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.03 | [-0.09, +0.03] | 1 | Logs |
| | file_to_blackhole_1000ms_latency | egress throughput | -0.07 | [-0.50, +0.37] | 1 | Logs |
| | file_to_blackhole_100ms_latency | egress throughput | -0.07 | [-0.17, +0.04] | 1 | Logs |
| | file_to_blackhole_0ms_latency | egress throughput | -0.09 | [-0.64, +0.46] | 1 | Logs |
| | file_to_blackhole_500ms_latency | egress throughput | -0.10 | [-0.49, +0.29] | 1 | Logs |
| | docker_containers_memory | memory utilization | -0.18 | [-0.26, -0.10] | 1 | Logs |
| | ddot_logs | memory utilization | -0.35 | [-0.42, -0.28] | 1 | Logs |
| | otlp_ingest_metrics | memory utilization | -0.62 | [-0.78, -0.46] | 1 | Logs |
| | tcp_syslog_to_blackhole | ingress throughput | -0.95 | [-1.12, -0.77] | 1 | Logs |
| | docker_containers_cpu | % cpu utilization | -2.03 | [-5.10, +1.03] | 1 | Logs |

Bounds Checks: ✅ Passed

| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| | docker_containers_cpu | simple_check_run | 10/10 | 719 ≥ 26 | |
| | docker_containers_memory | memory_usage | 10/10 | 272.06MiB ≤ 370MiB | |
| | docker_containers_memory | simple_check_run | 10/10 | 678 ≥ 26 | |
| | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.19GiB ≤ 1.20GiB | |
| | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.23GiB ≤ 1.20GiB | |
| | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.19GiB ≤ 1.20GiB | |
| | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.21GiB ≤ 1.20GiB | |
| | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| | quality_gate_idle | intake_connections | 10/10 | 3 = 3 | bounds checks, dashboard |
| | quality_gate_idle | memory_usage | 10/10 | 171.80MiB ≤ 181MiB | bounds checks, dashboard |
| | quality_gate_idle_all_features | intake_connections | 10/10 | 3 = 3 | bounds checks, dashboard |
| | quality_gate_idle_all_features | memory_usage | 10/10 | 491.16MiB ≤ 550MiB | bounds checks, dashboard |
| | quality_gate_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks, dashboard |
| | quality_gate_logs | memory_usage | 10/10 | 208.82MiB ≤ 220MiB | bounds checks, dashboard |
| | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks, dashboard |
| | quality_gate_metrics_logs | cpu_usage | 10/10 | 340.34 ≤ 2000 | bounds checks, dashboard |
| | quality_gate_metrics_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks, dashboard |
| | quality_gate_metrics_logs | memory_usage | 10/10 | 422.88MiB ≤ 475MiB | bounds checks, dashboard |
| | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks, dashboard |

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".
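The three criteria above reduce to a small predicate. This Go sketch is an illustration only; experimentResult and isRegression are hypothetical names, not the Regression Detector's implementation. As stated, the rule is direction-agnostic: it looks at |Δ mean %| and at whether the confidence interval excludes zero, not at which direction counts as worse.

```go
package main

import "math"

// experimentResult is a hypothetical record of one detector row.
type experimentResult struct {
	DeltaMeanPct  float64 // estimated Δ mean %
	CILow, CIHigh float64 // bounds of the 90% confidence interval, in %
	Erratic       bool    // experiment configured with erratic: true
}

// effectSizeTolerancePct is the |Δ mean %| threshold from the report.
const effectSizeTolerancePct = 5.0

// isRegression applies the three criteria: a large enough effect, a
// confidence interval that excludes zero, and not marked erratic.
func isRegression(r experimentResult) bool {
	bigEnough := math.Abs(r.DeltaMeanPct) >= effectSizeTolerancePct
	excludesZero := r.CILow > 0 || r.CIHigh < 0
	return bigEnough && excludesZero && !r.Erratic
}
```

Applied to rows from the tables above, docker_containers_cpu (-2.03, [-5.10, +1.03], erratic) fails all three criteria, while quality_gate_logs (+2.57, [+0.89, +4.25]) fails only the effect-size threshold; neither is flagged.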

CI Pass/Fail Decision

Passed. All Quality Gates passed.

  • quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.

Contributor

@nathan-b nathan-b left a comment

I don't see a test for include_all_devices: false with tag_by_physical_storage: true to ensure that no is_physical_storage:false tags appear.

Member

@s-alad s-alad left a comment

LGTM for agent config!

@jose-manuel-almaza
Contributor Author

I don't see a test for include_all_devices: false with tag_by_physical_storage: true to ensure that no is_physical_storage:false tags appear.

Good catch! The test TestGivenADiskCheckWithIncludeAllDevicesDisabled_WhenCheckRuns_ThenOnlyPhysicalPartitionsAreReported already covered this scenario, but it was missing an explicit assertion that no is_physical_storage:false tag appears. I've added it now. Thanks!
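A hypothetical helper in the spirit of the reviewer's request: given the samples a test captured, it returns the metrics that carry a given tag, so the test can assert that nothing carries is_physical_storage:false when include_all_devices is false. The sample type and samplesWithTag are illustrative only; the real test uses the check's existing test helpers (such as AssertMetricTaggedWith, mentioned in a later commit), whose API is not reproduced here.

```go
package main

// sample is a hypothetical captured metric: a name plus its tags.
type sample struct {
	Metric string
	Tags   []string
}

// samplesWithTag returns the names of samples that carry tag exactly,
// so a test can assert that the result is empty.
func samplesWithTag(samples []sample, tag string) []string {
	var hits []string
	for _, s := range samples {
		for _, t := range s.Tags {
			if t == tag {
				hits = append(hits, s.Metric)
				break // count each sample once
			}
		}
	}
	return hits
}
```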

Add two new opt-in configuration options for the disk check:
- `tag_by_physical_storage`: adds `is_physical_storage:true/false` tag
  to every `system.disk.*` metric
- `collect_physical_metrics`: emits
  `system.disk.physical_{total,used,free,utilized,in_use}` metrics for
  physical devices only

Both default to false, preserving backward compatibility (no extra
syscalls or behavioral changes when disabled).

Closes #5921

When getDiskPartitionsWithTimeout(false) returns a partial result with
an error, the physicalDevices set is incomplete. Classifying all=true
results against it would tag real physical disks as non-physical.

Skip the all=true classification branch in this case and report only
the physical partitions we successfully retrieved.

The goroutine clears the in-flight guard via defer after sending to the
buffered result channel. A sequential caller can receive the result and
re-enter getDiskPartitionsWithTimeout before the defer runs, hitting the
guard spuriously. Clear the flag eagerly on the success path; the
goroutine's deferred Store is redundant but harmless.

…w options

- Move partitionEnumInFlight.Store(false) before the channel send so
  sequential calls within the same check run never hit the guard. The
  previous receiver-side Store introduced a race where call #1's defer
  could clear the flag while call #2 was active.
- Disable tag_by_physical_storage and collect_physical_metrics at config
  time on Windows, where gopsutil ignores the all parameter and both
  syscalls return identical results.
- Add new options to conf.yaml.default with platform support notes.
- Add Windows gate test.
When classification is enabled and the physical scan returns empty
(container-like host) while the all-partitions scan fails completely,
return the error instead of silently returning empty slices with nil.
This preserves error visibility so the check does not report success
while emitting zero metrics.

Remove the physicalScanPartial guard that skipped classification when
the physical-only scan returned partial results. Skipping the all=true
scan entirely caused base system.disk.* metrics to be dropped for
non-physical partitions, which is worse than the transient
misclassification it was trying to prevent. Classification now always
proceeds; during a partial physical scan some physical partitions may
be temporarily tagged as non-physical until the next successful run.

When the physical-only partition scan (all=false) returns partial results,
partitions not in that incomplete set were incorrectly tagged
is_physical_storage:false. Introduce an "unclassified" category so these
partitions still emit base system.disk.* metrics but receive no
is_physical_storage tag until classification data is reliable.

…all_devices:false

The existing test already covered the scenario but lacked an explicit
assertion that no is_physical_storage:false tags appear when
include_all_devices is disabled with tag_by_physical_storage enabled.
@jose-manuel-almaza jose-manuel-almaza force-pushed the jose/add-disk-physical-tag-and-metrics branch from 2c59844 to 59a018f Compare April 6, 2026 06:37
@jose-manuel-almaza
Contributor Author

/merge

@gh-worker-devflow-routing-ef8351

gh-worker-devflow-routing-ef8351 Bot commented Apr 6, 2026

View all feedbacks in Devflow UI.

2026-04-06 06:55:54 UTC ℹ️ Start processing command /merge


2026-04-06 06:56:01 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2026-04-06 09:11:17 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in main is approximately 2h (p90).


2026-04-06 09:48:32 UTC ℹ️ MergeQueue: This merge request was merged

Use AssertMetricTaggedWith with device: tag instead of AssertMetric
with exact device_name: tag matching. baseDeviceName() differs between
Linux (filepath.Base) and Windows (trim backslashes), so
device_name:sda1 vs device_name:/dev/sda1 caused CI failure on Windows.
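A simplified sketch of why the exact device_name: assertion was platform-dependent. baseDeviceName here is a hypothetical re-creation based only on the commit message above (Linux keeps the last path element, Windows trims backslashes), not the function in the check. Feeding the same fixture, /dev/sda1, through both branches yields different tags, which is why asserting on the device: tag is the portable choice.

```go
package main

import (
	"path"
	"strings"
)

// baseDeviceName is a hypothetical re-creation of the per-platform device
// name normalization described in the commit message.
func baseDeviceName(device, goos string) string {
	if goos == "windows" {
		// Windows: only trim backslashes; forward slashes are left alone.
		return strings.Trim(device, `\`)
	}
	// Linux: keep the last path element. path.Base is used so the sketch
	// behaves the same on any host OS; for /dev paths it matches what
	// filepath.Base returns on Linux.
	return path.Base(device)
}

// deviceNameTag builds the device_name tag a test might have asserted on.
func deviceNameTag(device, goos string) string {
	return "device_name:" + baseDeviceName(device, goos)
}
```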
@github-actions
Contributor

github-actions Bot commented Apr 6, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot merged commit 3e0a1a6 into main Apr 6, 2026
276 of 277 checks passed
@gh-worker-dd-mergequeue-cf854d gh-worker-dd-mergequeue-cf854d Bot deleted the jose/add-disk-physical-tag-and-metrics branch April 6, 2026 09:48
@github-actions github-actions Bot added this to the 7.79.0 milestone Apr 6, 2026

Labels

ask-review Ask required teams to review this PR internal Identify a non-fork PR long review PR is complex, plan time to review it qa/done QA done before merge and regressions are covered by tests team/agent-configuration team/agent-runtimes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Disk integration system.disk.total value is incorrect, disk size in host information of Datadog GUI is incorrect, too.

3 participants