[AGTHEAL-115] refactor(core): migrate comp/core/flare to V2 component architecture#49769
louis-cqrl wants to merge 17 commits into main
- Add comp/core/flare/def/component.go with Component interface and Params struct (exported fields)
- Add comp/core/flare/fx/fx.go wrapping existing Module() for V2 consumers
- Update root component.go to type-alias Component and Params from def/
- Update params.go to shim constructors via def/ package
- Update flare.go and providers.go to use exported Params field names
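The type-alias shim described in this commit can be illustrated with a minimal, single-file sketch. Everything here is an assumption made for illustration: `defComponent` and `defParams` stand in for the types that would live in `comp/core/flare/def`, and the `Create` method and `Local` field are hypothetical, not the real flare API.

```go
package main

import "fmt"

// defComponent stands in for the interface that would live in
// comp/core/flare/def. The method set is hypothetical.
type defComponent interface {
	Create() (string, error)
}

// defParams stands in for def.Params with an exported field (name hypothetical).
type defParams struct {
	Local bool
}

// Component and Params mimic the root-package shim. They are aliases,
// so they denote the very same types as the def-side declarations.
type Component = defComponent
type Params = defParams

// NewLocalParams mimics a shim constructor that delegates to the def package.
func NewLocalParams() Params { return defParams{Local: true} }

// noopFlare is a trivial implementation used only for the demonstration.
type noopFlare struct{}

func (noopFlare) Create() (string, error) { return "flare.zip", nil }

// acceptsDef is written against the def-side types only.
func acceptsDef(c defComponent, p defParams) string {
	path, _ := c.Create()
	return fmt.Sprintf("%s local=%t", path, p.Local)
}

func main() {
	// Values declared with the alias names pass straight into code
	// written against the def types, with no conversion.
	var c Component = noopFlare{}
	fmt.Println(acceptsDef(c, NewLocalParams()))
}
```

Because an alias introduces a second name rather than a new type, existing callers that import the root package keep compiling while new callers import `def/` directly.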
Go Package Import Differences

Baseline: 4411859
Files inventory check summary

File checks results against ancestor 44118599:

Results for datadog-agent_7.80.0~devel.git.175.dbdc2e1.pipeline.109524854-1_amd64.deb: No change detected
Regression Detector

Regression Detector Results

Baseline: 4411859

Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | -0.95 | [-4.02, +2.13] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | otlp_ingest_metrics | memory utilization | +0.67 | [+0.51, +0.82] | 1 | Logs |
| ➖ | quality_gate_metrics_logs | memory utilization | +0.43 | [+0.16, +0.70] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics_sum_delta | memory utilization | +0.35 | [+0.16, +0.53] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | +0.34 | [+0.28, +0.41] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.13 | [+0.09, +0.17] | 1 | Logs bounds checks dashboard |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | +0.10 | [-0.08, +0.28] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.08 | [-0.43, +0.60] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | +0.04 | [-0.02, +0.09] | 1 | Logs bounds checks dashboard |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | +0.02 | [-0.10, +0.14] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | +0.02 | [-0.03, +0.07] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.01 | [-0.41, +0.44] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.01 | [-0.20, +0.22] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.11, +0.11] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | -0.01 | [-0.21, +0.19] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | -0.03 | [-0.42, +0.36] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | -0.04 | [-0.28, +0.19] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | -0.12 | [-1.75, +1.52] | 1 | Logs bounds checks dashboard |
| ➖ | file_tree | memory utilization | -0.12 | [-0.17, -0.08] | 1 | Logs |
| ➖ | otlp_ingest_logs | memory utilization | -0.28 | [-0.38, -0.18] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | -0.30 | [-0.45, -0.14] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | -0.30 | [-0.40, -0.20] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | -0.64 | [-0.84, -0.44] | 1 | Logs |
| ➖ | docker_containers_cpu | % cpu utilization | -0.95 | [-4.02, +2.13] | 1 | Logs |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 719 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 243.63MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 714 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.16GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.20GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.17GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.18GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 138.08MiB ≤ 143MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 467.30MiB ≤ 495MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 183.58MiB ≤ 195MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 356.91 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 379.76MiB ≤ 430MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
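The three criteria above can be written as a single predicate. The following is a minimal sketch, not the detector's actual code: the `Experiment` struct and the `isRegression` name are assumptions, and the sample rows reuse values from the tables in this report plus one clearly hypothetical row.

```go
package main

import (
	"fmt"
	"math"
)

// Experiment holds the columns reported per experiment (struct is hypothetical).
type Experiment struct {
	Name      string
	DeltaMean float64 // Δ mean %
	CILow     float64 // lower bound of the 90% CI
	CIHigh    float64 // upper bound of the 90% CI
	Erratic   bool    // true when the configuration marks it "erratic"
}

// effectSizeTolerance is the |Δ mean %| threshold stated in the report.
const effectSizeTolerance = 5.00

// isRegression applies all three criteria: large enough effect,
// confidence interval excluding zero, and not marked erratic.
func isRegression(e Experiment) bool {
	bigEnough := math.Abs(e.DeltaMean) >= effectSizeTolerance
	ciExcludesZero := e.CILow > 0 || e.CIHigh < 0
	return bigEnough && ciExcludesZero && !e.Erratic
}

func main() {
	rows := []Experiment{
		{Name: "otlp_ingest_metrics", DeltaMean: 0.67, CILow: 0.51, CIHigh: 0.82},
		{Name: "docker_containers_cpu", DeltaMean: -0.95, CILow: -4.02, CIHigh: 2.13},
		// Hypothetical row, not from this report, to show a positive case.
		{Name: "hypothetical_experiment", DeltaMean: 6.10, CILow: 5.20, CIHigh: 7.00},
	}
	for _, r := range rows {
		fmt.Printf("%s regression=%t\n", r.Name, isRegression(r))
	}
}
```

Note that `otlp_ingest_metrics` has a confidence interval that excludes zero but still does not qualify, because its effect size is well under the 5.00% tolerance.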
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
…r_missing; apply ruff formatting to task files
…ied deps; fix ruff formatting
Move the implementation of the flare component from the root package into comp/core/flare/impl/ (package flareimpl) with a proper NewComponent constructor using Requires/Provides structs and compdef.In/compdef.Out embeddings. Update comp/core/flare/fx/fx.go to use fxutil.ProvideComponentConstructor and remove comp/core/flare from the ignore_provide_component_constructor_missing and components_to_migrate lists in tasks/components.py. The root package is kept as a backward-compatible shim exposing Module(), Component, Params, NewParams, and NewLocalParams.
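The Requires/Provides constructor shape named in this commit can be sketched as follows. This is a simplified illustration under stated assumptions: the `compdef.In`/`compdef.Out` embeddings are omitted so the file compiles on its own, the `Config` dependency and `Create` method are hypothetical, and the error return is one possible signature rather than the one the PR necessarily uses.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical dependency of the component.
type Config interface {
	GetString(key string) string
}

// Component is a pared-down stand-in for flaredef.Component.
type Component interface {
	Create() (string, error)
}

// Requires lists everything the constructor needs. In the agent this
// struct would embed compdef.In; that is left out of this sketch.
type Requires struct {
	Config Config
}

// Provides lists everything the constructor hands back. In the agent
// this struct would embed compdef.Out.
type Provides struct {
	Comp Component
}

// flare is the private implementation type.
type flare struct {
	cfg Config
}

func (f *flare) Create() (string, error) {
	return f.cfg.GetString("flare_dir") + "/flare.zip", nil
}

// NewComponent is the single constructor: plain structs in, plain structs out.
func NewComponent(reqs Requires) (Provides, error) {
	if reqs.Config == nil {
		return Provides{}, errors.New("flare: missing Config dependency")
	}
	return Provides{Comp: &flare{cfg: reqs.Config}}, nil
}

// mapConfig is a tiny in-memory Config used for the demonstration.
type mapConfig map[string]string

func (m mapConfig) GetString(key string) string { return m[key] }

func main() {
	provides, err := NewComponent(Requires{Config: mapConfig{"flare_dir": "/tmp"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	path, _ := provides.Comp.Create()
	fmt.Println(path)
}
```

Keeping the constructor free of dependency-injection types is what lets a thin `fx/` package adapt it separately, which is the split this commit describes.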
…tion
- Delete root package shim files (flare.go, component.go, params.go, providers.go, flare_test.go) now that content lives in def/ and impl/
- Delete flareimpl/mock.go (V1 mock location); add comp/core/flare/mock/ as the canonical V2 mock package using flaredef.Component
- Update all callers to import from comp/core/flare/def (Component, Params, NewParams, NewLocalParams) and comp/core/flare/fx (Module)
- Simplify root BUILD.bazel to a bare # gazelle:ignore comment
louis-cqrl
left a comment
Review — Claude Opus 4.7 + Codex (GPT-5) second pass
Inline comments attached to each specific file:line below. One cross-cutting concern here in the summary:
🔴 Scope creep — this PR bundles ~3 unrelated changes
Description says "migrate comp/core/flare to V2", but the diff also contains:
- A brand-new package `comp/logs-library/utils/ipfilter/` (~430 LOC), completely unrelated to flare.
- New `packages/installer/packaging`: `BUILD.bazel`, `README.md.in`, `version-manifest.{json,txt}.in`, plus ~98 lines in `packages/installer/linux/BUILD.bazel`.
- A repo-wide Gazelle/Bazel sweep: root `BUILD.bazel`, `comp/networkpath/**`, `pkg/ebpf/**`, `pkg/network/**`, `pkg/opentelemetry-mapping-go/inframetadata/**`, `pkg/privateactionrunner/**` (~15 new BUILD files), `pkg/process/status/`, `pkg/security/probe/sysctl/`, `pkg/util/kernel/**`, `pkg/windowsdriver/**`.
- ~11 unrelated Python auto-fixes in `tasks/**` and `test/new-e2e/**` (implicit string concat, `dict.fromkeys` rewrites, Python 2 shim removal, `semver.VersionInfo.isvalid` → `semver.Version.is_valid`, etc.).
Suggest splitting into three PRs: (a) flare V2 migration only, (b) ipfilter + installer packaging, (c) Bazel/Python sweep. This PR should only keep flare files + the one-line tasks/components.py change that drops flare from components_to_migrate.
Reviewers: Claude Opus 4.7 (1M ctx) + Codex CLI (gpt-5).
@@ -0,0 +1,141 @@
// Unless explicitly stated otherwise all files in this repository are licensed
🔴 This whole package is unrelated to the flare migration. ipfilter/ (filter.go, filter_test.go, denial_info.go, BUILD.bazel, ~430 LOC) has nothing to do with comp/core/flare. See the top-level review for the full scope-creep list — recommend splitting this out into its own PR.
func getFlare(t *testing.T, overrides map[string]interface{}, fillers ...fx.Option) *flare {
return getFlareWithParams(t, Params{}, overrides, fillers...)
// testRequires mirrors Requires but uses fx.In (required by fxutil.Test).
type testRequires struct {
🟡 The testRequires + toRequires shim (lines 62-81) is clear but bespoke. If another V2 component already solved "how to use fxutil.Test against a compdef.In struct", follow the shared pattern instead of duplicating the field list — otherwise every V2 migration will copy this shim.
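To make the duplication concern concrete, here is a minimal sketch of what such a mirror-and-copy shim looks like. It is not the code under review: `fxIn` and `compdefIn` are empty stand-ins for `fx.In` and `compdef.In`, and the `Log` and `Hostname` fields are hypothetical.

```go
package main

import "fmt"

// fxIn and compdefIn are empty stand-ins for fx.In and compdef.In.
type fxIn struct{}
type compdefIn struct{}

// Logger is a hypothetical dependency.
type Logger interface {
	Infof(format string, args ...any)
}

// Requires is the constructor input used by the implementation.
type Requires struct {
	compdefIn
	Log      Logger
	Hostname string
}

// testRequires repeats every field of Requires but embeds the fx marker.
type testRequires struct {
	fxIn
	Log      Logger
	Hostname string
}

// toRequires copies field by field. A direct conversion Requires(t) does
// not compile, because the embedded marker types differ, so every new
// field must be added in three places: both structs and this function.
func toRequires(t testRequires) Requires {
	return Requires{Log: t.Log, Hostname: t.Hostname}
}

// stdoutLogger is a trivial Logger for the demonstration.
type stdoutLogger struct{}

func (stdoutLogger) Infof(format string, args ...any) {
	fmt.Printf(format+"\n", args...)
}

func main() {
	reqs := toRequires(testRequires{Log: stdoutLogger{}, Hostname: "test-host"})
	reqs.Log.Infof("built Requires for %s", reqs.Hostname)
}
```

The three-way field list is the maintenance cost the comment points at; a shared helper would remove it for every migrated component at once.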
louis-cqrl
left a comment
Review phase 2 — after fixup commit 15cec3a
Addressed (11/13 from phase 1) — test compile bug, missing go_library targets in def/ and helpers/, missing gotags in mock/, @// → // normalization, copyright year regressions, stale def package doc, filterNilProviders rationale comment, agent.go alias rename. Deleted the now-obsolete inline comments.
Still open from phase 1:
- 🔴 Scope creep (ipfilter package, installer packaging, repo-wide Gazelle/Python sweep): see the inline on `comp/logs-library/utils/ipfilter/filter.go`.
- 🟡 `testRequires` shim: judgement call; fine to leave if no shared V2 helper exists.
New findings from this commit — two inline comments below.
louis-cqrl
left a comment
Review phase 3 — Claude Opus 4.7 + Codex (GPT-5) second-pass on 15cec3a
Codex surfaced three findings; two are new, one duplicates an already-open comment.
Still open (unchanged from phase 1 & 2):
- 🔴 Scope creep: `comp/logs-library/utils/ipfilter/`, `packages/installer/`, repo-wide Gazelle sweep, `tasks/**.py`. Still in the diff.
- 🟠 `comp/core/flare/helpers/BUILD.bazel`: `go_test` target missing for `builder_test.go` / `perm_info_{nix,win}_test.go` / `send_flare_test.go`, and `builder_mock.go` (which has `//go:build test`) has nowhere to go. Codex flagged the same gap independently.
- 🟡 `comp/systray/systray/impl/systray.go:25`: same `flare` alias shadowing as `agent.go`; not touched by the fixup.
- 🟡 `comp/core/flare/impl/impl_test.go:62`: `testRequires`/`toRequires` shim; judgement call.
New findings this pass — two inline comments below.
Reviewers: Claude Opus 4.7 (1M ctx) + Codex CLI (gpt-5-codex).
  caseID := m.instance.TicketID
  userHandle := m.instance.UserEmail
- response, err := m.flareComponent.Send(flarePath, caseID, userHandle, helpers.NewLocalFlareSource())
+ response, err := m.flareComponent.Send(flarePath, caseID, userHandle, flaretypes.NewLocalFlareSource())
🟡 Flare provenance likely wrong for RC-triggered profiling. This path uses flare.rc_profiling.* config keys (lines 346-348) and calls CreateWithArgs with RC-supplied ProfileDuration/ProfileBlockingRate/ProfileMutexFraction — i.e. the flare is being raised by remote-config-driven profiling. But the upload tags the source as local:
    m.flareComponent.Send(flarePath, caseID, userHandle, flaretypes.NewLocalFlareSource())

For server-side provenance (and the rc_task_uuid form field in helpers/send_flare.go) to be correct, this should be flaretypes.NewRemoteConfigFlareSource(<uuid>) with the UUID of the RC task that produced the Check instance.
Pre-existing (this PR only renamed the package on the call), but the migration touched the line so it's a natural moment to fix. If the path really is local-only, worth a comment explaining why.
Credit: flagged by codex.
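The provenance difference described in this comment can be sketched in isolation. This is an illustrative model, not the contents of `comp/core/flare/types` or `helpers/send_flare.go`: the struct layout, the `formFields` helper, the `source` form key and the literal source-type strings are all assumptions. Only the two constructor names and the `rc_task_uuid` field name come from the comment.

```go
package main

import (
	"fmt"
	"net/url"
)

// FlareSource records where a flare came from (layout is an assumption).
type FlareSource struct {
	sourceType string
	rcTaskUUID string
}

// NewLocalFlareSource marks a flare requested on the host itself.
func NewLocalFlareSource() FlareSource {
	return FlareSource{sourceType: "local"}
}

// NewRemoteConfigFlareSource marks a flare triggered by a remote-config
// task and keeps that task's UUID.
func NewRemoteConfigFlareSource(rcTaskUUID string) FlareSource {
	return FlareSource{sourceType: "remote-config", rcTaskUUID: rcTaskUUID}
}

// formFields is a hypothetical helper that builds the upload form values.
// The "source" key is invented for this sketch.
func formFields(src FlareSource) url.Values {
	values := url.Values{}
	values.Set("source", src.sourceType)
	if src.rcTaskUUID != "" {
		values.Set("rc_task_uuid", src.rcTaskUUID)
	}
	return values
}

func main() {
	local := formFields(NewLocalFlareSource())
	remote := formFields(NewRemoteConfigFlareSource("task-uuid-123"))
	fmt.Println("local has rc_task_uuid: ", local.Has("rc_task_uuid"))
	fmt.Println("remote has rc_task_uuid:", remote.Has("rc_task_uuid"))
}
```

Under this model a flare tagged as local never carries the task UUID, so the server cannot tie the upload back to the RC task that requested it.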
- Remove accidentally included comp/logs-library/utils/ipfilter/ files (restored to main branch state)
- Add go_test target to comp/core/flare/helpers/BUILD.bazel covering builder_mock.go and all test files with correct platform selects
- Add RCListener field to MockProvides in comp/core/flare/mock/mock.go to match impl Provides struct
- Add comp/remote-config/rcclient/types dep to mock/BUILD.bazel
- Rename flare import alias to flaredef in comp/systray/systray/impl/systray.go to avoid shadowing the struct field
louis-cqrl
left a comment
Review phase 4 — after fixup dbdc2e1
Addressed ✅
- 🟠 `helpers/BUILD.bazel`: `go_test` target added with all four test files (platform-selected for nix/win) plus `builder_mock.go`.
- 🟡 `systray/impl/systray.go`: alias renamed `flare` → `flaredef`, both struct fields updated. No more shadow.
- 🟠 `mock/mock.go` struct: `RCListener` field added, import wired, `mock/BUILD.bazel` deps updated. (See new inline; the fix is partial.)
Not addressed (unchanged) ❌
- 🔴 Scope creep on `comp/logs-library/utils/ipfilter/` (and the rest of the non-flare payload).
- 🟡 `impl_test.go:62`: `testRequires`/`toRequires` shim.
- 🟡 `agentprofiling.go:361`: RC-profiling flare still tagged as `local`.
New finding 🆕 — inline below on mock/mock.go.
Note on `helpers/BUILD.bazel`: the new `go_test` builds the helpers package's OWN tests. `builder_mock.go` (which has `//go:build test` and is consumed by `impl_test.go` via `helpers.NewFlareBuilderMockWithArgs`) is now only visible to helpers' own `go_test`. Cross-package test consumers (`impl_test`) will still fail to resolve `NewFlareBuilderMockWithArgs` under Bazel, but that's a follow-up Bazel refactor (likely a separate `go_library` with `testonly = True` + `gotags = ["test"]`), not a blocker for this PR.
Comp flare.Component
Endpoint api.AgentEndpointProvider
Comp flaredef.Component
Endpoint api.AgentEndpointProvider
🟠 Incomplete fix — the struct declares RCListener but NewMock() doesn't populate it.
At line 64 NewMock() returns:
    return MockProvides{
        Comp:     m,
        Endpoint: api.NewAgentEndpointProvider(m.handlerFunc, "/flare", "POST"),
        // RCListener missing → zero value
    }

Result: fx.Out registers a zero-value TaskListenerProvider into the graph. Best case it's inert; worst case a downstream consumer invokes a method on it and panics. Either way, the mock now advertises a field it doesn't actually supply.
Fix — add to NewMock()'s return:
RCListener: rcclienttypes.NewTaskListener(func(_ rcclienttypes.TaskType, _ rcclienttypes.AgentTaskConfig) (bool, error) {
return false, nil
}),
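The failure mode behind this finding can be reproduced with a small stand-alone sketch. The types below are stand-ins, not the real `rcclienttypes` package: `TaskListenerProvider` is modelled as a struct holding a function value, and `dispatch` is a hypothetical downstream consumer.

```go
package main

import "fmt"

// TaskListener is a hypothetical callback type loosely mirroring the
// signature used in the suggested fix.
type TaskListener func(taskType string, task map[string]string) (bool, error)

// TaskListenerProvider is a stand-in for rcclienttypes.TaskListenerProvider.
type TaskListenerProvider struct {
	Listener TaskListener
}

// MockProvides mirrors the shape of the mock's output struct.
type MockProvides struct {
	RCListener TaskListenerProvider
}

// newMockIncomplete leaves RCListener at its zero value.
func newMockIncomplete() MockProvides {
	return MockProvides{}
}

// newMockComplete supplies a no-op listener, as the fix proposes.
func newMockComplete() MockProvides {
	return MockProvides{RCListener: TaskListenerProvider{
		Listener: func(string, map[string]string) (bool, error) { return false, nil },
	}}
}

// dispatch plays the downstream consumer. It reports whether the
// listener handled the task and whether calling it panicked.
func dispatch(p TaskListenerProvider) (handled bool, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	handled, _ = p.Listener("flare", nil)
	return handled, false
}

func main() {
	_, panicked := dispatch(newMockIncomplete().RCListener)
	fmt.Println("zero-value provider panicked:", panicked)

	handled, panicked := dispatch(newMockComplete().RCListener)
	fmt.Println("no-op provider handled:", handled, "panicked:", panicked)
}
```

Calling a nil function value panics at run time, which is why populating the field with a no-op is safer than leaving the declared field empty.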
What does this PR do?
Fully migrates `comp/core/flare` from the V1 component architecture to V2:
Motivation
Align `comp/core/flare` with the V2 component architecture used across the codebase. This is a complete migration — no V1 relics remain.
Describe how you validated your changes
Additional Notes
`FlareSource` was moved from `comp/core/flare/helpers` to `comp/core/flare/types` as part of this migration — `types` is the correct home for types that should be usable without pulling in implementation dependencies.