[SVLS-8160] Add serverless-identifying tag to Azure App Service Windows profiles #45281
Conversation
- Add `apm_config.additional_profile_tags` configuration
- Bind to `DD_APM_ADDITIONAL_PROFILE_TAGS` env var
- Add tests for YAML and env var configuration
- Update config template documentation
Static quality checks: ✅ Please find below the results from static quality gates.
Successful checks: 25 successful checks with minimal change (< 2 KiB).
On-wire sizes (compressed)
Force-pushed from 759fe49 to dcc9ad9 (Compare)
```go
config.BindEnvAndSetDefault("apm_config.additional_profile_tags", map[string]string{}, "DD_APM_ADDITIONAL_PROFILE_TAGS")
config.BindEnv("apm_config.additional_endpoints", "DD_APM_ADDITIONAL_ENDPOINTS")               //nolint:forbidigo // TODO: replace by 'SetDefaultAndBindEnv'
config.BindEnv("apm_config.replace_tags", "DD_APM_REPLACE_TAGS")                               //nolint:forbidigo // TODO: replace by 'SetDefaultAndBindEnv'
config.BindEnv("apm_config.analyzed_spans", "DD_APM_ANALYZED_SPANS")                           //nolint:forbidigo // TODO: replace by 'SetDefaultAndBindEnv'
config.BindEnv("apm_config.ignore_resources", "DD_APM_IGNORE_RESOURCES", "DD_IGNORE_RESOURCE") //nolint:forbidigo // TODO: replace by 'SetDefaultAndBindEnv'
config.BindEnv("apm_config.instrumentation.targets", "DD_APM_INSTRUMENTATION_TARGETS")         //nolint:forbidigo // TODO: replace by 'SetDefaultAndBindEnv'
```
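Since the new env var is documented as a JSON string while the config default is a `map[string]string`, a small sketch of how such a value decodes may help (illustrative only, not the Agent's actual code path; `parseProfileTags` is a hypothetical helper):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseProfileTags decodes a JSON-string env var value such as
// {"_dd.origin":"appservice"} into a map[string]string, falling back to
// the empty-map default that BindEnvAndSetDefault establishes above.
func parseProfileTags(raw string) (map[string]string, error) {
	tags := map[string]string{} // default when the env var is unset
	if raw == "" {
		return tags, nil
	}
	if err := json.Unmarshal([]byte(raw), &tags); err != nil {
		return nil, err
	}
	return tags, nil
}

func main() {
	tags, err := parseProfileTags(`{"_dd.origin":"appservice"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(tags["_dd.origin"]) // prints "appservice"
}
```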
Would it be possible to have all of these be BindEnvAndSetDefault?
These configs were already here, so I'm not sure if I should change them or what the defaults would be!
```yaml
#
# # @param additional_profile_tags - map of strings - optional
# # @env DD_APM_ADDITIONAL_PROFILE_TAGS - JSON string - optional
# # Additional tags to add to all profiles. These tags are added on the agent side
```
Not sure what this means, but if my suggestion is incorrect, ignore!
Suggested change:
```diff
- # # Additional tags to add to all profiles. These tags are added on the agent side
+ # # Additional tags to add to all profiles. These tags are added by the Agent
```
```yaml
# # tags that should be applied to all profiles (e.g., origin).
# #
# # Note: For Azure App Service in Windows, this configuration is set in the AAS Site Extension.
# # Overriding the default tags in AAS Windows should be done with reference to https://datadoghq.atlassian.net/wiki/spaces/SLS/pages/6007685143/Profiling.
```
Do we reference internal confluence in public repos?
Suggested change:
```diff
- # # Overriding the default tags in AAS Windows should be done with reference to https://datadoghq.atlassian.net/wiki/spaces/SLS/pages/6007685143/Profiling.
+ # # To override the default tags in AAS Windows see https://datadoghq.atlassian.net/wiki/spaces/SLS/pages/6007685143/Profiling.
```
There was a problem hiding this comment.
It looks like we have some examples of referencing internal confluence in this repo (example)
See this discussion on the reverted PR for context: overriding these profile tags wouldn't be normal user behavior, so I don't think it would make sense to have it in public docs.
I think "To override the default tags... see" would make this sound more like regular user behavior. I want to convey that this would be a rare customer workflow
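For reference, the template entry discussed above corresponds to a setting that could look like this in `datadog.yaml` (an illustrative sketch, not official documentation; the `_dd.origin: appservice` value is the one this PR sets via the AAS Site Extension):

```yaml
apm_config:
  ## Additional tags added by the Agent to all profiles.
  additional_profile_tags:
    _dd.origin: appservice
```

The equivalent environment variable takes a JSON string, e.g. `DD_APM_ADDITIONAL_PROFILE_TAGS='{"_dd.origin":"appservice"}'`.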
Regression Detector Results (Metrics dashboard). Baseline: 40c9ac0. Optimization Goals: ✅ No significant changes detected.
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | +1.67 | [-1.41, +4.75] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | quality_gate_metrics_logs | memory utilization | +2.38 | [+2.17, +2.59] | 1 | Logs bounds checks dashboard |
| ➖ | docker_containers_cpu | % cpu utilization | +1.67 | [-1.41, +4.75] | 1 | Logs |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | +1.28 | [+1.18, +1.38] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | +0.97 | [+0.77, +1.16] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | +0.73 | [+0.52, +0.94] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.65 | [+0.61, +0.69] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_logs | % cpu utilization | +0.59 | [-0.91, +2.09] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | +0.36 | [+0.20, +0.51] | 1 | Logs |
| ➖ | file_tree | memory utilization | +0.03 | [-0.02, +0.09] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | +0.03 | [-0.01, +0.08] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.03 | [-0.48, +0.54] | 1 | Logs |
| ➖ | otlp_ingest_logs | memory utilization | +0.02 | [-0.09, +0.13] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | +0.01 | [-0.39, +0.40] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | +0.00 | [-0.40, +0.40] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.00 | [-0.12, +0.13] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.01 | [-0.10, +0.09] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | -0.01 | [-0.12, +0.11] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | -0.04 | [-0.11, +0.02] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | -0.13 | [-0.29, +0.02] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | -0.26 | [-0.50, -0.03] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | -0.28 | [-0.33, -0.23] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | -0.32 | [-0.37, -0.28] | 1 | Logs bounds checks dashboard |
| ➖ | docker_containers_memory | memory utilization | -0.50 | [-0.58, -0.43] | 1 | Logs |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | links |
|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | lost_bytes | 10/10 | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | lost_bytes | 10/10 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
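The criteria above can be sketched as a small predicate (a simplified illustration of the decision rule, not the Regression Detector's actual implementation):

```go
package main

import "fmt"

// isRegression applies the criteria described above: the estimated effect
// must clear the tolerance, the confidence interval must exclude zero,
// and the experiment must not be marked erratic.
func isRegression(deltaMeanPct, ciLow, ciHigh, tolerancePct float64, erratic bool) bool {
	bigEnough := deltaMeanPct >= tolerancePct || deltaMeanPct <= -tolerancePct
	ciExcludesZero := ciLow > 0 || ciHigh < 0
	return bigEnough && ciExcludesZero && !erratic
}

func main() {
	// docker_containers_cpu from the table: +1.67 with CI [-1.41, +4.75]
	// fails both criteria, so it is not flagged.
	fmt.Println(isRegression(1.67, -1.41, 4.75, 5.0, false)) // prints "false"
}
```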
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
apiarian-datadog left a comment:
approved as long as we have a link to a passing test that was previously failing
`/trigger-ci --variable RUN_ALL_BUILDS=true --variable RUN_KITCHEN_TESTS=true --variable RUN_E2E_TESTS=on --variable RUN_UNIT_TESTS=on --variable RUN_KMT_TESTS=on`
View all feedbacks in Devflow UI.
Started pipeline #92314628

`/merge`
View all feedbacks in Devflow UI.
The expected merge time in
Note: This is a reimplementation of this PR, which was reverted here. This fixes the failing CI test.
Working CI tests: Job 1, Job 2
What does this PR do?
- Adds the `_dd.origin:appservice` tag to profiles for Azure App Service Windows Apps
- Adds a new configuration, `apm_config.additional_profile_tags`, which is set in this PR in the AAS Site Extension
- Renames `azureServerlessTags` to `AdditionalProfileTags` for clearer future use

The only changes between this PR and the reverted PR are commit ef27a8a, which fixes the CI test (failed job #1, failed job #2), and dcc9ad9, which renames `azureServerlessTags` to be cloud-agnostic, as `_dd.origin` gets set for profiles in Google Cloud Run Services, Jobs, and Functions, not just Azure products.

Motivation
See related PR
Describe how you validated your changes
- Deployed Node.js and .NET Azure App Service Windows Apps with the Site Extension with `DD_PROFILING_ENABLED=true` and looked at network requests to validate that `_dd.origin` is set correctly
- Pulled `datadog-trace-agent.exe` and `dogstatsd.exe` from the `package_build > windows_zip_agent_binaries_x64-a7` job
- Replaced `datadog-trace-agent.exe` and `dogstatsd.exe` with the dev versions
- Copied over `datadog.yaml` from the related AAS Extension PR
- Confirmed `_dd.origin` in the profile tags!
- The `_dd.origin` tag gets set in profiles