[system_tests] Fix race conditions in set_and_wait_rc by using unique config IDs by vlad-scherbich · Pull Request #6371 · DataDog/system-tests

vlad-scherbich · 2026-02-23T20:31:37Z

https://datadoghq.atlassian.net/browse/PROF-13796

Motivation

~25 of 32 system test failures on dd-trace-py main in 2026 stem from dynamic configuration tests. Root cause: a race in set_and_wait_rc, which can match a stale RC ACK from a previous config update and return before the new config is actually applied.

Tests fixed

All tests in test_dynamic_configuration.py that use set_and_wait_rc. Top flaky tests by frequency (dd-trace-py main, 2026):

Rank	Hits	Test
1	6	`test_remote_sampling_rules_retention`
2	6	`test_trace_sampling_rate_override_default`
3	5	`test_capability_tracing_sample_rules`
4	3	`test_trace_sampling_rules_override_env`

Others: test_apply_state, test_trace_sampling_rate_override_env, test_trace_sampling_rate_with_sampling_rules, test_log_injection_enabled, test_tracing_client_tracing_tags, test_trace_sampling_rules_override_rate, test_trace_sampling_rules_with_tags.

Changes

_set_rc: Use uuid.uuid4() for config_id when none is passed. This avoids repeating IDs for identical payloads (a payload hash would recreate the stale-ACK race). Return the config_id used.
set_and_wait_rc: When config_id is passed (reuse case), clear the agent before set_rc to discard buffered RC requests so we only see responses from our update. Use config_id filtering in wait_for_rc_apply_state so we only match ACKs for the config we just set.
wait_for_rc_apply_state (_test_agent.py): Add optional config_id parameter; when set, only match config_states whose id equals it. Use str() on both sides for robust int/str comparison.
test_capability_tracing_sample_rules: Use wait_loops=_RC_WAIT_LOOPS (400, ~4s) so the library has enough time to send its first RC request.
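A minimal sketch of the config_id filter described above. It assumes each RC request recorded by the test agent is a dict whose client.state.config_states entries carry id, product and apply_state fields, that apply_state 2 means "acknowledged", and uses "APM_TRACING" as an illustrative product name. find_applied_state is a hypothetical helper, not the actual wait_for_rc_apply_state code in _test_agent.py.

```python
from __future__ import annotations

import uuid
from typing import Any, Iterable, Optional

# Assumption: apply_state 2 means "acknowledged" in the RC client state.
ACKNOWLEDGED = 2


def find_applied_state(
    rc_requests: Iterable[dict[str, Any]],
    product: str,
    config_id: Optional[str] = None,
) -> Optional[dict[str, Any]]:
    """Return the first acknowledged config_state for `product`, or None.

    If config_id is given, states with any other id are ignored, so an ACK
    left over from an earlier update cannot end the wait early.
    """
    for request in rc_requests:
        states = request.get("client", {}).get("state", {}).get("config_states", [])
        for state in states:
            if state.get("product") != product:
                continue
            if state.get("apply_state") != ACKNOWLEDGED:
                continue
            # str() on both sides so an int id still matches a str id.
            if config_id is not None and str(state.get("id")) != str(config_id):
                continue
            return state
    return None


# The agent still holds a request that ACKs the *previous* update.
old_id, new_id = str(uuid.uuid4()), str(uuid.uuid4())
buffered = [
    {"client": {"state": {"config_states": [
        {"id": old_id, "product": "APM_TRACING", "apply_state": ACKNOWLEDGED},
    ]}}},
]

print(find_applied_state(buffered, "APM_TRACING") is not None)  # True: stale ACK matches
print(find_applied_state(buffered, "APM_TRACING", config_id=new_id))  # None: filtered out
```

Without the filter, the buffered ACK for the previous update satisfies the wait immediately, which is exactly the stale-ACK race described in the motivation; with a fresh uuid4 per update, only the ACK for the config just set can match.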

Reviewer checklist

Anything but tests/ or manifests/ is modified? I have the approval from R&P team
A docker base image is modified?
- the relevant build-XXX-image label is present
A scenario is added, removed or renamed?
- Get a review from R&P team

github-actions · 2026-02-23T20:32:09Z

CODEOWNERS have been resolved as:

tests/parametric/test_dynamic_configuration.py                          @DataDog/system-tests-core @DataDog/apm-sdk-capabilities
utils/docker_fixtures/_test_agent.py                                    @DataDog/system-tests-core

datadog-official · 2026-02-23T20:39:55Z

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 236b7e9

vlad-scherbich · 2026-02-24T21:05:57Z

@cbeauchesne, second attempt to generalize the fix for #6342

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 057ab1dc44

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

KowalskiThomas · 2026-02-25T09:16:36Z

+    on RC update, so we skip the telemetry wait and use config_id filtering to avoid stale ACKs.
    """
    rc_config = _create_rc_config(config_overrides)
+    resolved_config_id = config_id or str(hash(json.dumps(rc_config)))


Does json.dumps provide deterministic ordering of fields if you don't pass sort_fields or whatever the flag is? I think it could mean the resolved_config_id is not deterministic either (and I'm not sure whether it matters in this context)

Does json.dumps provide deterministic ordering of fields

Only in dict insertion order. For a canonical key order you would need this: json.dumps(data, sort_keys=True)

This ... is an excellent observation! However, it is also OLD code, so I don't know if changing this behavior here is desired. If anything, it should be done separately. @cbeauchesne, what are your thoughts on this?

Actually, I'm going to go with this fix right now, as it might help with unflaking all runtimes.
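To illustrate the ordering question: since Python 3.7 dicts keep insertion order and json.dumps emits keys in that order, so two equal dicts built in different orders serialize differently unless sort_keys=True is passed. Separately, hash() on a str is salted per interpreter process (see PYTHONHASHSEED), so str(hash(...)) is only stable within one run. Even a canonical content hash gives identical payloads identical IDs, which is why the final revision switched to uuid.uuid4(). new_config_id is a hypothetical helper for illustration.

```python
import json
import uuid

a = {"sampling_rate": 0.5, "service": "web"}
b = {"service": "web", "sampling_rate": 0.5}  # same content, different insertion order

# json.dumps follows insertion order, so equal dicts can serialize differently...
print(json.dumps(a) == json.dumps(b))  # False
# ...while sort_keys=True yields the same string for equal dicts.
print(json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True))  # True


def new_config_id() -> str:
    # A random ID per update, so an ACK for an earlier identical payload
    # can never carry the ID of the update we are waiting on.
    return str(uuid.uuid4())


print(new_config_id() != new_config_id())  # True
```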

KowalskiThomas · 2026-02-25T09:17:41Z

+    for _ in range(_MAX_RC_EVENT_WAIT_LOOPS):
+        if test_agent.count_telemetry_events("app-client-configuration-change") > pre_count:
+            break
+        time.sleep(0.01)
+    else:


for / else really is something I'll never be able to wrap my head around, but good job using it here 😅
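For reference, the for/else idiom in that hunk: the else block runs only when the loop finishes without hitting break, which makes it a natural home for the timeout branch of a polling loop. A self-contained sketch; wait_for_new_event and its parameters are hypothetical stand-ins for the loop over _MAX_RC_EVENT_WAIT_LOOPS, and raising TimeoutError in the else branch is an assumption, since the hunk stops at else:.

```python
import itertools
import time
from typing import Callable


def wait_for_new_event(
    count_events: Callable[[], int],
    pre_count: int,
    max_loops: int = 400,
    interval: float = 0.01,
) -> None:
    """Poll count_events() until it exceeds pre_count or max_loops is reached."""
    for _ in range(max_loops):
        if count_events() > pre_count:
            break  # new event seen: the else clause below is skipped
        time.sleep(interval)
    else:
        # Only reached when the loop ran out without a `break`.
        raise TimeoutError(f"no new event after {max_loops} polls")


# Usage: a fake counter returning 0, 1, 2, 3, ... exceeds pre_count=2 on the 4th poll.
counter = itertools.count()
wait_for_new_event(lambda: next(counter), pre_count=2, interval=0)
```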

KowalskiThomas · 2026-02-25T09:18:16Z

+                    if message.get("request_type") == event_name:
+                        if message.get("application", {}).get("language_version") != "SIDECAR":
+                            count += 1


Could we use guard style here? Like if not: continue, instead of if: if: if: do()?

I like this. Another way could be to just chain all the ANDs... taking a look.

@KowalskiThomas I made it pretty :)
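The two styles discussed, guard clauses with continue and a single chained and, count the same events. A sketch assuming the telemetry messages have already been flattened into a list of dicts with request_type and application.language_version keys (the real helper's traversal of test-agent requests is not shown in the hunk); count_events and count_events_chained are hypothetical names.

```python
from __future__ import annotations

from typing import Any, Iterable


def count_events(messages: Iterable[dict[str, Any]], event_name: str) -> int:
    count = 0
    for message in messages:
        # Guard clauses: bail out early on anything that is not a match.
        if message.get("request_type") != event_name:
            continue
        if message.get("application", {}).get("language_version") == "SIDECAR":
            continue
        count += 1
    return count


def count_events_chained(messages: Iterable[dict[str, Any]], event_name: str) -> int:
    # Equivalent formulation with both conditions chained by `and`.
    return sum(
        1
        for m in messages
        if m.get("request_type") == event_name
        and m.get("application", {}).get("language_version") != "SIDECAR"
    )


msgs = [
    {"request_type": "app-client-configuration-change", "application": {"language_version": "3.12"}},
    {"request_type": "app-client-configuration-change", "application": {"language_version": "SIDECAR"}},
    {"request_type": "app-heartbeat", "application": {"language_version": "3.12"}},
]
print(count_events(msgs, "app-client-configuration-change"))  # 1
```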

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f8e025751e

vlad-scherbich mentioned this pull request Feb 23, 2026

Fix set_and_wait_rc helper by clearing out stale events before new co… #6349

Closed

5 tasks

vlad-scherbich force-pushed the vlad/fix-dynamic-config-flakes branch 2 times, most recently from 1efc4da to a8cf21a February 23, 2026 21:10

vlad-scherbich changed the title ~~Attempt to fix race conditions in 'set_and_wait_rc' by counting telem…~~ [system_tests] Fix race conditions in 'set_and_wait_rc' by counting telemetry events Feb 24, 2026

vlad-scherbich force-pushed the vlad/fix-dynamic-config-flakes branch 3 times, most recently from 0fff71e to 7885079 February 24, 2026 17:14

vlad-scherbich mentioned this pull request Feb 24, 2026

[system_tests] Fix flaky IAST standalone tests via library_interface_timeout #6377

Draft

5 tasks

vlad-scherbich requested review from KowalskiThomas and taegyunkim February 24, 2026 19:45

vlad-scherbich marked this pull request as ready for review February 24, 2026 21:04

vlad-scherbich requested review from a team as code owners February 24, 2026 21:04

vlad-scherbich requested review from mtoffl01 and removed request for a team February 24, 2026 21:04

chatgpt-codex-connector Bot reviewed Feb 24, 2026

Comment thread tests/parametric/test_dynamic_configuration.py Outdated

vlad-scherbich marked this pull request as draft February 24, 2026 22:54

KowalskiThomas reviewed Feb 25, 2026

vlad-scherbich force-pushed the vlad/fix-dynamic-config-flakes branch 3 times, most recently from 5cba844 to 5cf7369 February 25, 2026 18:53

vlad-scherbich changed the title ~~[system_tests] Fix race conditions in 'set_and_wait_rc' by counting telemetry events~~ [system_tests] Fix a race condition in 'set_and_wait_rc' by waiting on new RC update take effect Feb 25, 2026

vlad-scherbich marked this pull request as ready for review February 25, 2026 21:18

vlad-scherbich enabled auto-merge (squash) February 25, 2026 21:18

vlad-scherbich requested a review from KowalskiThomas February 25, 2026 21:18

chatgpt-codex-connector Bot reviewed Feb 25, 2026

Comment thread tests/parametric/test_dynamic_configuration.py Outdated

vlad-scherbich changed the title ~~[system_tests] Fix a race condition in 'set_and_wait_rc' by waiting on new RC update take effect~~ [system_tests] Fix race conditions in set_and_wait_rc by using unique config IDs Feb 26, 2026

vlad-scherbich added 9 commits February 26, 2026 09:24

Fix race conditions in 'set_and_wait_rc' by counting telemetry events

21bae92

Revert "unflake a System Test: 'test_remote_sampling_rules_retention' (…

a258fe8

…#6342)" This reverts commit 5d9142e.

re-import time

bcec220

clear agent for slow runtimes

b22a968

filter ACKs by config ID when available, instead of clearing session

68834fb

use 'config_id' approach for all runtimes, not just 'slow' ones

22cb3bc

unify approach, take 2

333b858

give the tracer time to apply the new rules after the ACK

e852fb2

replace JSON hash with uuid's

236b7e9

vlad-scherbich force-pushed the vlad/fix-dynamic-config-flakes branch from f8e0257 to 236b7e9 February 26, 2026 15:15

cbeauchesne approved these changes Feb 26, 2026

vlad-scherbich merged commit 7932d7c into main Feb 26, 2026
446 checks passed

vlad-scherbich deleted the vlad/fix-dynamic-config-flakes branch February 26, 2026 16:17
