Datadog: fix names in Span::spawnChild#31366
Conversation
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
adisuissa
left a comment
There was a problem hiding this comment.
drive-by comment:
This seems to be a user-facing change. Please add a runtime guard around this to allow users to migrate this over time.
| @@ -0,0 +1,223 @@ | |||
| /** | |||
| * The tests in this file aren't specific to a class, but instead test all | |||
| * behavior related to spans' "operation name" (a.k.a. "span name"), | |||
There was a problem hiding this comment.
nit: if this is part of span, consider adding to the span_test.cc file (there are a few test files with multiple "Test" classes). This will allow quicker discovery of span relevant tests.
There was a problem hiding this comment.
This test involves both Tracer and Span. I could justify adding it to either of those drivers, but I prefer to leave it separate in order to emphasize the particular behavior under test.
tracer_test.cc and span_test.cc together cover the code paths under test here. I've added this new driver to ensure that the particular overall behavior of "operation name, "resource name," and "service name" is maintained.
|
Thanks for taking a look, @adisuissa.
This is a fix for a bug. I'd prefer that users automatically got the fix when upgrading, as opposed to having to enable some option to get the behavior they had before the bug was introduced. |
|
/backport |
Yeah, having it runtime guarded doesn't imply it will be false-by-default. It should be true-by-default, but allows Envoy users to disable it (hopefully for a short duration) while making any changes needed in the rest of their system. |
|
IMO, this is just a bug-fix and we don't have users relying on or desiring the behavior without the fix. The other changes that were referenced in the PR description had a more-direct impact on users, and this PR is essentially just a final cleanup of previous fixes. But it's good to be reminded about that, as we have another change being developed where a runtime guard (enabled by default, but has an option to disable) would be a perfect fit! |
|
Hello everyone, developers at Wayfair actively use DataDog tracing for Istio-enabled workloads, and this functionality is currently partially broken on Istio v1.17 and newer. I understand it's the holiday season, but we would be grateful if you could help merge this change. |
The runtime guard can (and should) be enabled by default. It just allows safer migration across a fleet if one detects a problem with the new code until it is completely rolled out. |
adisuissa
left a comment
There was a problem hiding this comment.
LGTM, thanks for fixing this!
Seems that the extension owners don't think this needs to be runtime guarded.
/assign-from @envoyproxy/senior-maintainers
|
@envoyproxy/senior-maintainers assignee is @ggreenway |
|
Thanks again, @adisuissa. My teammates and I will look into runtime guards for future changes, knowing now that it's an option. |
The Bug and Its Fix
This is an additional fix for the bug described in #29932.
There are still situations in which the spans produced by the Datadog tracer have the wrong name (operation name). We want all Datadog spans to be named "envoy.proxy", but sometimes they have another value which should instead be the span's resource name.
#29932 addressed spans created by
class Tracer, #30503 (which was about something else) addressed name changes due toSpan::setOperation, and this pull request that you are reading addresses spans created bySpan::spawnChild.For a little background, there was a substantial rewrite of the Datadog tracing extension within Envoy in October 2022. Since then, the modified code has slowly worked its way into Envoy releases, and then into Istio releases. One difference between the old code and the new is that the old code was OpenTracing-based, while the new code is not. While making the changes, I failed to notice that the "operation name" that Envoy uses is not the same notion as that used in Datadog. This pull request and those linked above are my identifying all of the places where that confusion of terms appears.
This pull request changes the last inappropriate setting of a Datadog span's operation name, and adds dedicated unit tests to centralize and clarify the expected behavior.
Please tag this pull request for backport review. It fixes a bug that affects the
v1.27andv1.28release branches. I'm happy to propose those backports once this pull request is reviewed.Testing
See the new
naming_test.ccand the modifiedspan_test.cc.Other Info
Commit Message: "Datadog: fix span name in Span::spawnChild"
Additional Description: see above
Risk Level: low
Testing: see included unit tests
Docs Changes: N/A
Release Notes: see included changes to
changelogs/current.yamlPlatform Specific Features: N/A