datadog: fix span error property#31669
Merged
mattklein123 merged 5 commits intoJan 8, 2024
Merged
Conversation
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Member
|
/backport |
mattklein123
approved these changes
Jan 8, 2024
Contributor
Author
|
Thanks, Matt. I'll work on the backports for |
Contributor
Author
|
Ah, I forgot to update the change log. Will do that in the backports, anyway. |
Member
|
@dgoffredo could you add the changelog to |
dgoffredo
added a commit
to DataDog/envoy
that referenced
this pull request
Jan 8, 2024
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
phlax
pushed a commit
that referenced
this pull request
Jan 8, 2024
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The Problem and Its Fix
Datadog customers have found another bug introduced by November 2022's rewrite of the Datadog tracing extension.
The Problem
Previously, when Envoy marked a span as containing an error by setting the Envoy::Tracing::TracingTagValues::Error tag ("error") to Envoy::Tracing::TracingTagValues::True ("true"), the underlying OpenTracing-based Datadog span would set its internal private "error" boolean property to
true.After the aforementioned rewrite, that span property does not get set when the "error" tag is set to "true". This is a bug.
The bug is not immediately obvious, because most errors on spans take the form of a non-success HTTP or gRPC response status code, and that information is still available in other tags. The issue reported by a Datadog customer was that error-containing traces were no longer appearing in the Datadog UI after upgrading Istio to a version that contains an Envoy version that includes the rewrite. The reason for the missing traces is that the omitted "error=
true" internal span property caused the Datadog Agent to drop associated traces more often, since non-error-containing traces are not as interesting as error-containing traces.The Fix
These changes do two things:
true. Ignore all other values for the "error" tag (e.g. "false", "foobar", ""). Never actually set a tag named "error" on the underlying span.bool errorproperty totrue.(1) is what is needed to restore the behavior from before the rewrite. I chose not to handle non-"true" values for the "error" tag, because as far as I can see looking through the Envoy source, it is only ever set to "true".
(2) is an addition that I saw was possible while I was reading the Envoy code for this bug. Envoy uses the tag
{"error": "true"}to indicate an error, but also sometimes uses the tag{"error.reason": string}to annotate the error. This "error.reason" concept most closely maps to Datadog's "error.message" tag, per Datadog's span tag semantics.Testing
See the included modifications to
span_test.cc.I also produced a build of Envoy containing these changes, and verified that when an upstream returns a 5xx response code, the underlying Datadog span reported to the Datadog Agent has
error: true. When an upstream returns a success response code, the resulting span haserror: false.Other Info
Please label this pull request for backport review. It fixes a bug that affects the
v1.27andv1.28release branches. I'm happy to propose those backports once this pull request is reviewed.