Skip to content

datadog: fix span error property#31669

Merged
mattklein123 merged 5 commits into
envoyproxy:mainfrom
DataDog:david.goffredo/datadog-fix-span-error-property
Jan 8, 2024
Merged

datadog: fix span error property#31669
mattklein123 merged 5 commits into
envoyproxy:mainfrom
DataDog:david.goffredo/datadog-fix-span-error-property

Conversation

@dgoffredo
Copy link
Copy Markdown
Contributor

The Problem and Its Fix

Datadog customers have found another bug introduced by November 2022's rewrite of the Datadog tracing extension.

The Problem

Previously, when Envoy marked a span as containing an error by setting the Envoy::Tracing::TracingTagValues::Error tag ("error") to Envoy::Tracing::TracingTagValues::True ("true"), the underlying OpenTracing-based Datadog span would set its internal private "error" boolean property to true.

After the aforementioned rewrite, that span property does not get set when the "error" tag is set to "true". This is a bug.

The bug is not immediately obvious, because most errors on spans take the form of a non-success HTTP or gRPC response status code, and that information is still available in other tags. The issue reported by a Datadog customer was that error-containing traces were no longer appearing in the Datadog UI after upgrading Istio to a version that contains an Envoy version that includes the rewrite. The reason for the missing traces is that the omitted "error=true" internal span property caused the Datadog Agent to drop associated traces more often, since non-error-containing traces are not as interesting as error-containing traces.

The Fix

These changes do two things:

  1. When the "error" tag is set to "true", instead set the Datadog's underlying bool error property to true. Ignore all other values for the "error" tag (e.g. "false", "foobar", ""). Never actually set a tag named "error" on the underlying span.
  2. When the "error.reason" tag is set, set it and the "error.message" tag on the Datadog span, and set the underlying bool error property to true.

(1) is what is needed to restore the behavior from before the rewrite. I chose not to handle non-"true" values for the "error" tag, because as far as I can see looking through the Envoy source, it is only ever set to "true".

(2) is an addition that I saw was possible while I was reading the Envoy code for this bug. Envoy uses the tag {"error": "true"} to indicate an error, but also sometimes uses the tag {"error.reason": string} to annotate the error. This "error.reason" concept most closely maps to Datadog's "error.message" tag, per Datadog's span tag semantics.

Testing

See the included modifications to span_test.cc.

I also produced a build of Envoy containing these changes, and verified that when an upstream returns a 5xx response code, the underlying Datadog span reported to the Datadog Agent has error: true. When an upstream returns a success response code, the resulting span has error: false.

Other Info

Please label this pull request for backport review. It fixes a bug that affects the v1.27 and v1.28 release branches. I'm happy to propose those backports once this pull request is reviewed.

Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
@phlax
Copy link
Copy Markdown
Member

phlax commented Jan 8, 2024

/backport

@repokitteh-read-only repokitteh-read-only Bot added the backport/review Request to backport to stable releases label Jan 8, 2024
@phlax phlax added this to the v1.29.0 milestone Jan 8, 2024
@mattklein123 mattklein123 merged commit 51b0a93 into envoyproxy:main Jan 8, 2024
@dgoffredo
Copy link
Copy Markdown
Contributor Author

Thanks, Matt. I'll work on the backports for v1.27 and v1.28.

@dgoffredo
Copy link
Copy Markdown
Contributor Author

Ah, I forgot to update the change log. Will do that in the backports, anyway.

@phlax
Copy link
Copy Markdown
Member

phlax commented Jan 8, 2024

@dgoffredo could you add the changelog to main first please

dgoffredo added a commit to DataDog/envoy that referenced this pull request Jan 8, 2024
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
phlax pushed a commit that referenced this pull request Jan 8, 2024
Signed-off-by: David Goffredo <david.goffredo@datadoghq.com>
@phlax phlax removed the backport/review Request to backport to stable releases label Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants