tracing: add datadog extension #4699
Signed-off-by: Caleb Gilmour <caleb.gilmour@datadoghq.com>
3decae8 to 2b5c3e6
Sorry, can you do a master merge to pick up #4701?
@cgilmour unfortunately I'm out for the next 2 weeks, and I'm not going to have a chance to look at this before I go. I'm going to mark this as "no stalebot" in the interim. If anyone else wants to take a look, including @rnburn and/or @objectiser, that would be great.
// The cluster to use for submitting traces to the Datadog agent.
string collector_cluster = 1 [(validate.rules).string.min_bytes = 1];
string service_name = 2 [(validate.rules).string.min_bytes = 1];
bool priority_sampling = 3;
What's the priority_sampling option?
Also, does datadog's tracer support the sampling.priority tag? Envoy uses it to disable sampling for some of the traffic that would be noise.
@rnburn Support for that tag is being added in DataDog/dd-opentracing-cpp#59; it should be ready today.
@cgilmour We should copy, or have comments similar to, https://github.com/DataDog/dd-opentracing-cpp/blob/master/src/tracer_factory.cpp#L21. We should also default priority sampling to on here (whereas normally it's off by default), since Envoy uses it.
message->headers().insertPath().value(encoder_->path());
message->headers().insertHost().value(driver_.cluster()->name());
for (auto& h : encoder_->headers()) {
  ENVOY_LOG(debug, "Adding header {}: {}", h.first, h.second);
Is there enough context in this log statement for it to be useful?
cluster_ = cluster->info();

tracer_options_.operation_name_override = "envoy.proxy";
if (datadog_config.service_name().size() > 0) {
!datadog_config.service_name().empty()?
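The suggestion above swaps a size comparison for `empty()`, which states the intent directly. A minimal sketch of the idiom, assuming a hypothetical `resolveServiceName` helper and a fallback name that are not part of this PR:

```cpp
#include <string>

// Hypothetical helper, not from the PR: returns the configured service name,
// or the fallback when the configured value is empty.
std::string resolveServiceName(const std::string& configured, const std::string& fallback) {
  // Equivalent to `configured.size() > 0`, but reads as the question being asked.
  if (!configured.empty()) {
    return configured;
  }
  return fallback;
}
```

For example, `resolveServiceName("", "envoy")` yields the fallback, while a non-empty configured name is returned unchanged.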
Upstream::ClusterInfoConstSharedPtr cluster() { return cluster_; }
Runtime::Loader& runtime() { return runtime_; }
DatadogTracerStats& tracerStats() { return tracer_stats_; }
const datadog::opentracing::TracerOptions& tracerOptions() { return tracer_options_; }
I think there's an Envoy convention for putting non-overridden methods like this before the overridden ones.
9a27059 to 569bbc2
I'll sort out an update to fix the failing tests.
mattklein123 left a comment
Thanks for working on this and sorry for the delay. Some comments to get started.
google.protobuf.Struct config = 2;
}

// Configuration for the Datadog tracer.
Can we add a release note for this? Are there any other docs that need updating? Perhaps https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/tracing?
Happy to do this once the main code changes are acceptable.
class TraceReporter;
typedef std::unique_ptr<TraceReporter> TraceReporterPtr;
typedef std::shared_ptr<datadog::opentracing::TraceEncoder> TraceEncoderPtr;
nit: TraceEncoderSharedPtr
POOL_COUNTER_PREFIX(stats, "tracing.datadog."))},
tls_(tls.allocateSlot()), runtime_(runtime) {

Upstream::ThreadLocalCluster* cluster = cm_.get(datadog_config.collector_cluster());
You can use Config::Utility::checkCluster here I think.
Yup, it was done this way for consistency (and not knowing about checkCluster).
Would you like the other tracers updated as well?
Done for all three cases.
void TraceReporter::enableTimer() {
  const uint64_t flush_interval =
      driver_.runtime().snapshot().getInteger("tracing.datadog.flush_interval_ms", 1000U);
Please make sure these runtime keys are documented somewhere. I don't recall off the top of my head where the other keys you based this on are documented.
It was "inspired" by settings in other tracers, but can probably be replaced with a sensible default.
The impact of it is on the tracer mechanics more than the user.
Yup that's fine, you don't need to have things be configurable if you don't want.
Replaced with a sensible default.
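A rough sketch of what replacing the runtime lookup with a fixed default could look like. The constant name and the 1000 ms value are assumptions (the value is taken from the fallback in the removed `getInteger` call); the thread does not say which value the final code uses.

```cpp
#include <chrono>

// Hypothetical constant, assumed for illustration: a fixed flush interval in
// place of the runtime key "tracing.datadog.flush_interval_ms".
constexpr std::chrono::milliseconds kDefaultFlushInterval{1000};

// Returns the interval to hand to the flush timer; no runtime lookup involved.
constexpr std::chrono::milliseconds flushInterval() { return kDefaultFlushInterval; }
```

With this shape, the timer call would take `flushInterval()` directly, and there is no runtime key left to document.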
  flush_timer_->enableTimer(std::chrono::milliseconds(flush_interval));
}

void TraceReporter::flushTraces() {
I'm guessing that almost all this code was basically copied from the LS implementation? How much of it is different? Is it worth sharing the code in any way? cc @rnburn
Correct, they were used as a reference, and where possible, used verbatim so that the result would be consistent.
If there's room to refactor things, that's great.
Flush timers and flush mechanisms are probably a good case for that.
Not sure about other bits.
It's up to you but any de-dup would be appreciated. This is going to come up again when someone comes in and adds the OpenCensus tracer.
Not done, but OK doing a followup change to de-dup things, and any suggestions from @rnburn
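To make the de-dup idea concrete, here is one possible shape for a shared flush helper. This is only a sketch under stated assumptions: `FlushTrigger` and its methods are hypothetical names, not part of Envoy or this PR, and the real timer wiring is left out so the example stays self-contained.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper, not part of Envoy or the PR: counts buffered spans and
// invokes a tracer-specific flush callback when the periodic timer fires.
class FlushTrigger {
public:
  explicit FlushTrigger(std::function<void()> flush) : flush_(std::move(flush)) {}

  // Called whenever a tracer buffers a finished span.
  void onSpanBuffered() { ++pending_; }

  // Called from a periodic timer; flushes only if something is pending.
  void onTimer() {
    if (pending_ > 0) {
      flush_();
      pending_ = 0;
    }
  }

  std::size_t pending() const { return pending_; }

private:
  std::function<void()> flush_;
  std::size_t pending_{0};
};
```

Each tracer would supply its own callback (encoding and sending are tracer-specific), while the bookkeeping around when to flush lives in one place.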
Http::MessagePtr message(new Http::RequestMessageImpl());
message->headers().insertMethod().value().setReference(Http::Headers::get().MethodValues.Post);
message->headers().insertPath().value(encoder_->path());
This line and next line can be references I think, FWIW
OK. I'll see if I can make this one return a reference. The next one is internal to Envoy, so it's easier to update.
message->headers().insertMethod().value().setReference(Http::Headers::get().MethodValues.Post);
message->headers().insertPath().value(encoder_->path());
message->headers().insertHost().value(driver_.cluster()->name());
for (auto& h : encoder_->headers()) {
for perf reasons, it would be better to pre-construct the lower case headers. Then you can just send them by reference.
Alright, I'll see what I can do for that.
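The perf suggestion above amounts to lower-casing header names once, up front, rather than on every flush. A minimal sketch using only the standard library; `HeaderList`, `toLower`, and `precomputeHeaders` are hypothetical names, and Envoy's own header types are left out to keep the example self-contained.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type for illustration: header names already lower-cased.
using HeaderList = std::vector<std::pair<std::string, std::string>>;

// Returns a lower-cased copy of `s` (ASCII only).
std::string toLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Builds the list once (for example at construction time), so the per-flush
// loop can pass the stored names by reference instead of re-normalizing them.
HeaderList precomputeHeaders(const std::map<std::string, std::string>& raw) {
  HeaderList out;
  out.reserve(raw.size());
  for (const auto& h : raw) {
    out.emplace_back(toLower(h.first), h.second);
  }
  return out;
}
```

The stored list has to outlive every request that references it, which is why building it once in a long-lived object is the natural fit.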
Thanks @mattklein123, added some responses to the comments.
mattklein123 left a comment
This in general LGTM. Can we do a docs/release notes pass and then I can take a final pass? Thank you!
Marking as waiting. /wait
CODEOWNERS entry is missing, @cgilmour @mattklein123?
Ok. I can submit a followup PR for that.
Thank you @cgilmour!
Description:
This PR adds a tracer extension so Envoy can produce HTTP traces and submit them to Datadog via an agent.
It has similar behavior to the existing LightStep and Zipkin tracers, and their implementations were used as a reference.
Risk Level:
Low.
Testing:
Unit tests ("borrowed" from LightStep)
Integration tests using docker-compose. (An example is not submitted yet, but can be provided.)
End-to-End tests from browser HTTP request to traces appearing in Datadog's system, internally and by external users in a staging environment.
Docs Changes:
Please suggest where doc changes need to be made.
Release Notes:
Please suggest what is needed here as well.
Fixes #3861 (already closed)
CC: @mattklein123 (offered to sponsor this submission) and other tracing experts @rnburn and @objectiser