Enable Zipkin tracing#905
Conversation
… master. Make Zipkin Tracer use Envoy's own random-number generator (envoyproxy#430)
|
@moderation Fixed! |
| * annotation then this Envoy instance initiated the span. Only the Envoy that initiates a Zipkin | ||
| * span should add binary annotations to it; otherwise, duplicate binary annotations | ||
| * would be created. | ||
|
|
| http_tracer_.reset( | ||
| new Tracing::HttpTracerImpl(std::move(lightstep_driver), server_.localInfo())); | ||
| } else if (type == "zipkin") { | ||
| if (server_.localInfo().clusterName().empty()) { |
There was a problem hiding this comment.
nit: remove else statement and add ASSERT checking for type is zipkin. ex https://github.com/lyft/envoy/blob/master/source/common/router/router_ratelimit.cc#L102
| "properties" : { | ||
| "type" : { | ||
| "type" : "string", | ||
| "enum" : ["zipkin"] |
There was a problem hiding this comment.
I might have missed some prior discussion, but my reading of this enum is that tracing drivers were originally envisioned to all be specified under a single "driver" schema, with a type ("zipkin", "lightstep") specified via this field.
Most of the other fields are identical across "lightstep_driver" and "zipkin_driver."
There was a problem hiding this comment.
My original code had both "lightstep" and "zipkin" as valid values of the same enum, and I had made access_token_file an optional parameter, as it has no use for Zipkin. However, under PR #700 (the old PR replaced by this one), @RomanDzhabarov suggested the following: "this should be done in a way that there are separate schemas for different driver types, otherwise schema will not have much value."
|
@goaway I addressed two of your comments above. Please, see my reply to your comment on driver schema. |
…ctor_endpoint optional. (envoyproxy#430)
|
@mattklein123 Please, see my new commits addressing your comments. |
mattklein123
left a comment
There was a problem hiding this comment.
looks good, a few small things.
|
|
||
| void Annotation::changeEndpointServiceName(const std::string& service_name) { | ||
| if (endpoint_.valid()) { | ||
| Endpoint endpoint(endpoint_.value()); |
There was a problem hiding this comment.
You can avoid copying here by just adding a non-const value() function here: https://github.com/lyft/envoy/blob/master/include/envoy/common/optional.h#L37
The method signature should be: T& value() and the body will be the same as the other one.
There was a problem hiding this comment.
Sorry just to clarify, when you add the new non-const version, have the const version call the non-const version to avoid code duplication. :)
There was a problem hiding this comment.
I cannot have one call the other without using const_cast. Are you okay with this use of const_cast?
There was a problem hiding this comment.
The const version can call the non-cost version. Basically never use const_cast.
There was a problem hiding this comment.
That seems to be intuitive, indeed. That's what I did.
The problem here is different, though. Irrespective of whether or not const_cast is used, the compiler seems to detect an infinite recursion. Perhaps because of optimization? (I am compiling the code with the CI docker image, the one I pulled today).
So, the compiler is not happy with this:
T& value() {
if (!valid_) {
throw EnvoyException("fetching invalid Optional value");
}
return value_;
}
const T& value() const { return value(); }
There was a problem hiding this comment.
I see, sorry, yes. I'm not sure off the top of my head the right way to deal with this. I would just copy the code and then make sure you have test coverage for both variants in optional_test (just declare a const Optional for testing).
There was a problem hiding this comment.
ok. I will duplicate the code. I believe avoiding code duplication requires an ugly combination of const_cast and static_cast...
| return false; | ||
| } | ||
|
|
||
| ZipkinDriver::TlsZipkinTracer::TlsZipkinTracer(TracerPtr tracer, ZipkinDriver& driver) |
There was a problem hiding this comment.
nit: for clarify making this take TracerPtr&& which will make it clear at the call site that you are moving.
|
|
||
| namespace Zipkin { | ||
|
|
||
| ZipkinSpan::ZipkinSpan(Zipkin::Span& span) : span_(span) {} |
There was a problem hiding this comment.
nit: it would be optimal if you remove Zipkin from all of the class names now, since you are inside the Zipkin namespace. You can veto this but I would appreciate it. 😉
There was a problem hiding this comment.
I only left Zipkin:: in the ZipkinSpan class. It turns out that, because ZipkinSpan inherits from Tracing::Span, any reference to Span is considered to be a reference to Tracing::Span and not to Zipkin::Span. I explicitly refer to Zipkin::Span to disambiguate. I was surprised by this behavior. Any suggestion?
| Tracer tracer("my_service_name", addr); | ||
| MockMonotonicTimeSource mock_start_time; | ||
| MonotonicTime start_time = mock_start_time.currentTime(); | ||
| Runtime::RandomGeneratorImpl random_generator; |
There was a problem hiding this comment.
I can certainly do that; no problem. The reason I used the real RandomGenerator in some cases was that MockRandomGenerator::random() always returns 0; therefore it renders some of my checks useless. It would be nice if we could control the value to be returned by MockRandomGenerator::random() to some extent, e.g., zero or non-zero.
There was a problem hiding this comment.
You can. It's a mock, have it return whatever you want in your tests.
There was a problem hiding this comment.
Yes, please remove commented out code. The point of the mock is so that you can return whatever value want, and can exercise all of your code paths completely and reproducibly. So make sure you have full coverage when using all mocks, and remove all commented out checks.
|
|
||
| Span span2(std::move(span)); | ||
|
|
||
| EXPECT_EQ(id, span2.id()); |
There was a problem hiding this comment.
If you are going to bother writing a move constructor test, you should sanity check that span1 has actually been moved from, and a copy was not done.
There was a problem hiding this comment.
According to the test coverage reports with and without this test, the code above is indeed exercising the move constructor.
Given that a "move" in the pure sense of the word does not happen when std::move is invoked, I don't think there is a safe/good way to determine whether or not an object was moved from. One (ugly) approach to automatically verify if the move constructor was actually called would be to change the class implementation to set a member flag to a value in the case of a copy constructor execution, and to another value when the move constructor is exercised. I don't think we want that. Any ideas?
There was a problem hiding this comment.
You can check span1 to see if it still contains data members. The move should literally move them, so string should be empty, etc.
There was a problem hiding this comment.
I don't think that is the case. My point is: there is no move in the literal sense. I believe what actually happens is some sort of "optimized copy". In fact, the data members of the original span seem to be intact.
There was a problem hiding this comment.
Then you aren't actually moving anything. In the interest of moving this forward, I would kill the move constructor and just do the copy, and deal with the perf issues in a follow up.
|
|
||
| void setupValidDriver() { | ||
| EXPECT_CALL(cm_, get("fake_cluster")).WillRepeatedly(Return(&cm_.thread_local_cluster_)); | ||
| ON_CALL(*cm_.thread_local_cluster_.cluster_.info_, features()) |
There was a problem hiding this comment.
IIRC this is the default so this line can be removed.
| NiceMock<Runtime::MockLoader> runtime_; | ||
| NiceMock<ThreadLocal::MockInstance> tls_; | ||
| NiceMock<LocalInfo::MockLocalInfo> local_info_; | ||
| Runtime::RandomGeneratorImpl random_; |
There was a problem hiding this comment.
(Just search all the test code for non-mock versions to make sure you get them all). Thanks.
|
@mattklein123 Please, see my latest commit and my replies to your comments. Do you have any suggestions based on all my comments above? |
|
@fabolive yes, I responded. |
| void Annotation::changeEndpointServiceName(const std::string& service_name) { | ||
| if (endpoint_.valid()) { | ||
| Endpoint endpoint(endpoint_.value()); | ||
| Endpoint& endpoint = endpoint_.value(); |
There was a problem hiding this comment.
nit: delete this line and just do endpoint_.value().setServiceName(service_name)
|
@mattklein123 Let me know if there are any further changes that are needed. |
|
@fabolive I still see most of the classes have Zipkin in them in the Zipkin namespace. Are we skipping that for now? (I thought your comment meant you had changed all of them besides for ZipkinSpan). |
|
@mattklein123 I actually meant something else. I might have misunderstood your original comment. Never mind; let me rename the classes. |
|
@mattklein123 Classes renamed (except for ZipkinSpan). |
mattklein123
left a comment
There was a problem hiding this comment.
Few tiny things and then ready to ship.
| namespace Zipkin { | ||
|
|
||
| bool SpanBuffer::addSpan(SpanPtr&& span) { | ||
| bool SpanBuffer::addSpan(Span&& span) { |
There was a problem hiding this comment.
To make it clear that we are currently copying here, please make this method take a const Span& instead. I would also add a TODO that this needs to be looked at for perf.
| * span will be sent to out to Zipkin, or buffered so that it can be sent out later. | ||
| */ | ||
| class Reporter { | ||
| class ReporterInterface { |
There was a problem hiding this comment.
To match the rest of the code, please leave this as Reporter, and make the implementation class ReporterImpl
| #include "common/tracing/http_tracer_impl.h" | ||
| #include "common/tracing/zipkin/zipkin_core_constants.h" | ||
|
|
||
| #include "spdlog/spdlog.h" |
There was a problem hiding this comment.
I think @RomanDzhabarov already mentioned this, but this include should not be needed here.
There was a problem hiding this comment.
If I remove that include statement the code does not compile because of this:
throw EnvoyException(fmt::format("{} collector cluster is not defined on cluster manager level",
config.getString("collector_cluster")));
As you can see, I am actually using fmt::format(). Do you prefer that I do not use fmt::format in favor of plain string concatenation?
There was a problem hiding this comment.
Ah OK, that's fine. (Didn't seeing logging so was confused). No problem.
|
@mattklein123 Code updated. |
|
@kyessenov time to update Envoy SHA in our code |
This PR makes Envoy use the code added by PR #692 (Zipkin library).
This PR replaces the old PR #700.
cc: @mattklein123, @htuch, @RomanDzhabarov