Skip to content

Enable Zipkin tracing#905

Merged
mattklein123 merged 47 commits into
envoyproxy:masterfrom
amalgam8:use-zipkin-library
May 13, 2017
Merged

Enable Zipkin tracing#905
mattklein123 merged 47 commits into
envoyproxy:masterfrom
amalgam8:use-zipkin-library

Conversation

@fabolive
Copy link
Copy Markdown
Contributor

@fabolive fabolive commented May 5, 2017

This PR makes Envoy use the code added by PR #692 (Zipkin library).

This PR replaces the old PR #700.

cc: @mattklein123, @htuch, @RomanDzhabarov

@fabolive
Copy link
Copy Markdown
Contributor Author

fabolive commented May 5, 2017

@moderation Fixed!

* annotation then this Envoy instance initiated the span. Only the Envoy that initiates a Zipkin
* span should add binary annotations to it; otherwise, duplicate binary annotations
* would be created.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing *

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread source/server/configuration_impl.cc Outdated
http_tracer_.reset(
new Tracing::HttpTracerImpl(std::move(lightstep_driver), server_.localInfo()));
} else if (type == "zipkin") {
if (server_.localInfo().clusterName().empty()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove else statement and add ASSERT checking for type is zipkin. ex https://github.com/lyft/envoy/blob/master/source/common/router/router_ratelimit.cc#L102

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"properties" : {
"type" : {
"type" : "string",
"enum" : ["zipkin"]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have missed some prior discussion, but my reading of this enum is that tracing drivers were originally envisioned to all be specified under a single "driver" schema, with a type ("zipkin", "lightstep") specified via this field.

Most of the other fields are identical across "lightstep_driver" and "zipkin_driver."

Copy link
Copy Markdown
Contributor Author

@fabolive fabolive May 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original code had both "lightstep" and "zipkin" as valid values of the same enum, and I had made access_token_file an optional parameter, as it has no use for Zipkin. However, under PR #700 (the old PR replaced by this one), @RomanDzhabarov suggested the following: "this should be done in a way that there are separate schemas for different driver types, otherwise schema will not have much value."

@fabolive
Copy link
Copy Markdown
Contributor Author

fabolive commented May 7, 2017

@goaway I addressed two of your comments above. Please, see my reply to your comment on driver schema.

@fabolive
Copy link
Copy Markdown
Contributor Author

fabolive commented May 12, 2017

@mattklein123 Please, see my new commits addressing your comments.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, a few small things.


void Annotation::changeEndpointServiceName(const std::string& service_name) {
if (endpoint_.valid()) {
Endpoint endpoint(endpoint_.value());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid copying here by just adding a non-const value() function here: https://github.com/lyft/envoy/blob/master/include/envoy/common/optional.h#L37

The method signature should be: T& value() and the body will be the same as the other one.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry just to clarify, when you add the new non-const version, have the const version call the non-const version to avoid code duplication. :)

Copy link
Copy Markdown
Contributor Author

@fabolive fabolive May 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot have one call the other without using const_cast. Are you okay with this use of const_cast?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The const version can call the non-cost version. Basically never use const_cast.

Copy link
Copy Markdown
Contributor Author

@fabolive fabolive May 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to be intuitive, indeed. That's what I did.
The problem here is different, though. Irrespective of whether or not const_cast is used, the compiler seems to detect an infinite recursion. Perhaps because of optimization? (I am compiling the code with the CI docker image, the one I pulled today).
So, the compiler is not happy with this:

T& value() {
    if (!valid_) {
      throw EnvoyException("fetching invalid Optional value");
    }
    return value_;
}
const T& value() const { return value(); }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, sorry, yes. I'm not sure off the top of my head the right way to deal with this. I would just copy the code and then make sure you have test coverage for both variants in optional_test (just declare a const Optional for testing).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I will duplicate the code. I believe avoiding code duplication requires an ugly combination of const_cast and static_cast...

return false;
}

ZipkinDriver::TlsZipkinTracer::TlsZipkinTracer(TracerPtr tracer, ZipkinDriver& driver)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for clarify making this take TracerPtr&& which will make it clear at the call site that you are moving.


namespace Zipkin {

ZipkinSpan::ZipkinSpan(Zipkin::Span& span) : span_(span) {}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it would be optimal if you remove Zipkin from all of the class names now, since you are inside the Zipkin namespace. You can veto this but I would appreciate it. 😉

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only left Zipkin:: in the ZipkinSpan class. It turns out that, because ZipkinSpan inherits from Tracing::Span, any reference to Span is considered to be a reference to Tracing::Span and not to Zipkin::Span. I explicitly refer to Zipkin::Span to disambiguate. I was surprised by this behavior. Any suggestion?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No that's fine.

Tracer tracer("my_service_name", addr);
MockMonotonicTimeSource mock_start_time;
MonotonicTime start_time = mock_start_time.currentTime();
Runtime::RandomGeneratorImpl random_generator;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use MockRandomGenerator

Copy link
Copy Markdown
Contributor Author

@fabolive fabolive May 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can certainly do that; no problem. The reason I used the real RandomGenerator in some cases was that MockRandomGenerator::random() always returns 0; therefore it renders some of my checks useless. It would be nice if we could control the value to be returned by MockRandomGenerator::random() to some extent, e.g., zero or non-zero.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can. It's a mock, have it return whatever you want in your tests.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please remove commented out code. The point of the mock is so that you can return whatever value want, and can exercise all of your code paths completely and reproducibly. So make sure you have full coverage when using all mocks, and remove all commented out checks.


Span span2(std::move(span));

EXPECT_EQ(id, span2.id());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are going to bother writing a move constructor test, you should sanity check that span1 has actually been moved from, and a copy was not done.

Copy link
Copy Markdown
Contributor Author

@fabolive fabolive May 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the test coverage reports with and without this test, the code above is indeed exercising the move constructor.
Given that a "move" in the pure sense of the word does not happen when std::move is invoked, I don't think there is a safe/good way to determine whether or not an object was moved from. One (ugly) approach to automatically verify if the move constructor was actually called would be to change the class implementation to set a member flag to a value in the case of a copy constructor execution, and to another value when the move constructor is exercised. I don't think we want that. Any ideas?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check span1 to see if it still contains data members. The move should literally move them, so string should be empty, etc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that is the case. My point is: there is no move in the literal sense. I believe what actually happens is some sort of "optimized copy". In fact, the data members of the original span seem to be intact.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you aren't actually moving anything. In the interest of moving this forward, I would kill the move constructor and just do the copy, and deal with the perf issues in a follow up.


void setupValidDriver() {
EXPECT_CALL(cm_, get("fake_cluster")).WillRepeatedly(Return(&cm_.thread_local_cluster_));
ON_CALL(*cm_.thread_local_cluster_.cluster_.info_, features())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this is the default so this line can be removed.

NiceMock<Runtime::MockLoader> runtime_;
NiceMock<ThreadLocal::MockInstance> tls_;
NiceMock<LocalInfo::MockLocalInfo> local_info_;
Runtime::RandomGeneratorImpl random_;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mock

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just search all the test code for non-mock versions to make sure you get them all). Thanks.

@mattklein123
Copy link
Copy Markdown
Member

P.S., @rshriram @fabolive I would strongly recommend setting up a perf run with zipkin enabled and getting some (perf) traces. I think there is going to be some super low hanging fruit. Can obviously be done in a follow up.

@fabolive
Copy link
Copy Markdown
Contributor Author

@mattklein123 Please, see my latest commit and my replies to your comments. Do you have any suggestions based on all my comments above?

@mattklein123
Copy link
Copy Markdown
Member

@fabolive yes, I responded.

void Annotation::changeEndpointServiceName(const std::string& service_name) {
if (endpoint_.valid()) {
Endpoint endpoint(endpoint_.value());
Endpoint& endpoint = endpoint_.value();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: delete this line and just do endpoint_.value().setServiceName(service_name)

@fabolive
Copy link
Copy Markdown
Contributor Author

@mattklein123 Let me know if there are any further changes that are needed.

@mattklein123
Copy link
Copy Markdown
Member

@fabolive I still see most of the classes have Zipkin in them in the Zipkin namespace. Are we skipping that for now? (I thought your comment meant you had changed all of them besides for ZipkinSpan).

@fabolive
Copy link
Copy Markdown
Contributor Author

@mattklein123 I actually meant something else. I might have misunderstood your original comment. Never mind; let me rename the classes.

@fabolive
Copy link
Copy Markdown
Contributor Author

@mattklein123 Classes renamed (except for ZipkinSpan).

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few tiny things and then ready to ship.

namespace Zipkin {

bool SpanBuffer::addSpan(SpanPtr&& span) {
bool SpanBuffer::addSpan(Span&& span) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it clear that we are currently copying here, please make this method take a const Span& instead. I would also add a TODO that this needs to be looked at for perf.

Comment thread source/common/tracing/zipkin/tracer.h Outdated
* span will be sent to out to Zipkin, or buffered so that it can be sent out later.
*/
class Reporter {
class ReporterInterface {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match the rest of the code, please leave this as Reporter, and make the implementation class ReporterImpl

#include "common/tracing/http_tracer_impl.h"
#include "common/tracing/zipkin/zipkin_core_constants.h"

#include "spdlog/spdlog.h"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @RomanDzhabarov already mentioned this, but this include should not be needed here.

Copy link
Copy Markdown
Contributor Author

@fabolive fabolive May 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove that include statement the code does not compile because of this:

throw EnvoyException(fmt::format("{} collector cluster is not defined on cluster manager level",
                                     config.getString("collector_cluster")));

As you can see, I am actually using fmt::format(). Do you prefer that I do not use fmt::format in favor of plain string concatenation?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, that's fine. (Didn't seeing logging so was confused). No problem.

@fabolive
Copy link
Copy Markdown
Contributor Author

@mattklein123 Code updated.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@mattklein123 mattklein123 merged commit 14e1f10 into envoyproxy:master May 13, 2017
@rshriram
Copy link
Copy Markdown
Member

@kyessenov time to update Envoy SHA in our code

@fabolive fabolive deleted the use-zipkin-library branch May 22, 2017 20:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants