Skip to content

Add Trace Service Support.#345

Merged
mattklein123 merged 1 commit intoenvoyproxy:masterfrom
bogdandrutu:opencensus
Jan 19, 2018
Merged

Add Trace Service Support.#345
mattklein123 merged 1 commit intoenvoyproxy:masterfrom
bogdandrutu:opencensus

Conversation

@bogdandrutu
Copy link
Copy Markdown

This is a trace service proposal based on the opencensus (opencensus.io) data model. OpenCensus is an vendor independent tracing (support for zipkin, stackdriver, and soon others) and stats library(prometheus, stackdirver, signalfx, and soon others), integrated in products like gRPC (grpc.io).

One of the usages will be in the Istio (istio.io) environment to push traces to the Mixer.

/cc @douglas-reid @louiscryan

@bogdandrutu bogdandrutu force-pushed the opencensus branch 4 times, most recently from 1d08b2a to e5deb87 Compare December 14, 2017 03:05
Comment thread api/trace_service.proto Outdated

// Configuration structure.
message TraceServiceConfig {
// The name of the upstream cluster that hosts the metrics service. The cluster must be
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/metrics/trace ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@mattklein123
Copy link
Copy Markdown
Member

@bogdandrutu can you potentially give us some overview of what the plans are in this area? Is the plan to implement an Envoy trace driver that speaks this API (easy enough)? I'm mainly trying to understand how this interacts with the OT C++ API (envoyproxy/envoy#2017). In general, I'm a huge fan of the API driven approach so would love to see this move forward.

cc @rnburn @tedsuo

@douglas-reid
Copy link
Copy Markdown
Contributor

@mattklein123 i believe @bogdandrutu just left for vacation and may be a bit slow to respond. I will try to offer my understanding in the meantime.

The plan is to follow this PR with code that enables Envoy to speak this API. The OpenCensus c++ library was just recently released (https://github.com/census-instrumentation/opencensus-cpp). I imagine the Envoy work will involve that library. I'm less clear on the relationship between OT and OpenCensus, and on the details of how the OpenCensus support is anticipated to be added.

@bogdandrutu
Copy link
Copy Markdown
Author

@douglas-reid indeed I am on vacation and will respond a bit slower.
@mattklein123 yes we are planning to implement a trace driver that speaks this API, and we believe will not be that hard. OT does not define a data-model (at least last time when I checked) so we cannot use that. OpenCensus tries to offer multiple backend support, currently it offers support for Stackdriver, Zipkin, and we are working with other providers to support them (Dynatrace, Datadog, etc.).

PS: The API that @douglas-reid pointed is in process to become stable, the implementation needs re-work especially for performance point of view but we are committed to do that asap.

@mattklein123
Copy link
Copy Markdown
Member

OK in general this sounds great to me. Please get the PR passing tests. We can discuss implementation details in January.

Comment thread api/trace_service.proto
message StreamTracesResponse {
}

message StreamTracesMessage {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/StreamTracesMessage/StreamingTracesRequest?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

Comment thread api/trace_service.proto
}
}

message StreamTracesResponse {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StreamingTracesResponse would be a proper noun phrase.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123
Copy link
Copy Markdown
Member

FYI envoyproxy/envoy#2017 was just merged. IMO it would be nice if the driver was actually an OT C++ driver so it can work in other places also where C++ code is used. We can figure it out in January.

@mattklein123
Copy link
Copy Markdown
Member

@bogdandrutu friendly ping. Can we push this forward? Otherwise I would like to close for now and we can reopen when you are ready?

@rakyll
Copy link
Copy Markdown

rakyll commented Jan 5, 2018

Bogdan is OOO until mid-January.

@mattklein123
Copy link
Copy Markdown
Member

@rakyll OK can leave it open until @bogdandrutu gets back.

@bogdandrutu
Copy link
Copy Markdown
Author

@mattklein123 working on this right now.

@bogdandrutu
Copy link
Copy Markdown
Author

@mattklein123 everything is ready.

Comment thread api/trace_service.proto Outdated
message TraceServiceConfig {
// The name of the upstream cluster that hosts the trace service. The cluster
// must be configured in the cluster manager.
string cluster_name = 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should switch to the new GrpcService message. (See other examples).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@bogdandrutu
Copy link
Copy Markdown
Author

@mattklein123 @htuch @douglas-reid probably one of you has better experience with bazel/proto. I get this error sometimes on the ci:test and cannot reproduce locally. Do you have any idea why?

@htuch
Copy link
Copy Markdown
Member

htuch commented Jan 17, 2018

@bogdandrutu this looks like a compiler issue rather than a Bazel one; it's not seeing the issue though.

@bogdandrutu
Copy link
Copy Markdown
Author

@htuch the failure is random (sometimes pass, sometimes not), that's why I think I am doing something wrong related to bazel/proto.

Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
@bogdandrutu
Copy link
Copy Markdown
Author

Everything looks good.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @htuch @wora any comments?

@kyessenov let's merge this and then you can move it around in your PR?

@kyessenov
Copy link
Copy Markdown
Contributor

Sure, let's do that.

@mattklein123 mattklein123 merged commit aae0051 into envoyproxy:master Jan 19, 2018
Comment thread api/trace_service.proto
import "api/grpc_service.proto";
import "trace.proto";

import "google/api/annotations.proto";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is not needed.

Comment thread api/trace_service.proto
// not expect any response to be sent as nothing would be done in the case
// of failure.
rpc StreamTraces(stream StreamTracesMessage) returns (StreamTracesResponse) {
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use ; instead of {}.

Comment thread api/trace_service.proto

import "validate/validate.proto";

// Service for streaming traces to server that consumes the trace data. It
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence can be simplified: An interface for streaming traces from a client to a server.

There is no need to mention "consumes the trace data". The server can just forward the data elsewhere. There is no need to speculate the usage.

Comment thread api/trace_service.proto
import "validate/validate.proto";

// Service for streaming traces to server that consumes the trace data. It
// uses OpenCensus data model as a standard to represent trace information.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove "as a standard", since it doesn't provide much value.

Please add a link to OpenCensus

Comment thread api/trace_service.proto
// Service for streaming traces to server that consumes the trace data. It
// uses OpenCensus data model as a standard to represent trace information.
service TraceService {
// Envoy will connect and send StreamTracesMessage messages forever. It does
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface can be used by different client, there is no need to mention Envoy. "forever" is also too strong. How about:

// The client will stream trace to the server continuously until either the client or the server terminates the stream.

Comment thread api/trace_service.proto
message StreamTracesMessage {
message Identifier {
// The node sending the access log messages over the stream.
Node node = 1 [(validate.rules).message.required = true];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you put the trace data into a data, the key should always be a string, the client should normalize the id into a string instead of asking server to do it.

Comment thread api/trace_service.proto

// Configuration structure.
message TraceServiceConfig {
// The upstream gRPC cluster that hosts the metrics service.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why trace service has anything to do with metrics service? Do they share the schema?

@htuch
Copy link
Copy Markdown
Member

htuch commented Jul 16, 2018

Did we ever do anything with this? I see its presence in the repository but not any users.

@moderation
Copy link
Copy Markdown
Contributor

I asked a similar question on Gitter recently - https://gitter.im/census-instrumentation/Lobby?at=5b43d06b9b82c6701bafe6b2. In the responses it looks like the work is being done at the Istio Mixer level (which makes sense) and not at the Envoy level. It looks like https://www.envoyproxy.io/docs/envoy/latest/api-v2/config/trace/v2/trace.proto#config-trace-v2-traceserviceconfig will emit OpenCensus stats to a gRPC endpoint but I'm not really sure.

@moderation
Copy link
Copy Markdown
Contributor

moderation commented Sep 12, 2018

I looked at this again today after the Twitter debate re: OpenCensus & OpenTracing. I'm pretty sure that Trace Service isn't hooked up and can't be used. It is in the documentation at config/trace/v2/trace.proto#config-trace-v2-traceserviceconfig and referred to in proto.

However it isn't defined as a well known name for either tracers source/extensions/tracers/well_known_names.h or stats_sinks stat_sinks/well_known_names.h

After all this I don't know if the metrics service and tracing service are the same thing or different partially implemented things with different names. @ramaraochavali did most the of the metrics service work but I can't find an example of a working config or how this works. It doesn't look like the current metrics service leverages any part of OpenCensus.

I was thinking that if the tracing service worked you could point it to the OpenCensus service and then be able to export traces to multiple backends.

Would be great if @bogdandrutu or @rakyll could chime in. Also good to hear from Istio Policies & Telemetry leads @douglas-reid and @mandarjog

@ramaraochavali
Copy link
Copy Markdown
Contributor

ramaraochavali commented Sep 12, 2018

Metrics Service and Tracing Service IMO are different. Metric Service implements gRPC stats sink so that envoy stats can be pushed to the configured end point. It does not deal with OpenCensus at all.
This is how you can configure metrics service

stats_sinks:
  - name: envoy.metrics_service
    config: {grpc_service: {envoy_grpc: {cluster_name: xds_cluster}}}

where xds_cluster points to a static cluster exposing gRPC end point that this services pushes metrics to.

@douglas-reid
Copy link
Copy Markdown
Contributor

@ramaraochavali is correct. The metrics service and the tracing service are different. The original intent of this PR was solely to provide an API for Envoy to stream out trace spans to a backend. OpenCensus obviously supports stats/metrics and traces, but this was focused exclusively on trace data.

I cannot speak with any authority on the OpenCensus' teams priorities around implementation. At the time of the original PR, I understood that work based on their recently released code would immediately follow, hooking up this API. Sometime soon after, priorities shifted I guess. As @moderation notes, there have been Istio Mixer PRs that have added support for generating trace spans from Report calls. This should enable export of traces to any trace backend, based on request/response metadata, for Istio use cases.

I don't know if the OpenCensus team wants to continue pursuing trace export via Envoy or not. I think it would be useful. The last I heard, the OpenCensus folks were working on adding support for stats export from Envoy however. It would be good to have confirmation of that effort.

@g-easy
Copy link
Copy Markdown

g-easy commented Sep 12, 2018

OpenCensus contributor here. We're looking into supporting this and will come back with a more concrete proposal soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants