tracing: add API for OpenCensus agent exporter#1563
tracing: add API for OpenCensus agent exporter#1563istio-testing merged 3 commits intoistio:masterfrom
Conversation
|
😊 Welcome @brianwolfe! This is either your first contribution to the Istio api repo, or it's been You can learn more about the Istio working groups, code of conduct, and contributing guidelines Thanks for contributing! Courtesy of your friendly welcome wagon. |
|
Hi @brianwolfe. Thanks for your PR. I'm waiting for a istio member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
howardjohn
left a comment
There was a problem hiding this comment.
Seems reasonable to me - my only concern is what, if any, impact the change in Envoy to move from static to dynamic tracing config will have. I suspect we will still maintain the same API and just expose it differently. @douglas-reid @mandarjog
|
Static to dynamic is similar. OpenCensus & Stackdriver using OpenCensus tracers have to have a fixed configuration for the entire proxy process because OpenCensus uses static variables for state and so can only handle a single version of the configuration, see: envoyproxy/envoy#10519 for some context. |
|
|
||
| // OpenCensus defines configuration for an OpenCensus tracer. | ||
| // See [OpenCensus trace config](https://github.com/census-instrumentation/opencensus-proto/blob/master/src/opencensus/proto/trace/v1/trace_config.proto) for details. | ||
| message OpenCensus { |
There was a problem hiding this comment.
Should we expand this to make it possible to support Stackdriver via this configuration (and deprecate the Stackdriver config)?
There was a problem hiding this comment.
Yes, make sense to me to replace Stackdriver configuration with this. We can add a backend enum in this Opencensus message.
There was a problem hiding this comment.
We should support all oc tracer backends including including zipkin thru this.
Please add enum.
There was a problem hiding this comment.
I didn't think we would want to expose the tracing implementation to the user. The OpenCensus & Stackdriver are really the external interface that a user would deal with, they wouldn't need to necessarily know that it is all OpenCensus under the hood.
I can make a try at merging them if you disagree.
There was a problem hiding this comment.
The only names I could find for this format were:
- OpenCensus Agent protocol
- OpenCensus Service
tbh, neither of them seemed to add clarity.
One advantage of specifying the top-level OpenCensus format is that we can consolidate the logic for specifying which headers we read/write and allow a user to only specify those headers they actually want in their infrastructure.
To be more specific, I'd like to be able to only specify TRACE_CONTEXT to avoid extra bytes on the wire here: https://github.com/istio/istio/blob/master/tools/packaging/common/envoy_bootstrap.json#L627-L628 If we just make an OpenCensus top-level configuration with an enum for the backend, we can easily do that. The other trace implementations won't have the same header options.
Another advantage is that we can differentiate between Zipkin backend implementations and so migrate people to the OpenCensus implementation with some more flexible options for trace propagation headers.
There was a problem hiding this comment.
Should we have a short design doc on this change ?
There was a problem hiding this comment.
I can make one. How do I get write access to the drive though?
There was a problem hiding this comment.
I made a quick draft PTAL: https://docs.google.com/document/d/1rbVLU0G391eKincaAJQhpBYAdhcvaF2G1sqGkJgKLcQ/edit#heading=h.xw1gqgyqs5b
I'm still waiting on approval to actually have write access to the drive, I had a coworker copy it into the drive.
|
/ok-to-test |
|
@mandarjog @bianpengyuan, thanks for the feedback, I updated to match with https://docs.google.com/document/d/1rbVLU0G391eKincaAJQhpBYAdhcvaF2G1sqGkJgKLcQ/edit# PTAL |
|
|
||
| // OpenCensusAgent defines configuration for an OpenCensus tracer writing to | ||
| // an OpenCensus agent backend. See | ||
| // [OpenCensus trace config](https://github.com/census-instrumentation/opencensus-proto/blob/master/src/opencensus/proto/trace/v1/trace_config.proto) |
There was a problem hiding this comment.
We probably should link to envoy's doc: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/trace/v3/opencensus.proto, which is more aligned with the proto defined here. Or link to both. we should update the comment of Stackdriver tracer to link to that as well.
There was a problem hiding this comment.
ack. Will update after commit is reviewed internally.
| // $hide_from_docs | ||
| // Unspecified context. Should not be used for now, but added to reserve | ||
| // the 0 enum value if TraceContext is used outside of a repeated field. | ||
| UNSPECIFIED = 0; |
There was a problem hiding this comment.
Why not make it NONE which follows envoy trace context: https://github.com/envoyproxy/envoy/blob/598e76a51b781bdd435d5d4f96d482fc283695b7/api/envoy/config/trace/v3/opencensus.proto#L30. I don't think it is necessary to hide this from docs, since it is not just a reserved field, but also a valid configuration for Envoy.
There was a problem hiding this comment.
I'm ok permitting NONE if you want, but that seems like a detail in the APi that would be confusing without any benefit for users. If they don't want distributed tracing, why would they configure a tracer?
There was a problem hiding this comment.
I chose UNSPECIFIED since that is the closest I could find in the Google style guide for protobufs. https://developers.google.com/protocol-buffers/docs/style#enums and it seemed to be consistent within Istio. I assumed Istio would roughly follow the google style guide for protobuf instead of mix-and-match with envoy & google styles.
|
|
||
| // Specifies the set of context propagation headers used for distributed | ||
| // tracing. Default is `["W3C_TRACE_CONTEXT"]`. If multiple values are specified, | ||
| // the proxy will attempt to read each header for each request and will |
There was a problem hiding this comment.
Do you want to income and outgoing context separated?
There was a problem hiding this comment.
I don't think it particularly makes sense to do so. I haven't seen a use-case where people really want to specify these separately for Istio. For Istio it adds to the confusion for users since it is incoming & outgoing w.r.t. envoy (really contexts-to-read and contexts-to-write).
I think this should not be separated unless we have a compelling use-case.
There was a problem hiding this comment.
maybe after we have per-listener configs it will make more sense to separate them out.
| message OpenCensusAgent { | ||
| // TraceContext selects the context propagation headers used for | ||
| // distributed tracing. | ||
| enum TraceContext { |
There was a problem hiding this comment.
@douglas-reid should we hoist Tracecontext msg out of the OC proto?
We can use it with SD as well, and is general enough.
There was a problem hiding this comment.
@mandarjog my main concern with hoisting it out is that it would also make sense for Lightstep implementation, but the set of TraceContexts supported are different, so we will end up needing to prefix them.
I'm ok with either option, just seems cleaner to keep the enums inside for each trace exporter.
There was a problem hiding this comment.
@mandarjog that sort of overlap was behind my original suggestion to deprecate stackdriver (and others) in favor of OpenCensus.
I think we need to decide what the scope of this change is.
I think adding OpenCensusAgent is a tactical change to support a needed use case -- so maybe we can avoid expanding TraceContext for now.
As you know, I feel strongly we should push on a Telemetry API that re-organizes some of this configuration.
There was a problem hiding this comment.
The main purpose I have for putting this change up is to get our Istio configurations to be more standard (right now we have weird configuration hacks to use the OpenCensus & custom metrics in a bootstrap configuration that make it hard to explain). When we have this change in and we upgrade to 1.7, I will be in a better place to meaningfully contribute to @douglas-reid's proposal since our forward migration will look like everyone else's.
There was a problem hiding this comment.
If you want to delay any push on this until the next telemetry API is out, I understand. I think we can technically continue running with our hacked-up bootstrap configuration for the next few versions.
There was a problem hiding this comment.
@brianwolfe i think this PR makes sense as-is, especially since it will remove hacked-up bootstraps in the short term.
There was a problem hiding this comment.
Thanks, I'll wait for next round of feedback from bianpengyuan & mandarjog, I'd like to be on the same page w.r.t. copying envoy TRACE_CONTEXT semantics.
This is based heavily on the existing Stackdriver configuration. Stackdriver and OpenCensus exporter will both use the OpenCensus tracer implementation. This will permit OpenCensus to export OpenCensus agent-formatted spans. OpenCensus-formatted spans are handled by the OpenTelemetry collector, providing a migration path from OpenCensus to OpenTelemetry when OpenTelemetry is more complete and integrated with Envoy. Change-Id: I566299c9291021d26ec1e839643871a380807bc3 Reviewed-on: https://gerrit.musta.ch/c/public/istio-api/+/551 Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>
Specify the set of trace headers to use for distributed trace context propagation with a new TraceContext enum in the OpenCensusAgent tracer. OpenCensus is the name of both the tracer implementation and the backend service. To disambiguate between the two terms, rename OpenCensus to OpenCensusAgent, since it is specific to the backend service. Both Stackdriver & OpenCensus will use the OpenCensus trace implementation under the hood. Change-Id: Id860fbf6bf6b1345c7276bb2c2b93de3f69a9d44 Reviewed-on: https://gerrit.musta.ch/c/public/istio-api/+/575 Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>
The existing documentation for Stackdriver and OpenCensus reference the OpenCensus configuration. We should reference both the Envoy OpenCensus tracer configuration and the OpenCensus configuration. Change-Id: Ia3fcbc879e2e15f90a56959cf97fe30b57211782 Reviewed-on: https://gerrit.musta.ch/c/public/istio-api/+/582 Reviewed-by: Ying Zhu <ying.zhu@airbnb.com> Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>
e82778f to
c80e298
Compare
|
@googlebot I signed it! |
|
@nrjpoddar can you approve the api change? |
|
@nrjpoddar ping |
|
@brianwolfe currently Istio documentation only talks about propagating X-B3-* , headers. Are you planning to add docs for how users can propagate other tracing headers if non Zipkin style headers are used? I’m not familiar with all the supported tracers in Envoy which looks like a rather long list. We would need high level docs explaining how OpenCensus tracer can be enabled similar to how we currently explain the setup related to Jaeger, Zipkin and Lightstep. |
|
@nrjpoddar Yeah, I can definitely add a longer section in here: https://github.com/istio/istio.io/blob/e9a89c879f6925593e2147b7778baeff85ac9bba/content/en/docs/tasks/observability/distributed-tracing/overview/index.md and https://github.com/istio/istio.io/blob/eff0d43a27d485e3997b38109768e1650cad70cf/content/en/faq/distributed-tracing/how-to-support-tracing.md to better document the options around context propagation for distributed tracing. I'll need help with the Chinese translation though. |
|
@nrjpoddar what order should I update in? API, then main codebase, then site docs? |
If this is going in in release-1.7 branch then we need to make sure we get all 3 changes merged soon i.e. before end of next week. If this is targeted for 1.8, then this is my preferred order:
|
|
I don't think it makes sense to target release-1.7. I'm ok delaying this & related work until 1.8. Rushing it seems likely to mean we'll end up with something more confusing. We can backport the work internally for our use-case or bring it into a point release for 1.7 if people really want the feature. |
|
|
||
| // debug enables trace output to stdout | ||
| // $hide_from_docs | ||
| bool debug = 3; |
There was a problem hiding this comment.
Why are these hidden from docs but not the rest?
There was a problem hiding this comment.
We mostly use these for checking maximum performance impact & ensuring the tracer gets instantiated correctly. They are fairly detailed configuration options that I'd expect mostly istio developers who care about performance to examine.
This appears to be the same reasoning as the existing parameters in Stackdriver that are also hidden from the docs.
wdyt?
There was a problem hiding this comment.
We shouldn't even have an API for these options and see if ENV vars or other mechanism can be used for Istio devs. Having a hidden option in user facing API which is never going to be used to end-users seems odd to me.
There was a problem hiding this comment.
I see. That is fine for me. I can remove them from this tracer implementation. Do you see any reason to keep them for Stackdriver?
There was a problem hiding this comment.
Didn't realize this would automerge when I ran the test, sorry about that.
There was a problem hiding this comment.
I'm not a stack driver expert so I would leave it up to @mandarjog or @douglas-reid to answer that question.
|
LGTM as it's for 1.8 and we can refine it later. Please answer my question about hidden fields from docs before merging. |
|
/test build_api |
These configurations were added to be consistent with Stackdriver and provide extra performance tuning for OpenCensus. These are unlikely to be used in production. Tuning should probably be performed by Istio developers instead. We can remove the options for now. This was triggered by a conversation at the end of the previous CL merge: istio#1563 Change-Id: Iab148ff87dfe5b1772d3cdf2a009ec9cf4ea0f27 Reviewed-on: https://gerrit.musta.ch/c/public/istio-api/+/596 Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com> Reviewed-by: Ying Zhu <ying.zhu@airbnb.com>
These configurations were added to be consistent with Stackdriver and provide extra performance tuning for OpenCensus. These are unlikely to be used in production. Tuning should probably be performed by Istio developers instead. We can remove the options for now. This was triggered by a conversation at the end of the previous CL merge: #1563 Change-Id: Iab148ff87dfe5b1772d3cdf2a009ec9cf4ea0f27 Reviewed-on: https://gerrit.musta.ch/c/public/istio-api/+/596 Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com> Reviewed-by: Ying Zhu <ying.zhu@airbnb.com>
Add support for OpenCensus tracing. This is similar to the existing Stackdriver support. This is tested by adding a new tracing bootstrap golden test comparison and by individually testing the configuration options, similar to the existing tracers. This implements the API from this PR: istio/api#1563 Change-Id: I65d87c6c5e3e52804b06f6b3d0a35f9515e43c7e Reviewed-on: https://gerrit.musta.ch/c/public/istio/+/559 Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>
Add support for OpenCensus tracing. This is similar to the existing Stackdriver support. This is tested by adding a new tracing bootstrap golden test comparison and by individually testing the configuration options, similar to the existing tracers. This implements the API from this PR: istio/api#1563 Change-Id: I65d87c6c5e3e52804b06f6b3d0a35f9515e43c7e Reviewed-on: https://gerrit.musta.ch/c/public/istio/+/559 Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>
* tracing: add support for OpenCensus Add support for OpenCensus tracing. This is similar to the existing Stackdriver support. This is tested by adding a new tracing bootstrap golden test comparison and by individually testing the configuration options, similar to the existing tracers. This implements the API from this PR: istio/api#1563 Change-Id: I65d87c6c5e3e52804b06f6b3d0a35f9515e43c7e Reviewed-on: https://gerrit.musta.ch/c/public/istio/+/559 Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com> * release-notes: add note for opencensusagent * tests/integration: add opencensusagent trace test This test adds integration support for opencensusagent, sharing most of the work from the existing zipkin integration tests, this exports traces to an opentelemetry-collector instance, which forwards the traces to zipkin. This same setup should be usable when envoy supports opentelemetry tracing natively.
This is based heavily on the existing Stackdriver configuration.
Stackdriver and OpenCensusAgent exporter will both use the OpenCensus tracer
implementation. This will permit OpenCensus to export OpenCensus agent
formatted spans.
OpenCensus agent-formatted spans are handled by the OpenTelemetry collector,
providing a migration path from OpenCensus to OpenTelemetry when
OpenTelemetry is more complete and integrated with Envoy.
@howardjohn
Change-Id: I566299c9291021d26ec1e839643871a380807bc3