Skip to content

tracing: add API for OpenCensus agent exporter#1563

Merged
istio-testing merged 3 commits intoistio:masterfrom
airbnb:air-add-opencensus-tracing
Aug 8, 2020
Merged

tracing: add API for OpenCensus agent exporter#1563
istio-testing merged 3 commits intoistio:masterfrom
airbnb:air-add-opencensus-tracing

Conversation

@brianwolfe
Copy link
Copy Markdown
Member

@brianwolfe brianwolfe commented Jul 29, 2020

This is based heavily on the existing Stackdriver configuration.
Stackdriver and OpenCensusAgent exporter will both use the OpenCensus tracer
implementation. This will permit OpenCensus to export OpenCensus agent
formatted spans.

OpenCensus agent-formatted spans are handled by the OpenTelemetry collector,
providing a migration path from OpenCensus to OpenTelemetry when
OpenTelemetry is more complete and integrated with Envoy.

@howardjohn

Change-Id: I566299c9291021d26ec1e839643871a380807bc3

@istio-policy-bot
Copy link
Copy Markdown

😊 Welcome @brianwolfe! This is either your first contribution to the Istio api repo, or it's been
awhile since you've been here.

You can learn more about the Istio working groups, code of conduct, and contributing guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Jul 29, 2020
@istio-testing istio-testing added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-ok-to-test labels Jul 29, 2020
@istio-testing
Copy link
Copy Markdown
Collaborator

Hi @brianwolfe. Thanks for your PR.

I'm waiting for a istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Copy Markdown
Member

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me - my only concern is what, if any, impact the change in Envoy to move from static to dynamic tracing config will have. I suspect we will still maintain the same API and just expose it differently. @douglas-reid @mandarjog

@brianwolfe
Copy link
Copy Markdown
Member Author

Static to dynamic is similar. OpenCensus & Stackdriver using OpenCensus tracers have to have a fixed configuration for the entire proxy process because OpenCensus uses static variables for state and so can only handle a single version of the configuration, see: envoyproxy/envoy#10519 for some context.

Comment thread mesh/v1alpha1/proxy.proto Outdated

// OpenCensus defines configuration for an OpenCensus tracer.
// See [OpenCensus trace config](https://github.com/census-instrumentation/opencensus-proto/blob/master/src/opencensus/proto/trace/v1/trace_config.proto) for details.
message OpenCensus {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we expand this to make it possible to support Stackdriver via this configuration (and deprecate the Stackdriver config)?

@mandarjog @bianpengyuan

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, make sense to me to replace Stackdriver configuration with this. We can add a backend enum in this Opencensus message.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should support all oc tracer backends including including zipkin thru this.
Please add enum.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think we would want to expose the tracing implementation to the user. The OpenCensus & Stackdriver are really the external interface that a user would deal with, they wouldn't need to necessarily know that it is all OpenCensus under the hood.

I can make a try at merging them if you disagree.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg. I'll update.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only names I could find for this format were:

  • OpenCensus Agent protocol
  • OpenCensus Service

tbh, neither of them seemed to add clarity.

One advantage of specifying the top-level OpenCensus format is that we can consolidate the logic for specifying which headers we read/write and allow a user to only specify those headers they actually want in their infrastructure.

To be more specific, I'd like to be able to only specify TRACE_CONTEXT to avoid extra bytes on the wire here: https://github.com/istio/istio/blob/master/tools/packaging/common/envoy_bootstrap.json#L627-L628 If we just make an OpenCensus top-level configuration with an enum for the backend, we can easily do that. The other trace implementations won't have the same header options.

Another advantage is that we can differentiate between Zipkin backend implementations and so migrate people to the OpenCensus implementation with some more flexible options for trace propagation headers.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a short design doc on this change ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make one. How do I get write access to the drive though?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a quick draft PTAL: https://docs.google.com/document/d/1rbVLU0G391eKincaAJQhpBYAdhcvaF2G1sqGkJgKLcQ/edit#heading=h.xw1gqgyqs5b

I'm still waiting on approval to actually have write access to the drive, I had a coworker copy it into the drive.

Copy link
Copy Markdown
Member Author

@brianwolfe brianwolfe Jul 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mandarjog
Copy link
Copy Markdown
Contributor

/ok-to-test

@istio-testing istio-testing added ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed needs-ok-to-test size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 29, 2020
@brianwolfe
Copy link
Copy Markdown
Member Author

@mandarjog @bianpengyuan, thanks for the feedback, I updated to match with https://docs.google.com/document/d/1rbVLU0G391eKincaAJQhpBYAdhcvaF2G1sqGkJgKLcQ/edit#

PTAL

Comment thread mesh/v1alpha1/proxy.proto

// OpenCensusAgent defines configuration for an OpenCensus tracer writing to
// an OpenCensus agent backend. See
// [OpenCensus trace config](https://github.com/census-instrumentation/opencensus-proto/blob/master/src/opencensus/proto/trace/v1/trace_config.proto)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should link to envoy's doc: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/trace/v3/opencensus.proto, which is more aligned with the proto defined here. Or link to both. we should update the comment of Stackdriver tracer to link to that as well.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. Will update after commit is reviewed internally.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment thread mesh/v1alpha1/proxy.proto
// $hide_from_docs
// Unspecified context. Should not be used for now, but added to reserve
// the 0 enum value if TraceContext is used outside of a repeated field.
UNSPECIFIED = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make it NONE which follows envoy trace context: https://github.com/envoyproxy/envoy/blob/598e76a51b781bdd435d5d4f96d482fc283695b7/api/envoy/config/trace/v3/opencensus.proto#L30. I don't think it is necessary to hide this from docs, since it is not just a reserved field, but also a valid configuration for Envoy.

Copy link
Copy Markdown
Member Author

@brianwolfe brianwolfe Aug 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok permitting NONE if you want, but that seems like a detail in the APi that would be confusing without any benefit for users. If they don't want distributed tracing, why would they configure a tracer?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose UNSPECIFIED since that is the closest I could find in the Google style guide for protobufs. https://developers.google.com/protocol-buffers/docs/style#enums and it seemed to be consistent within Istio. I assumed Istio would roughly follow the google style guide for protobuf instead of mix-and-match with envoy & google styles.

Comment thread mesh/v1alpha1/proxy.proto

// Specifies the set of context propagation headers used for distributed
// tracing. Default is `["W3C_TRACE_CONTEXT"]`. If multiple values are specified,
// the proxy will attempt to read each header for each request and will
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to income and outgoing context separated?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it particularly makes sense to do so. I haven't seen a use-case where people really want to specify these separately for Istio. For Istio it adds to the confusion for users since it is incoming & outgoing w.r.t. envoy (really contexts-to-read and contexts-to-write).

I think this should not be separated unless we have a compelling use-case.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe after we have per-listener configs it will make more sense to separate them out.

Comment thread mesh/v1alpha1/proxy.proto
message OpenCensusAgent {
// TraceContext selects the context propagation headers used for
// distributed tracing.
enum TraceContext {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@douglas-reid should we hoist Tracecontext msg out of the OC proto?
We can use it with SD as well, and is general enough.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mandarjog my main concern with hoisting it out is that it would also make sense for Lightstep implementation, but the set of TraceContexts supported are different, so we will end up needing to prefix them.

I'm ok with either option, just seems cleaner to keep the enums inside for each trace exporter.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mandarjog that sort of overlap was behind my original suggestion to deprecate stackdriver (and others) in favor of OpenCensus.

I think we need to decide what the scope of this change is.

I think adding OpenCensusAgent is a tactical change to support a needed use case -- so maybe we can avoid expanding TraceContext for now.

As you know, I feel strongly we should push on a Telemetry API that re-organizes some of this configuration.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main purpose I have for putting this change up is to get our Istio configurations to be more standard (right now we have weird configuration hacks to use the OpenCensus & custom metrics in a bootstrap configuration that make it hard to explain). When we have this change in and we upgrade to 1.7, I will be in a better place to meaningfully contribute to @douglas-reid's proposal since our forward migration will look like everyone else's.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to delay any push on this until the next telemetry API is out, I understand. I think we can technically continue running with our hacked-up bootstrap configuration for the next few versions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brianwolfe i think this PR makes sense as-is, especially since it will remove hacked-up bootstraps in the short term.

Copy link
Copy Markdown
Member Author

@brianwolfe brianwolfe Aug 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll wait for next round of feedback from bianpengyuan & mandarjog, I'd like to be on the same page w.r.t. copying envoy TRACE_CONTEXT semantics.

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Aug 5, 2020
This is based heavily on the existing Stackdriver configuration.
Stackdriver and OpenCensus exporter will both use the OpenCensus tracer
implementation. This will permit OpenCensus to export OpenCensus
agent-formatted spans.

OpenCensus-formatted spans are handled by the OpenTelemetry collector,
providing a migration path from OpenCensus to OpenTelemetry when
OpenTelemetry is more complete and integrated with Envoy.

Change-Id: I566299c9291021d26ec1e839643871a380807bc3
Reviewed-on: https://gerrit.musta.ch/c/public/istio-api/+/551
Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>
Specify the set of trace headers to use for distributed trace context
propagation with a new TraceContext enum in the OpenCensusAgent tracer.

OpenCensus is the name of both the tracer implementation and the backend
service. To disambiguate between the two terms, rename OpenCensus to
OpenCensusAgent, since it is specific to the backend service. Both
Stackdriver & OpenCensus will use the OpenCensus trace implementation
under the hood.

Change-Id: Id860fbf6bf6b1345c7276bb2c2b93de3f69a9d44
Reviewed-on: https://gerrit.musta.ch/c/public/istio-api/+/575
Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>
The existing documentation for Stackdriver and OpenCensus reference the
OpenCensus configuration. We should reference both the Envoy OpenCensus
tracer configuration and the OpenCensus configuration.

Change-Id: Ia3fcbc879e2e15f90a56959cf97fe30b57211782
Reviewed-on: https://gerrit.musta.ch/c/public/istio-api/+/582
Reviewed-by: Ying Zhu <ying.zhu@airbnb.com>
Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>
@airbnb-gerrit airbnb-gerrit force-pushed the air-add-opencensus-tracing branch from e82778f to c80e298 Compare August 5, 2020 17:58
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Aug 5, 2020
@brianwolfe
Copy link
Copy Markdown
Member Author

@googlebot I signed it!

Copy link
Copy Markdown
Contributor

@mandarjog mandarjog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Brian

@mandarjog
Copy link
Copy Markdown
Contributor

@nrjpoddar can you approve the api change?

@brianwolfe
Copy link
Copy Markdown
Member Author

@nrjpoddar ping

@nrjpoddar
Copy link
Copy Markdown
Member

nrjpoddar commented Aug 7, 2020

@brianwolfe currently Istio documentation only talks about propagating X-B3-* , headers. Are you planning to add docs for how users can propagate other tracing headers if non Zipkin style headers are used? I’m not familiar with all the supported tracers in Envoy which looks like a rather long list.

We would need high level docs explaining how OpenCensus tracer can be enabled similar to how we currently explain the setup related to Jaeger, Zipkin and Lightstep.

@brianwolfe
Copy link
Copy Markdown
Member Author

@brianwolfe
Copy link
Copy Markdown
Member Author

@nrjpoddar what order should I update in? API, then main codebase, then site docs?

@nrjpoddar
Copy link
Copy Markdown
Member

@nrjpoddar what order should I update in? API, then main codebase, then site docs?

If this is going in in release-1.7 branch then we need to make sure we get all 3 changes merged soon i.e. before end of next week.

If this is targeted for 1.8, then this is my preferred order:

  1. API
  2. Implementation
  3. Docs

@brianwolfe
Copy link
Copy Markdown
Member Author

I don't think it makes sense to target release-1.7. I'm ok delaying this & related work until 1.8. Rushing it seems likely to mean we'll end up with something more confusing. We can backport the work internally for our use-case or bring it into a point release for 1.7 if people really want the feature.

Comment thread mesh/v1alpha1/proxy.proto

// debug enables trace output to stdout
// $hide_from_docs
bool debug = 3;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these hidden from docs but not the rest?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We mostly use these for checking maximum performance impact & ensuring the tracer gets instantiated correctly. They are fairly detailed configuration options that I'd expect mostly istio developers who care about performance to examine.

This appears to be the same reasoning as the existing parameters in Stackdriver that are also hidden from the docs.

wdyt?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't even have an API for these options and see if ENV vars or other mechanism can be used for Istio devs. Having a hidden option in user facing API which is never going to be used to end-users seems odd to me.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That is fine for me. I can remove them from this tracer implementation. Do you see any reason to keep them for Stackdriver?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't realize this would automerge when I ran the test, sorry about that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a stack driver expert so I would leave it up to @mandarjog or @douglas-reid to answer that question.

@nrjpoddar
Copy link
Copy Markdown
Member

LGTM as it's for 1.8 and we can refine it later. Please answer my question about hidden fields from docs before merging.

@brianwolfe
Copy link
Copy Markdown
Member Author

/test build_api

@istio-testing istio-testing merged commit 0bb7e74 into istio:master Aug 8, 2020
airbnb-gerrit pushed a commit to airbnb/istio-api that referenced this pull request Aug 12, 2020
These configurations were added to be consistent with Stackdriver and
provide extra performance tuning for OpenCensus. These are unlikely to
be used in production. Tuning should probably be performed by Istio
developers instead. We can remove the options for now.

This was triggered by a conversation at the end of the previous CL
merge: istio#1563
Change-Id: Iab148ff87dfe5b1772d3cdf2a009ec9cf4ea0f27
Reviewed-on: https://gerrit.musta.ch/c/public/istio-api/+/596
Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>
Reviewed-by: Ying Zhu <ying.zhu@airbnb.com>
istio-testing pushed a commit that referenced this pull request Aug 19, 2020
These configurations were added to be consistent with Stackdriver and
provide extra performance tuning for OpenCensus. These are unlikely to
be used in production. Tuning should probably be performed by Istio
developers instead. We can remove the options for now.

This was triggered by a conversation at the end of the previous CL
merge: #1563
Change-Id: Iab148ff87dfe5b1772d3cdf2a009ec9cf4ea0f27
Reviewed-on: https://gerrit.musta.ch/c/public/istio-api/+/596
Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>
Reviewed-by: Ying Zhu <ying.zhu@airbnb.com>
airbnb-gerrit pushed a commit to airbnb/istio that referenced this pull request Sep 2, 2020
Add support for OpenCensus tracing. This is similar to the existing
Stackdriver support. This is tested by adding a new tracing bootstrap
golden test comparison and by individually testing the configuration
options, similar to the existing tracers.

This implements the API from this PR:
istio/api#1563

Change-Id: I65d87c6c5e3e52804b06f6b3d0a35f9515e43c7e
Reviewed-on: https://gerrit.musta.ch/c/public/istio/+/559
Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>
airbnb-gerrit pushed a commit to airbnb/istio that referenced this pull request Sep 15, 2020
Add support for OpenCensus tracing. This is similar to the existing
Stackdriver support. This is tested by adding a new tracing bootstrap
golden test comparison and by individually testing the configuration
options, similar to the existing tracers.

This implements the API from this PR:
istio/api#1563

Change-Id: I65d87c6c5e3e52804b06f6b3d0a35f9515e43c7e
Reviewed-on: https://gerrit.musta.ch/c/public/istio/+/559
Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>
istio-testing pushed a commit to istio/istio that referenced this pull request Sep 16, 2020
* tracing: add support for OpenCensus

Add support for OpenCensus tracing. This is similar to the existing
Stackdriver support. This is tested by adding a new tracing bootstrap
golden test comparison and by individually testing the configuration
options, similar to the existing tracers.

This implements the API from this PR:
istio/api#1563

Change-Id: I65d87c6c5e3e52804b06f6b3d0a35f9515e43c7e
Reviewed-on: https://gerrit.musta.ch/c/public/istio/+/559
Reviewed-by: Jungho Ahn <jungho.ahn@airbnb.com>

* release-notes: add note for opencensusagent

* tests/integration: add opencensusagent trace test

This test adds integration support for opencensusagent, sharing most of
the work from the existing zipkin integration tests, this exports traces
to an opentelemetry-collector instance, which forwards the traces to
zipkin. This same setup should be usable when envoy supports
opentelemetry tracing natively.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants