Skip to content
This repository was archived by the owner on Sep 17, 2025. It is now read-only.

Don't trace if it's a http request made by the exporter#289

Merged
c24t merged 25 commits intocensus-instrumentation:masterfrom
geobeau:fix-recursion
Nov 7, 2018
Merged

Don't trace if it's a http request made by the exporter#289
c24t merged 25 commits intocensus-instrumentation:masterfrom
geobeau:fix-recursion

Conversation

@geobeau
Copy link
Copy Markdown
Contributor

@geobeau geobeau commented Aug 29, 2018

So the issue is that when we trace httplib we also trace calls made by the exporter which make an infinite loop. The issue was reported in #135. This fix should prevent tracing when we see the same host for the request.

_tracer = execution_context.get_opencensus_tracer()
try: # pragma: NO COVER
# Don't trace if it's a http request made by the exporter
if self._dns_host == _tracer.exporter.host_name:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we can make the mechanism more generic and allow httplib to accept a list of paths + hosts to not be traced. I think this is reasonable request.

@lmolkova @SergeyKanzhelev what do you think?

Copy link
Copy Markdown
Contributor Author

@geobeau geobeau Aug 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. To note: the request integration has the same issues, so it would need something similar (I can implement it as well in the same PR as soon as we agree on this part)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any leads on how to implement that? I don't think integrations are currently using options/parameters, so I'm not sure what would be the best way to do it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a blacklist url check method here: https://github.com/census-instrumentation/opencensus-python/blob/master/opencensus/trace/ext/utils.py#L40

And in Flask and Django integration, we allow users to provide a list of blacklisted URLs, and we will disable tracing for the blacklisted ones. Probably in here we could reuse that method.

In the exporters we actually didn't define exporter.host_name in the base class, could you add the param if we need it for every exporter?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

host_name is 'localhost' for several exporters by default: zipkin, jaeger, proto. Tracing calls to localhost could be legit and useful and we should prevent tracking based on something more unique like 'host:port'.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder how this list should be shared between different instrumentations: requests, httplib, grpc and who is responsible for configuring it?

My assumption: each exporter should register endpoints in this list. Otherwise users are responsible for finding out all endpoints for exporters they use and registering them in all relevant instrumentations.

@geobeau
Copy link
Copy Markdown
Contributor Author

geobeau commented Sep 21, 2018

Open for another review. I changed it to be more configurable and expanded it to requests.

return False


def disable_tracing_hostname(url, blacklist_hostnames=None):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be combined with the disable_tracing_url function in this file?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. They both disable tracing but one is working on paths (blacklist path starting with an element from the list) and the other one with hostname (blacklist hostname/port exactly matching) with different default value. I think it's nice to have them separated for readability.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liyanhui1228 Do you stand on your position on this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to separate them :)

@geobeau
Copy link
Copy Markdown
Contributor Author

geobeau commented Oct 18, 2018

Anything blocking merge ? @liyanhui1228 @reyang

@c24t c24t self-assigned this Oct 30, 2018
@c24t
Copy link
Copy Markdown
Member

c24t commented Oct 30, 2018

Hey @geobeau, I'm working with @liyanhui1228. Unless @bogdandrutu or @lmolkova have any more comments on the diff it LGTM (pending picking up recent changes on master), and I can merge it.

@geobeau
Copy link
Copy Markdown
Contributor Author

geobeau commented Oct 31, 2018

Cool, thanks @c24t !

@c24t
Copy link
Copy Markdown
Member

c24t commented Nov 2, 2018

@geobeau there was a failing test after picking up master. Let me know if the fix in 0a30edd looks kosher to you and I'll merge it if so.

@geobeau
Copy link
Copy Markdown
Contributor Author

geobeau commented Nov 2, 2018

Looks good to me

@c24t
Copy link
Copy Markdown
Member

c24t commented Nov 2, 2018

See #382 to fix CI timeouts.

@c24t
Copy link
Copy Markdown
Member

c24t commented Nov 3, 2018

It looks like async transport worker threads created in (at least) test_stackdriver_stats.py are each blocking for a few seconds as the test process exits, causing it to hang for long enough for CI to fail.

We're likely seeing this now because I increased the export interval in #354 -- lowering the export interval or grace period fixes the long wait time for the threads to exit. I don't know why we're only timing out on this PR, and not the others since #354.

Ideally we should be cleaning these threads up as the tests that create them exit, and I think the transport should flush immediately on exit instead of waiting (up to the duration of the grace period) for the next scheduled export. But this is a problem for another PR.

@geobeau
Copy link
Copy Markdown
Contributor Author

geobeau commented Nov 7, 2018

Thanks for the fix!

@c24t c24t merged commit 7dd2a44 into census-instrumentation:master Nov 7, 2018
@c24t c24t mentioned this pull request May 31, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants