This repository was archived by the owner on Sep 17, 2025. It is now read-only.

Update register views to suppress errors and not block. #600

@prateekr

Description


version=0.3.1

At the moment, even though Stackdriver is initialized with its default async transport:

class StackdriverStatsExporter(base.StatsExporter):
    """Stats exporter for the Stackdriver Monitoring backend."""
    def __init__(self,
                 options=Options(),
                 client=None,
                 default_labels={},
                 transport=async_.AsyncTransport):
        self._options = options
        self._client = client
        self._transport = transport(self)
        self._default_labels = default_labels

view_manager.register_view is still synchronous and is not resilient to errors. When Stackdriver was momentarily down, our service failed to initialize with the following exception:

Traceback (most recent call last):
  File "/home/dropcam/labs/python/lib/python2.7/site-packages/opencensus/stats/view_manager.py", line 37, in register_view
    self.measure_to_view_map.register_view(view=view, timestamp=self.time)
  File "/home/dropcam/labs/python/lib/python2.7/site-packages/opencensus/stats/measure_to_view_map.py", line 80, in register_view
    e.on_register_view(view)
  File "/home/dropcam/labs/python/lib/python2.7/site-packages/opencensus/stats/exporters/stackdriver_exporter.py", line 148, in on_register_view
    self.create_metric_descriptor(view)
  File "/home/dropcam/labs/python/lib/python2.7/site-packages/opencensus/stats/exporters/stackdriver_exporter.py", line 335, in create_metric_descriptor
    descriptor = client.create_metric_descriptor(project_name, descriptor)
  File "/home/dropcam/labs/python/lib/python2.7/site-packages/google/cloud/monitoring_v3/gapic/metric_service_client.py", line 622, in create_metric_descriptor
    request, retry=retry, timeout=timeout, metadata=metadata
  File "/home/dropcam/labs/python/lib/python2.7/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
    return wrapped_func(*args, **kwargs)
  File "/home/dropcam/labs/python/lib/python2.7/site-packages/google/api_core/retry.py", line 270, in retry_wrapped_func
    on_error=on_error,
  File "/home/dropcam/labs/python/lib/python2.7/site-packages/google/api_core/retry.py", line 179, in retry_target
    return target()
  File "/home/dropcam/labs/python/lib/python2.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
    return func(*args, **kwargs)
  File "/home/dropcam/labs/python/lib/python2.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "/home/dropcam/labs/python/lib/python2.7/site-packages/six.py", line 737, in raise_from
    raise value
google.api_core.exceptions.ServiceUnavailable: 503 Deadline Exceeded

Similar to the Java version and the Python create time series export (#297), I'd expect the register_view call not to block and to suppress all errors, logging each one to record that it happened.
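
To make the expected behaviour concrete, here is a minimal sketch of a non-blocking, error-suppressing registration call. `register_view_safely` is a hypothetical helper, not part of opencensus, and `register_fn` stands in for something like `view_manager.register_view`. It only illustrates the shape of the behaviour requested here, not how the library should implement it.

```python
import logging
import threading

logger = logging.getLogger(__name__)


def register_view_safely(register_fn, view):
    """Run register_fn(view) on a daemon thread and log any failure.

    Hypothetical helper: register_fn is an assumption standing in for a
    call such as view_manager.register_view.
    """
    def _run():
        try:
            register_fn(view)
        except Exception:
            # Deliberately broad: a failed registration should be recorded,
            # never propagated into service start-up.
            logger.exception("Failed to register view %r; continuing", view)

    thread = threading.Thread(target=_run, name="register-view")
    thread.daemon = True  # a stuck RPC must not keep the process alive
    thread.start()
    return thread  # returned so callers (and tests) may join if they want
```

With a wrapper like this, a 503 from the backend would show up as a logged traceback instead of aborting initialization. The trade-off is that the view may not be registered yet when the first measurement is recorded.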

https://github.com/census-instrumentation/opencensus-java/blob/538d77e4eeb18df592b04f794a400e981bb1b649/exporters/stats/stackdriver/src/main/java/io/opencensus/exporter/stats/stackdriver/CreateMetricDescriptorExporter.java#L129-L149
The Java Stackdriver exporter maintains a map of all seen descriptors and attempts to create a descriptor only when it receives a create time series request for a new descriptor, i.e. it doesn't create the descriptor on the register view call. This has several advantages:

  1. Cheap initialization
  2. Smoother request rate across the fleet (it's more randomized as it happens on the first request/instance of an event rather than at service boot time)

We have thousands of servers, and there are situations (AZ failures, a bad kernel update, etc.) where we have to restart a couple hundred at once. There is a risk that we blow our create metric descriptor quota and fail to initialize opencensus properly, which makes points 1 and 2 important.

  3. Easier to use in large code bases. Right now, my understanding is that all exporters need to be initialized before any view is registered. It can become a bit painful to ensure the setup procedure is strictly followed.
  4. Alignment with opencensus-java, reducing the context required of teams operating in a multi-service/multi-language world.
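
The lazy approach described above could look roughly like the following in Python. This is a sketch loosely modeled on the description of the Java exporter, not a port of it. `LazyDescriptorRegistry`, `ensure_registered`, and the `create_descriptor` callable are all hypothetical names introduced for illustration.

```python
import logging
import threading

logger = logging.getLogger(__name__)


class LazyDescriptorRegistry(object):
    """Creates a metric descriptor on first export instead of at registration.

    Hypothetical sketch: create_descriptor is an assumed callable that
    performs the remote "create metric descriptor" call for a view.
    """

    def __init__(self, create_descriptor):
        self._create_descriptor = create_descriptor
        self._registered = set()  # names of descriptors already created
        self._lock = threading.Lock()

    def ensure_registered(self, view_name, view):
        """Return True if the descriptor exists or was just created."""
        # The lock is held across the remote call, so creation is
        # serialized; that is acceptable on a background export thread.
        with self._lock:
            if view_name in self._registered:
                return True
            try:
                self._create_descriptor(view)
            except Exception:
                logger.exception(
                    "Could not create descriptor for %s", view_name)
                return False  # not cached, so the next export retries
            self._registered.add(view_name)
            return True
```

An exporter following this pattern would call `ensure_registered` just before sending time series for a view and skip the export when it returns False. Registering a view then costs nothing at boot, and descriptor creation is spread out over the first exports across the fleet.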
