version=0.3.1
At the moment, even though Stackdriver is initialized with its default async transport (opencensus-python/opencensus/stats/exporters/stackdriver_exporter.py, lines 113 to 124 in bebc508):

    class StackdriverStatsExporter(base.StatsExporter):
        """Stats exporter for the Stackdriver Monitoring backend."""

        def __init__(self,
                     options=Options(),
                     client=None,
                     default_labels={},
                     transport=async_.AsyncTransport):
            self._options = options
            self._client = client
            self._transport = transport(self)
            self._default_labels = default_labels

the `view_manager.register_view` call is synchronous and is not resilient to errors. When Stackdriver was momentarily down, our service failed to initialize with the following exception:

    Traceback (most recent call last):
      File "/home/dropcam/labs/python/lib/python2.7/site-packages/opencensus/stats/view_manager.py", line 37, in register_view
        self.measure_to_view_map.register_view(view=view, timestamp=self.time)
      File "/home/dropcam/labs/python/lib/python2.7/site-packages/opencensus/stats/measure_to_view_map.py", line 80, in register_view
        e.on_register_view(view)
      File "/home/dropcam/labs/python/lib/python2.7/site-packages/opencensus/stats/exporters/stackdriver_exporter.py", line 148, in on_register_view
        self.create_metric_descriptor(view)
      File "/home/dropcam/labs/python/lib/python2.7/site-packages/opencensus/stats/exporters/stackdriver_exporter.py", line 335, in create_metric_descriptor
        descriptor = client.create_metric_descriptor(project_name, descriptor)
      File "/home/dropcam/labs/python/lib/python2.7/site-packages/google/cloud/monitoring_v3/gapic/metric_service_client.py", line 622, in create_metric_descriptor
        request, retry=retry, timeout=timeout, metadata=metadata
      File "/home/dropcam/labs/python/lib/python2.7/site-packages/google/api_core/gapic_v1/method.py", line 143, in __call__
        return wrapped_func(*args, **kwargs)
      File "/home/dropcam/labs/python/lib/python2.7/site-packages/google/api_core/retry.py", line 270, in retry_wrapped_func
        on_error=on_error,
      File "/home/dropcam/labs/python/lib/python2.7/site-packages/google/api_core/retry.py", line 179, in retry_target
        return target()
      File "/home/dropcam/labs/python/lib/python2.7/site-packages/google/api_core/timeout.py", line 214, in func_with_timeout
        return func(*args, **kwargs)
      File "/home/dropcam/labs/python/lib/python2.7/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
        six.raise_from(exceptions.from_grpc_error(exc), exc)
      File "/home/dropcam/labs/python/lib/python2.7/site-packages/six.py", line 737, in raise_from
        raise value
    google.api_core.exceptions.ServiceUnavailable: 503 Deadline Exceeded

Similar to the Java version and the Python create time series export (#297), I'd expect the `register_view` call not to block and to suppress all errors, logging each one so there is a record that it happened.
https://github.com/census-instrumentation/opencensus-java/blob/538d77e4eeb18df592b04f794a400e981bb1b649/exporters/stats/stackdriver/src/main/java/io/opencensus/exporter/stats/stackdriver/CreateMetricDescriptorExporter.java#L129-L149
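
To make the expectation concrete, here is a minimal sketch (Python 3) of a register call that neither blocks nor raises. It is not the opencensus-python API: `BackgroundRegistrar`, `create_descriptor`, and `flush` are hypothetical names, and `create_descriptor` stands in for whatever blocking backend call the exporter makes.

```python
import logging
import queue
import threading

logger = logging.getLogger(__name__)


class BackgroundRegistrar(object):
    """Hypothetical helper: runs descriptor creation on a worker thread."""

    def __init__(self, create_descriptor):
        # create_descriptor is an assumed stand-in for the blocking backend call.
        self._create_descriptor = create_descriptor
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run)
        self._worker.daemon = True
        self._worker.start()

    def register_view(self, view):
        # Returns immediately; the backend call happens on the worker thread.
        self._queue.put(view)

    def _run(self):
        while True:
            view = self._queue.get()
            try:
                self._create_descriptor(view)
            except Exception:
                # Log and carry on, so a backend outage cannot break startup.
                logger.exception("Failed to create descriptor for %r", view)
            finally:
                self._queue.task_done()

    def flush(self):
        # Block until every queued view has been processed (tests, shutdown).
        self._queue.join()
```

With this shape, a `ServiceUnavailable` raised by the backend ends up in the log instead of propagating out of service initialization.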
The Java Stackdriver exporter maintains a map of all seen descriptors and attempts to create a descriptor only when it receives a create time series request for a new descriptor, i.e. it doesn't create the descriptor on the register view call. This has several advantages:
- Cheap initialization
- Smoother request rate across the fleet (it's more randomized as it happens on the first request/instance of an event rather than at service boot time)
  We have thousands of servers, and there are situations (AZ failures, a bad kernel update, etc.) where we have to restart a couple hundred at once. There is a risk that we blow our create metric descriptor quota and fail to initialize opencensus appropriately, which makes the first two points important.
- Easier in large code bases. Right now, my understanding is that all exporters need to be initialized before any view is registered. It can become a bit painful to ensure the setup procedure is strictly followed.
- Alignment with opencensus-java, which reduces the context required of teams operating in a multi-service/multi-language world.
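
The lazy approach described above could look roughly like the following sketch. All names here (`LazyDescriptorExporter`, `create_descriptor`, `send_time_series`, `export`) are hypothetical and only illustrate the idea; this is not the Java implementation or the existing Python exporter. Skipping the export when descriptor creation fails is a choice made for this sketch.

```python
import logging
import threading

logger = logging.getLogger(__name__)


class LazyDescriptorExporter(object):
    """Hypothetical exporter that creates descriptors on first export."""

    def __init__(self, create_descriptor, send_time_series):
        # Both callables are assumed stand-ins for the backend client calls.
        self._create_descriptor = create_descriptor
        self._send_time_series = send_time_series
        self._registered = set()  # names of descriptors already created
        self._lock = threading.Lock()

    def on_register_view(self, view_name):
        # Intentionally no network I/O here: registration stays cheap.
        pass

    def export(self, view_name, points):
        # Create the descriptor the first time this view is exported.
        if not self._ensure_descriptor(view_name):
            return False  # skipped; the next export retries the creation
        self._send_time_series(view_name, points)
        return True

    def _ensure_descriptor(self, view_name):
        with self._lock:
            if view_name in self._registered:
                return True
        try:
            self._create_descriptor(view_name)
        except Exception:
            # Suppress the error and leave a record in the log.
            logger.exception("Could not create descriptor for %s", view_name)
            return False
        with self._lock:
            self._registered.add(view_name)
        return True
```

Because creation is retried on the next export, a short backend outage delays the first data point for a view rather than failing service startup, and descriptor creation requests are spread over time instead of arriving all at boot.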