feat(sink): add opentelemetry metrics support #22550
Status: Draft

brittonhayes wants to merge 16 commits into vectordotdev:master from brittonhayes:feat/opentelemetry-metrics-sink
Commits (16):
- d96bb86 feat(new sink): added opentelemetry metrics sink (brittonhayes)
- 1654d8b fix: cargo fmt otel metrics (brittonhayes)
- 8707f8c fix: impl can be derived for otel metrics encoding conf (brittonhayes)
- 92c65c6 chore(docs): added changelog entry for otel metrics sink (brittonhayes)
- d188f32 chore(sink): separated otel metrics into config encoder and service f… (brittonhayes)
- 4d9b9ec fix: remove path from config mock tests (brittonhayes)
- c7ced37 chore(sink): remove dedicated tests.rs file (brittonhayes)
- 2f3ecae feat(opentelemetry): using opentelemetry sdk and consolidate to one sink (brittonhayes)
- e643832 fix(fmt): cargo fmt opentelemetry changes (brittonhayes)
- d1ef748 fix(docs): remove incorrect doc comment (brittonhayes)
- 07b2c0e fix(input): specify metrics input (brittonhayes)
- f59fd64 Update changelog.d/opentelemetry_metrics_sink.feature.md (brittonhayes)
- 8c38e92 feat(opentelemetry): small code quality changes (brittonhayes)
- e8de8da Merge remote-tracking branch 'origin/master' into feat/opentelemetry-… (pront)
- 56ebf01 nit (pront)
- 3d000b0 update 3rd party licenses (pront)
Files changed

changelog.d/opentelemetry_metrics_sink.feature.md (new file):
```markdown
Added support for sendings metrics via the OpenTelemetry sink to OpenTelemetry collectors

authors: brittonhayes
```
New Rust source file (201 added lines; file path not shown in the capture). Note the `error!` macro used in the `Drop` impl has been added to the `tracing` import:

```rust
use crate::event::metric::{Metric as VectorMetric, MetricValue};
use std::task::{Context, Poll};
use vector_config::configurable_component;

use futures::future::{self, BoxFuture};
use http::StatusCode;
use hyper::Body;
use tower::Service;
use tracing::{debug, error, trace};
use vector_lib::event::EventStatus;

use opentelemetry::metrics::{Meter, MeterProvider};
use opentelemetry::KeyValue;
use opentelemetry_otlp::{MetricExporter, WithExportConfig};
use opentelemetry_sdk::metrics::{SdkMeterProvider, Temporality};

use crate::event::Event;
use crate::sinks::util::PartitionInnerBuffer;
use futures_util::stream::BoxStream;
use vector_lib::sink::StreamSink;

/// The aggregation temporality to use for metrics.
#[configurable_component]
#[derive(Clone, Copy, Debug)]
#[serde(rename_all = "snake_case")]
pub enum AggregationTemporality {
    /// Delta temporality means that metrics are reported as changes since the last report.
    Delta,
    /// Cumulative temporality means that metrics are reported as cumulative changes since a fixed start time.
    Cumulative,
}

impl Default for AggregationTemporality {
    fn default() -> Self {
        Self::Cumulative
    }
}

// Add conversion from AggregationTemporality to the OpenTelemetry SDK's Temporality
impl From<AggregationTemporality> for Temporality {
    fn from(temporality: AggregationTemporality) -> Self {
        match temporality {
            AggregationTemporality::Delta => Temporality::Delta,
            AggregationTemporality::Cumulative => Temporality::Cumulative,
        }
    }
}

#[derive(Default)]
pub struct OpentelemetryMetricNormalize;

// Implementation using the OpenTelemetry SDK
pub struct OpentelemetryMetricsSvc {
    meter_provider: SdkMeterProvider,
    meter: Meter,
    namespace: String,
}

impl OpentelemetryMetricsSvc {
    pub fn new(
        namespace: String,
        endpoint: String,
        temporality: AggregationTemporality,
    ) -> crate::Result<Self> {
        // Create the exporter
        let exporter = MetricExporter::builder()
            .with_http()
            .with_endpoint(endpoint)
            .with_temporality(Temporality::from(temporality))
            .build()
            .map_err(|e| crate::Error::from(format!("Failed to build metrics exporter: {}", e)))?;

        // Create the meter provider with the exporter
        let provider = SdkMeterProvider::builder()
            .with_periodic_exporter(exporter)
            .build();

        let meter = provider.meter("vector");

        Ok(Self {
            meter_provider: provider,
            meter,
            namespace,
        })
    }

    // Convert and record Vector metrics using the OpenTelemetry SDK
    fn convert_and_record_metrics(&self, events: Vec<VectorMetric>) {
        for event in events {
            let metric_name = event.name().to_string();
            let attributes = event
                .tags()
                .map(|tags| {
                    tags.iter_single()
                        .map(|(k, v)| KeyValue::new(k.to_string(), v.to_string()))
                        .collect::<Vec<_>>()
                })
                .unwrap_or_default();

            // Add the service.name attribute with the namespace
            let mut all_attributes = vec![KeyValue::new("service.name", self.namespace.clone())];
            all_attributes.extend(attributes);

            match event.value() {
                MetricValue::Counter { value } => {
                    let counter = self.meter.f64_counter(metric_name).build();
                    counter.add(*value, &all_attributes);
                }
                MetricValue::Gauge { value } => {
                    // For gauges, we use a counter since observable gauges require callbacks
                    let counter = self
                        .meter
                        .f64_counter(format!("{}_gauge", metric_name))
                        .build();
                    counter.add(*value, &all_attributes);
                }
                MetricValue::Distribution { samples, .. } => {
                    let histogram = self.meter.f64_histogram(metric_name).build();
                    for sample in samples {
                        // Record each sample with its rate
                        for _ in 0..sample.rate {
                            histogram.record(sample.value, &all_attributes);
                        }
                    }
                }
                MetricValue::Set { values } => {
                    // For sets, we record the count of unique values
                    let counter = self
                        .meter
                        .f64_counter(format!("{}_set", metric_name))
                        .build();
                    counter.add(values.len() as f64, &all_attributes);
                }
                _ => {}
            }
        }
    }
}

impl Service<PartitionInnerBuffer<Vec<VectorMetric>, String>> for OpentelemetryMetricsSvc {
    type Response = http::Response<Body>;
    type Error = crate::Error;
    type Future = BoxFuture<'static, Result<Self::Response, Self::Error>>;

    fn poll_ready(&mut self, _cx: &mut Context) -> Poll<Result<(), Self::Error>> {
        Poll::Ready(Ok(()))
    }

    fn call(&mut self, items: PartitionInnerBuffer<Vec<VectorMetric>, String>) -> Self::Future {
        let (metrics, _namespace) = items.into_parts();

        // Convert and record metrics
        self.convert_and_record_metrics(metrics);

        // The SDK handles the export asynchronously, so we just return a success response
        Box::pin(future::ok(
            http::Response::builder()
                .status(StatusCode::OK)
                .body(Body::empty())
                .unwrap(),
        ))
    }
}

impl Drop for OpentelemetryMetricsSvc {
    fn drop(&mut self) {
        // Ensure metrics are exported before shutting down
        if let Err(err) = self.meter_provider.shutdown() {
            error!("Error shutting down meter provider: {:?}", err);
        }
    }
}

#[async_trait::async_trait]
impl StreamSink<Event> for OpentelemetryMetricsSvc {
    async fn run(mut self: Box<Self>, mut input: BoxStream<'_, Event>) -> Result<(), ()> {
        use futures::StreamExt;

        debug!("OpenTelemetry metrics sink started");

        while let Some(mut event) = input.next().await {
            // Extract finalizers before processing
            let finalizers = event.metadata_mut().take_finalizers();

            // Extract metrics from the event
            if let Event::Metric(metric) = event {
                trace!("Processing metric event: {}", metric.name());
                // Process the metric
                self.convert_and_record_metrics(vec![metric]);
            } else {
                trace!("Ignoring non-metric event");
            }

            // Finalize the event with success status
            finalizers.update_status(EventStatus::Delivered);
        }

        debug!("OpenTelemetry metrics sink stopped");
        Ok(())
    }
}
```
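The `AggregationTemporality` enum in the diff mirrors the two OTLP reporting modes. As a std-only sketch (not the SDK itself): cumulative points carry running totals since a fixed start time, while delta points carry only the change since the previous report.

```rust
// Std-only illustration of cumulative vs. delta temporality:
// converting a cumulative series into the equivalent delta series.
fn to_deltas(cumulative: &[f64]) -> Vec<f64> {
    let mut prev = 0.0;
    cumulative
        .iter()
        .map(|&v| {
            let d = v - prev;
            prev = v;
            d
        })
        .collect()
}

fn main() {
    // Cumulative: each point is the running total since a fixed start time.
    let cumulative = [4.0, 7.0, 7.0, 12.0];
    // Delta: each point is the change since the previous report.
    let deltas = to_deltas(&cumulative);
    assert_eq!(deltas, vec![4.0, 3.0, 0.0, 5.0]);
    println!("{:?}", deltas);
}
```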
Review discussion:
Is this desirable behaviour for Vector? It seems like you lose end-to-end acks with this. Would it be better to handle the sending in Vector itself and only use the SDK for constructing and serializing?
Really good point. I'd like to keep acks and stay consistent with other sinks in terms of shipping flow. I'll refine this to use the SDK just for serializing.
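A hypothetical shape for that split (`MetricPoint` and `encode_request` are illustrative stand-ins, not Vector or opentelemetry-proto APIs): encoding becomes a pure function from a batch of metrics to request bytes, and Vector's existing HTTP service machinery keeps ownership of buffering, retries, and end-to-end acks.

```rust
// Illustrative sketch only: `MetricPoint` and `encode_request` stand in for
// Vector's metric type and an opentelemetry-proto/prost encoding step.
// The point is the split: encoding is pure, transport stays in Vector.
struct MetricPoint {
    name: String,
    value: f64,
}

// Pure function: batch of metrics -> serialized request body. A real
// implementation would build an OTLP ExportMetricsServiceRequest here.
fn encode_request(batch: &[MetricPoint]) -> Vec<u8> {
    let mut body = Vec::new();
    for m in batch {
        body.extend_from_slice(format!("{}={}\n", m.name, m.value).as_bytes());
    }
    body
}

fn main() {
    let batch = vec![
        MetricPoint { name: "requests_total".into(), value: 3.0 },
        MetricPoint { name: "errors_total".into(), value: 1.0 },
    ];
    let body = encode_request(&batch);
    // Vector's stock HTTP sink machinery would own sending `body`, so
    // acknowledgements fire only after a successful response.
    assert_eq!(body, b"requests_total=3\nerrors_total=1\n".to_vec());
    println!("{} bytes encoded", body.len());
}
```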
I gave this an initial try and it might take some additional work to sort out. I believe this would require us to create an OTLP metric exporter that integrates with Vector's built-in buffer/acknowledgement/retry functionality. Right now we're making an exporter using their builder method, but we could likely make a dedicated, Vector-specific exporter without the builder. Here's what we'd have to support:
An additional challenge: this is not exported outside the otlp crate, so it's not quite that easy to make our own exporter metrics client.
Does using the SDK provide a lot of value? I'm not sure its use case aligns with Vector that well. Trying to integrate it might be more hassle than it's worth.
Honestly, the more I use it, the more it feels like it's adding more hassle than help. The main benefit of the SDK is its built-in aggregation of metrics, which is helpful; other than that, it seems to add a lot of abstraction.
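The aggregation the SDK provides can be sketched with plain std collections (a std-only illustration, not the SDK's actual pipeline): repeated counter increments for the same series are summed before export rather than shipped one by one.

```rust
use std::collections::HashMap;

// Std-only illustration of what the SDK's aggregation buys you: increments
// for the same series identity are summed before export instead of being
// shipped individually. (Real SDK series identity also includes attributes.)
fn aggregate(increments: &[(&str, f64)]) -> HashMap<String, f64> {
    let mut totals: HashMap<String, f64> = HashMap::new();
    for (name, value) in increments {
        *totals.entry((*name).to_string()).or_insert(0.0) += *value;
    }
    totals
}

fn main() {
    let totals = aggregate(&[("requests", 1.0), ("requests", 1.0), ("errors", 1.0)]);
    assert_eq!(totals["requests"], 2.0);
    assert_eq!(totals["errors"], 1.0);
}
```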
While working on the logging support I came to the same conclusion. Let me see if I can get my PR open as a draft today; maybe we can converge on the same direction for it. I used the GCP Stackdriver sink as a starting point for a sink that allows custom encoding. Not sure if it's the right direction, and adapting the HTTP sink could also work, but maybe we can avoid some double work.
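The custom-encoding idea mentioned here can be sketched as a sink generic over an encoder trait (all names here are hypothetical, not Vector's actual `Encoder` abstractions): swapping the encoder changes the wire format while buffering, retry, and ack logic stay untouched.

```rust
// Hypothetical sketch of a transport-agnostic sink with pluggable encoding.
// Swapping the encoder (e.g. OTLP protobuf vs. JSON) would leave the
// buffering/retry/ack plumbing unchanged, which is the direction discussed.
trait Encode {
    fn encode(&self, name: &str, value: f64) -> Vec<u8>;
}

struct JsonEncoder;

impl Encode for JsonEncoder {
    fn encode(&self, name: &str, value: f64) -> Vec<u8> {
        format!("{{\"name\":\"{}\",\"value\":{}}}", name, value).into_bytes()
    }
}

struct Sink<E: Encode> {
    encoder: E,
}

impl<E: Encode> Sink<E> {
    // Transport (and acks) would live here in a real sink; this sketch
    // just returns the encoded payload.
    fn process(&self, name: &str, value: f64) -> Vec<u8> {
        self.encoder.encode(name, value)
    }
}

fn main() {
    let sink = Sink { encoder: JsonEncoder };
    let payload = sink.process("requests_total", 5.0);
    assert_eq!(payload, b"{\"name\":\"requests_total\",\"value\":5}".to_vec());
    println!("{}", String::from_utf8(payload).unwrap());
}
```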