APMAPI-1306 feat(otel): add support for otel metrics api via protobuf and json#6783
APMAPI-1306 feat(otel): add support for otel metrics api via protobuf and json#6783
Conversation
- Add metrics.proto and metrics_service.proto (OTLP v1 spec) - Update protobuf_loader to support metrics protos - Rename protos/ -> otlp/ directory for better organization
- Create OtlpHttpExporterBase for shared HTTP export logic - Create OtlpTransformerBase for shared transformation logic - Refactor logs exporter/transformer to extend base classes - Update test mocking paths - Eliminates ~400 lines of duplication
79e1be4 to
ba3f4b1
Compare
BridgeAR
left a comment
There was a problem hiding this comment.
LGTM with a few nits and potential follow-ups :)
| const histogram = decoded.resourceMetrics[0].scopeMetrics[0].metrics[0] | ||
| assert.strictEqual(histogram.name, 'duration') | ||
| assert.strictEqual(histogram.histogram.dataPoints[0].count, 1) | ||
| assert.strictEqual(histogram.histogram.dataPoints[0].sum, 100) |
There was a problem hiding this comment.
Nit: I think checking for all properties of the histogram would be good.
Nit: Ideally, we use our assertObjectContains helper instead of having many individual assertions.
| setTimeout(() => { | ||
| assert(validated, 'Should have validated an export with all metrics') | ||
| validator() | ||
| done() | ||
| }, 150) |
There was a problem hiding this comment.
Instead of using a timeout, it would be possible to just call done() from inside of the validator method. That way it is simpler and faster.
The same applies for all other methods with that pattern.
It is also more reliable, due to not being dependent on the timing anymore.
Or even better: the mockOtlpExport method returns a promise. That way you can trigger the methods sync and just return that promise afterwards. That would be very clean.
| assert(counter, 'Should have counter') | ||
| assert(histogram, 'Should have histogram') | ||
| assert.strictEqual(counter.name, 'test') | ||
| assert.strictEqual(histogram.name, 'test') | ||
| assert.strictEqual(counter.sum.dataPoints.length, 1, 'Counter should have 1 data point') | ||
| assert.strictEqual(histogram.histogram.dataPoints.length, 1, 'Histogram should have 1 data point') | ||
| assert.strictEqual(counter.sum.dataPoints[0].asInt, 5) | ||
| assert.strictEqual(histogram.histogram.dataPoints[0].sum, 100) |
There was a problem hiding this comment.
I know we have lots of code like that, while I would like to start prevent calling so many asserts on a single object. We should just use the partial helper (assertObjectContains, if I am not mistaken)
| let bucketIndex = DEFAULT_HISTOGRAM_BUCKETS.length | ||
| for (let i = 0; i < DEFAULT_HISTOGRAM_BUCKETS.length; i++) { | ||
| if (value <= DEFAULT_HISTOGRAM_BUCKETS[i]) { | ||
| bucketIndex = i | ||
| break | ||
| } | ||
| } |
There was a problem hiding this comment.
Ideally, this would use a binary search.
| if (!cumulativeState.has(stateKey)) { | ||
| cumulativeState.set(stateKey, { | ||
| count: 0, | ||
| sum: 0, | ||
| min: Infinity, | ||
| max: -Infinity, | ||
| bucketCounts: new Array(DEFAULT_HISTOGRAM_BUCKETS.length + 1).fill(0), | ||
| startTime: metric.temporality === TEMPORALITY.CUMULATIVE ? this.#startTime : timestamp | ||
| }) | ||
| } | ||
|
|
||
| const state = cumulativeState.get(stateKey) |
There was a problem hiding this comment.
Nit
Ideally, it would just get the state if it was defined. If it was not defined, we could combine the handling to just set the first value.
That would be a tad faster.
| const state = cumulativeState.get(stateKey) | ||
| state.value += value | ||
|
|
||
| const dataPoint = this.#findOrCreateDataPoint(metric, attributes, attrKey, () => ({ |
There was a problem hiding this comment.
I believe the idea around the method is an performance optimization to prevent the object from being created?
Instead of an object, we are now creating a method that is referencing these parts. I did not benchmark something like that before, I would expect the function creation still to take more time though. The pattern is only useful in case there would be many compute heavy operations.
| sum: 0, | ||
| min: Infinity, | ||
| max: -Infinity, | ||
| bucketCounts: new Array(DEFAULT_HISTOGRAM_BUCKETS.length + 1).fill(0), |
There was a problem hiding this comment.
This is actually a compute heavy operations (referencing a comment above about findOrCreateDataPoint and the method vs object creation).
Making this lazy is a good idea. To prevent the overhead, I would still create a simple object and just use optional assignment to assign the bucketCounts after the call. That way dataPoint will be updated due to the reference and it is cheap.
…6783) * add otlp metrics protos * feat: add OTLP metrics proto definitions and reorganize directory - Add metrics.proto and metrics_service.proto (OTLP v1 spec) - Update protobuf_loader to support metrics protos - Rename protos/ -> otlp/ directory for better organization * refactor: extract common OTLP logic into base classes - Create OtlpHttpExporterBase for shared HTTP export logic - Create OtlpTransformerBase for shared transformation logic - Refactor logs exporter/transformer to extend base classes - Update test mocking paths - Eliminates ~400 lines of duplication * fix logs * increase test coverage * feat(metrics): add support for otel metrics provider * feat(metrics): add support for otlp configurations * updates to pass system tests * add temperolality support and clean up implementation * add support for encoding attributes * simplify tests * use real values in tests * do not encode numbers as strings * use enum in transformation * remove unneed fields * add better temporality support and include encoding for async metric types * improve test coverage for scope attributes * simplify stubs in tests * validate scope attributes * ruben comments * fix broken number test * clean up tests * clean up instruments and update tests * simplify meter * use constants and implement missing callbacks * move otlp_transformer changes over * clean up private/public fields * avoid redefining constant * clean up tests * update typing * linting clean ups * first round of changes from cr * limit number of metrics in each batch * round 3 changes * address more comments part 4 * update doc strings and typing * update max batch size to operate on the aggregate metrics * avoid converting aggregated metrics to arrays, perf * fix regression in logs implementation * revert closure * cleanup max measurement queue, jdocs and typing * log warning if value is invalid * final set of changes * update tests to be compatible with master branch * remove register from meter provider and simplify export * clean up stable stringify and update limit of max queue size * remove @Private from metrics docs * only implement the meterprovider api, renove shutdown and forceflush operations * remove observableInstruments * lint * fix linting failures * fix linting failures * fix linting 3
…6783) * add otlp metrics protos * feat: add OTLP metrics proto definitions and reorganize directory - Add metrics.proto and metrics_service.proto (OTLP v1 spec) - Update protobuf_loader to support metrics protos - Rename protos/ -> otlp/ directory for better organization * refactor: extract common OTLP logic into base classes - Create OtlpHttpExporterBase for shared HTTP export logic - Create OtlpTransformerBase for shared transformation logic - Refactor logs exporter/transformer to extend base classes - Update test mocking paths - Eliminates ~400 lines of duplication * fix logs * increase test coverage * feat(metrics): add support for otel metrics provider * feat(metrics): add support for otlp configurations * updates to pass system tests * add temperolality support and clean up implementation * add support for encoding attributes * simplify tests * use real values in tests * do not encode numbers as strings * use enum in transformation * remove unneed fields * add better temporality support and include encoding for async metric types * improve test coverage for scope attributes * simplify stubs in tests * validate scope attributes * ruben comments * fix broken number test * clean up tests * clean up instruments and update tests * simplify meter * use constants and implement missing callbacks * move otlp_transformer changes over * clean up private/public fields * avoid redefining constant * clean up tests * update typing * linting clean ups * first round of changes from cr * limit number of metrics in each batch * round 3 changes * address more comments part 4 * update doc strings and typing * update max batch size to operate on the aggregate metrics * avoid converting aggregated metrics to arrays, perf * fix regression in logs implementation * revert closure * cleanup max measurement queue, jdocs and typing * log warning if value is invalid * final set of changes * update tests to be compatible with master branch * remove register from meter provider and simplify export * clean up stable stringify and update limit of max queue size * remove @Private from metrics docs * only implement the meterprovider api, renove shutdown and forceflush operations * remove observableInstruments * lint * fix linting failures * fix linting failures * fix linting 3
What does this PR do?
Adds full OpenTelemetry Metrics support to dd-trace-js with a custom Meter Provider implementation. Enable with
DD_METRICS_OTEL_ENABLED=trueto export metrics via OTLP protocol.Key Features:
http/protobuf(default) orhttp/jsonprotocolsOTEL_EXPORTER_OTLP_METRICS_*environment variablesConfiguration:
DD_METRICS_OTEL_ENABLED- Enable OpenTelemetry metrics (default:false)OTEL_EXPORTER_OTLP_METRICS_ENDPOINT- Endpoint URL (default:http://localhost:4318/v1/metrics)OTEL_EXPORTER_OTLP_METRICS_PROTOCOL- Protocol:http/protobuforhttp/jsonOTEL_METRIC_EXPORT_INTERVAL- Export interval in ms (default:60000)Motivation
Enables customers to use OpenTelemetry Metrics API with dd-trace-js without adding the OpenTelemetry SDK as a dependency. Custom implementation provides better integration with dd-trace-js configurations, avoids vendoring grpc libraries and maintains flexibility.
Additional Notes