Description
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Use Cases
Right now our metrics pipeline consists of the various DogStatsD client libraries, the Datadog Agent, and then goes straight to Datadog's backend.
We would like to introduce Vector as an aggregation layer because metric costs are growing for services with a large number of host tags. Vector would let us aggregate those tags away (replacing them with some kind of Vector instance ID tag) without risking data loss in Datadog.
The problem is that when using DogStatsD, all metrics submitted in-app as a count are converted to a rate inside the agent's process when it flushes the data. Vector interprets these rate metrics as count metrics with the `interval_ms` field set to the agent's flush interval, and when they reach the Datadog sink it submits them as rates.
All of that works fine until you try to aggregate the rate metrics. Vector appears to handle aggregation of multiple points by summing all the count values and setting `interval_ms` to the total interval of all points included in the window. This makes the `interval_ms` value inconsistent within a given timeseries. Datadog cannot handle that inconsistency: it assumes all datapoints for a given metric name always have the same interval (though you can change what that interval is in the backend).
The inconsistency arises because hosts' submissions are offset from each other, and because the Datadog Agent has an unconfigurable 10s flush interval and a 15s reporting interval, each host alternates between sending 1 and 2 datapoints per timeseries per window. We also need to ensure no two points in a timeseries are sent with the same timestamp, or Datadog will drop them.
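To illustrate the arithmetic, here is a minimal sketch (not Vector's actual code; the function and field names are illustrative) of how summing `interval_ms` across a window produces inconsistent intervals when a host alternates between landing 1 and 2 points per window:

```python
# Sketch of the aggregation behavior described above: count values are
# summed and interval_ms values are summed for all points in a window.
FLUSH_INTERVAL_MS = 10_000  # Datadog Agent flush interval (fixed)


def aggregate(points):
    """Mimic summing both the value and interval_ms of every point."""
    return {
        "value": sum(p["value"] for p in points),
        "interval_ms": sum(p["interval_ms"] for p in points),
    }


# With a 15s reporting cadence and a 10s agent flush, a host lands
# 2 points in one aggregation window and 1 in the next:
window_a = [{"value": 5, "interval_ms": FLUSH_INTERVAL_MS},
            {"value": 3, "interval_ms": FLUSH_INTERVAL_MS}]
window_b = [{"value": 4, "interval_ms": FLUSH_INTERVAL_MS}]

print(aggregate(window_a)["interval_ms"])  # 20000
print(aggregate(window_b)["interval_ms"])  # 10000 -> same series, different interval
```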
Attempted Solutions
We have tried a number of things that somewhat work for other metric types but not rates.
Changing the timestamp to the Vector clock time does help with misattribution and data duplication for non-rate types, but for rates you still end up with an inconsistent reported interval.
We have also tried modifying the `interval_ms` field with VRL, but it seems this field cannot be edited because it is not part of the VRL metric object model.
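For reference, the attempted remap looked roughly like this (a sketch with an illustrative input name; the assignment fails because metric events do not expose `interval_ms` in VRL):

```toml
[transforms.fix_interval]
type = "remap"
inputs = ["datadog_agent_in"]  # illustrative input name
source = '''
# Rejected: interval_ms is not part of the VRL metric object model.
.interval_ms = 10000
'''
```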
Proposal
There are a few things that would allow us to get the outcome we want:
- A setting on the Aggregate transform to always assign `interval_ms` to the same value as the transform's interval parameter
- The ability to modify this field via VRL
- A timestamp-based bucketing transform for aggregation which also allows a window to stay open for some set duration (for example, aggregate events based on their timestamp into a 30s interval, but hold the window open for 60s)
- The same as the above as a standalone transform, similar to `window` but purely time based, might help too
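To make the first proposal concrete, a hypothetical configuration might look like the following; the `aggregate` transform and its `interval_ms` flush-window option exist today, while the normalization flag is invented here for illustration:

```toml
[transforms.aggregate_metrics]
type = "aggregate"
inputs = ["dogstatsd_in"]  # illustrative input name
interval_ms = 30000        # existing flush-window option

# Proposed (does not exist today): force every emitted count's
# interval_ms to match the flush window above instead of summing
# the intervals of the aggregated points.
# normalize_interval_ms = true
```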
I am curious what workarounds might already exist, whether people have dealt with aggregating rate metrics from DogStatsD before, or whether there is some other recommended pattern to accomplish the same goal.
References
No response
Version
No response