
Optionally handle DELTA metrics as counters #167

Closed
osabina wants to merge 3 commits into prometheus-community:master from osabina:to_upstream

Conversation

osabina commented Aug 10, 2022

I've cleaned up the internal set of commits we are (successfully, as far as we can determine) running as of this week (8/8/2022) and it seemed to make sense to drop this as a PR instead of a random branch floating out there.

Open to whatever cleanup/changes/suggestions folks have in order to get this merged into the mainline.

Ozzie Sabina and others added 3 commits August 10, 2022 18:33
…an gauges.

This is accomplished by caching the "current" running value in this case
and exporting that value as a counter.

The helper code for this is in the collector namespace (I am using the
fnv code and followed a similar pattern in adding it here).

This is flagged as --aggregate-deltas.
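The scalar case described above can be sketched roughly as follows. This is an illustrative stand-in, not the PR's actual code: the type names and the label-joining scheme are hypothetical, though the FNV hashing mirrors the `fnv` code the commit message says it reuses.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// metricKey builds a per-series cache key from the metric name and its
// label values, hashed with FNV-1a. The separator byte is an assumption
// to keep distinct label sets from colliding.
func metricKey(name string, labels []string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(name))
	h.Write([]byte{0xff})
	h.Write([]byte(strings.Join(labels, "\xff")))
	return h.Sum64()
}

// DeltaCache accumulates DELTA samples into a running total per series,
// so the exporter can report the total as a Prometheus counter.
type DeltaCache struct {
	totals map[uint64]float64
}

func NewDeltaCache() *DeltaCache {
	return &DeltaCache{totals: make(map[uint64]float64)}
}

// Add folds one delta sample into the running value and returns the
// current counter value for that series.
func (c *DeltaCache) Add(name string, labels []string, delta float64) float64 {
	k := metricKey(name, labels)
	c.totals[k] += delta
	return c.totals[k]
}

func main() {
	c := NewDeltaCache()
	c.Add("api/request_count", []string{"us-east1"}, 5)
	fmt.Println(c.Add("api/request_count", []string{"us-east1"}, 7)) // prints 12
}
```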
This is done similarly to the DELTA INT64/DOUBLE code, with the major
change being that we need to cache all the histogram buckets instead of
a single value.  It leans on existing code as much as possible.

This is controlled by the same --aggregate-deltas flag.
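The histogram variant works the same way, except every bucket count is accumulated instead of a single value. A minimal sketch, with hypothetical types (the real exporter works with GCP distribution protos, not plain maps):

```go
package main

import "fmt"

// histogramState holds the accumulated distribution for one series:
// a running count per bucket upper bound, plus total count and sum.
type histogramState struct {
	buckets map[float64]uint64 // upper bound -> accumulated count
	count   uint64
	sum     float64
}

// merge folds one DELTA distribution sample into the running state,
// bucket by bucket, mirroring the scalar case across every bucket.
func (h *histogramState) merge(bucketCounts map[float64]uint64, count uint64, sum float64) {
	for bound, c := range bucketCounts {
		h.buckets[bound] += c
	}
	h.count += count
	h.sum += sum
}

func main() {
	h := &histogramState{buckets: make(map[float64]uint64)}
	h.merge(map[float64]uint64{1: 2, 5: 3}, 5, 9.5)
	h.merge(map[float64]uint64{1: 1, 5: 0}, 1, 0.5)
	fmt.Println(h.buckets[1], h.count, h.sum) // prints 3 6 10
}
```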
Once this was rolled out to production, we pretty quickly realized we
were doing (disallowed) concurrent map access.

We're kind of using a sledgehammer here with a global mutex across each type of
cache.  But it's also the simplest fix if it doesn't cause too much lock contention.
kgeckhart (Contributor) commented Aug 16, 2022

I figured it might be easier to move this convo to the PR. Related to question 3 from here, I think this might be an issue for slower moving metrics. Everything works really well when the metric is constantly changing which is awesome! Here's an example with a slower moving metric,
[Screenshot: 5m rate calculation (top) vs. raw counter (bottom) for a slow-moving metric]
Top graph is a 5m rate calculation and bottom is the raw counter. It seems as though GCP will produce a 0 delta for ~5 minutes before dropping the metric. When the metric starts reporting again, the rate doesn't start reporting because of the gap in data; it only resumes on the second change, which probably isn't ideal.

I'm not quite sure how to resolve this without going to the solution which always reports the counters even if GCP drops them for a longer period of time. Happy to try to merge your work with mine, which attempts to solve the problem this way. I'm currently attempting to fix an issue with the implementation which moves the reported time too far into the future.

osabina (Author) commented Aug 18, 2022

@kgeckhart thanks for the excellent explanation of the problem -- we must only be tracking things that are getting reasonable traffic, since everyone seems happy (so far) with what we have. But certainly, if you can integrate a solution to this, that would be awesome!

Perhaps have a flag that sets the TTL for continuing to export metrics no longer showing up, default 0s -- then folks could adjust for their needs?

SuperQ pushed a commit that referenced this pull request Nov 22, 2022
This PR introduces support for aggregating DELTA metrics in-memory. This implementation is somewhat based on #167 but with extra functionality which is intended to solve the problem identified here, #167 (comment).

I chose to inject interfaces for the delta and distribution stores at the Collector level to try to keep library friendliness intact, #157

I did as much as I could to explain the solution and the tradeoffs with aggregating deltas in this manner in the README.md. The bulk of the changes are in `monitoring_metrics.go` where I tried to obfuscate as much of the delta logic as possible.

Should hopefully resolve #116 and #25 

Signed-off-by: Kyle Eckhart <kgeckhart@users.noreply.github.com>
osabina (Author) commented Feb 9, 2023

Sorry, meant to close this since @kgeckhart got this fix into the mainline already!

osabina closed this Feb 9, 2023