
Avoid writing duplicate chunks by checking the cache first #1475

Merged
bboreham merged 1 commit into master from dedupe-chunk-writes on Jun 26, 2019

@bboreham
Contributor

When two ingesters hold chunks for the same series, with the same start and end times and the same contents, this change checks the cache first and skips one of the two writes. That saves effort, and money when the store is DynamoDB.
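A minimal sketch of the check-the-cache-first idea, in Go since Cortex is written in Go. Everything here is hypothetical: `Chunk`, `Key`, `Cache`, `Store` and `Flush` are illustrative stand-ins, not Cortex's real types, and the key (user, series fingerprint, start, end, CRC-32 of the encoded contents) is an assumption chosen to match "same series, same start and end times and same contents".

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sync"
)

// Chunk is a hypothetical, simplified chunk.
type Chunk struct {
	UserID      string
	Fingerprint uint64 // hash of the series labels
	From        int64  // start time, ms
	Through     int64  // end time, ms
	Data        []byte // encoded samples
}

// Key is equal for two chunks only if user, series, start, end and contents match.
func (c Chunk) Key() string {
	return fmt.Sprintf("%s/%x:%x:%x:%x", c.UserID, c.Fingerprint, c.From, c.Through, crc32.ChecksumIEEE(c.Data))
}

// Cache is a hypothetical stand-in for the cache inside the ingesters.
type Cache struct {
	mu     sync.Mutex
	keys   map[string]struct{}
	Hits   int
	Misses int
}

func NewCache() *Cache { return &Cache{keys: map[string]struct{}{}} }

// Contains reports whether key is cached and records a hit or a miss.
func (c *Cache) Contains(key string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	_, ok := c.keys[key]
	if ok {
		c.Hits++
	} else {
		c.Misses++
	}
	return ok
}

func (c *Cache) Add(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.keys[key] = struct{}{}
}

// Store is a hypothetical backing store (e.g. DynamoDB) that counts writes.
type Store struct{ Writes int }

func (s *Store) Put(c Chunk) error { s.Writes++; return nil }

// Flush writes c unless an identical chunk is already in the cache.
func Flush(cache *Cache, store *Store, c Chunk) (bool, error) {
	key := c.Key()
	if cache.Contains(key) {
		return false, nil // another ingester already wrote this exact chunk
	}
	if err := store.Put(c); err != nil {
		return false, err
	}
	cache.Add(key) // remember it so a replica's identical chunk is skipped
	return true, nil
}

func main() {
	cache, store := NewCache(), &Store{}
	c := Chunk{UserID: "user-1", Fingerprint: 0xabc, From: 1000, Through: 61000, Data: []byte("samples")}
	w1, _ := Flush(cache, store, c) // ingester A
	w2, _ := Flush(cache, store, c) // ingester B, same chunk
	fmt.Println(w1, w2, store.Writes, cache.Hits) // true false 1 1
}
```

In this sketch every cache hit is one skipped store write, which is why the hit rate can stand in for a dedicated "chunks saved" metric. The check is best-effort: if both replicas look up the key before either has added it, both write, which is no worse than not checking.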

How often does this happen? It depends on the kind of time-series data Cortex is handling. It is most likely for short chunks, e.g. cAdvisor metrics from containers that run for just a few minutes. It is least likely for long-running series: they are flushed at a time relative to the start of each ingester process, so the end times are unlikely to match across ingesters. (But I'm working on that via -ingester.spread-flushes.)
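A toy model of why short-lived series deduplicate and long-running ones usually don't. It is a hypothetical simplification: it assumes a chunk is cut at the first flush tick (ingester start + k × flush period) after the chunk reaches a maximum age, ignores every other cut condition, does not model -ingester.spread-flushes, and the interval values are illustrative rather than Cortex defaults.

```go
package main

import "fmt"

// All times in seconds; illustrative values, not Cortex defaults.
const (
	scrapeInterval = 15   // one sample every 15s
	flushPeriod    = 60   // flush loop ticks every 60s after process start
	maxChunkAge    = 3600 // chunk is cut at the first tick once it is this old
)

// chunkEnd returns the end time of a series' first chunk in this toy model.
// Assumes ingesterStart <= seriesStart <= seriesStop.
func chunkEnd(seriesStart, seriesStop, ingesterStart int64) int64 {
	lastSample := seriesStart + (seriesStop-seriesStart)/scrapeInterval*scrapeInterval
	due := seriesStart + maxChunkAge
	ticks := (due - ingesterStart + flushPeriod - 1) / flushPeriod // ceiling division
	cut := ingesterStart + ticks*flushPeriod
	if lastSample <= cut {
		// Series stopped before the cut: every replica ends on the same last sample.
		return lastSample
	}
	// Otherwise the end is the last sample at or before this ingester's cut,
	// which depends on when the ingester process started.
	return seriesStart + (cut-seriesStart)/scrapeInterval*scrapeInterval
}

func main() {
	const ingesterA, ingesterB = 0, 37 // two replicas started 37s apart

	// Short-lived container: 5 minutes of samples.
	fmt.Println(chunkEnd(1000, 1300, ingesterA), chunkEnd(1000, 1300, ingesterB)) // 1300 1300
	// Long-running series: still alive when the chunk is cut.
	fmt.Println(chunkEnd(1000, 100000, ingesterA), chunkEnd(1000, 100000, ingesterB)) // 4615 4645
}
```

With identical samples on both replicas, the short series produces chunks with the same end time (so the cache check can skip one write), while the long-running series ends 30s apart on the two ingesters, giving different keys.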

I considered adding a metric to count the chunks saved, but you can already see this as the hit rate of the cache inside the ingesters.

Signed-off-by: Bryan Boreham <bryan@weave.works>
@bboreham merged commit b04f55d into master on Jun 26, 2019
@tomwilkie deleted the dedupe-chunk-writes branch on July 31, 2019 11:18