Bigger chunks, faster queries. #1048
I guess you want this so back-porting fixes from Prometheus v1 is easier? It's not really something I'm worried about, TBH: there is no development work happening on Prometheus v1, so we're more likely to find and fix issues in this package ourselves.
I started with it in another package, but as I've modelled this as another chunk encoding, there needed to be a bunch of references to it in the chunk package to make it work with the rest of our code. Therefore I just moved it into this package. Given that, I was thinking of renaming the directory to
Also so we can see where bugs came from. I'm going to have to think about this one: "Prom v1 verbatim" is pretty easy to justify, and "Prom v2" similarly, but "Prom v2 data wedged into the code structure from v1" will take time.
If every chunk in a long-running series is ~12 hours, then every second chunk is indexed in two day-buckets, so we have 50% extra index entries. I'm beginning to think that day buckets are too short.
Looking at the series store: if we have 4-hour chunks, we write 7x3=21 series entries per day. With this we have 9 entries per day, so we write fewer than half the entries (9 vs 21). The number of label entries stays the same, let's say 10.

For the chunk store, 4-hour chunks would have 21x10 entries; big chunks would have 9x10.

So sure, it's 50% when compared with itself, but compared to short chunks I think the number of extra overlaps you write remains the same, basically one per day. There is, after all, only one start and end to each day.
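The bucket-overlap arithmetic above can be sanity-checked with a small simulation. This is an illustrative sketch only: it counts (chunk, bucket) index entries for chunks tiled at an arbitrary offset across day-long buckets, and deliberately ignores Cortex's per-schema multipliers (the "x3" and "x10" factors in the comment above).

```go
package main

import "fmt"

// entriesPerDay tiles chunks of length chunkHrs (starting at offsetHrs)
// across `days` day-long index buckets and returns the average number of
// (chunk, bucket) index entries written per day. A chunk that straddles a
// bucket boundary is indexed in both buckets.
func entriesPerDay(chunkHrs, offsetHrs, days int) float64 {
	const bucketHrs = 24
	entries := 0
	for start := offsetHrs; start+chunkHrs <= days*bucketHrs; start += chunkHrs {
		end := start + chunkHrs - 1 // last hour covered by the chunk
		entries += end/bucketHrs - start/bucketHrs + 1
	}
	return float64(entries) / float64(days)
}

func main() {
	// 4h chunks: 6 chunks/day, one straddles a boundary -> ~7 entries/day.
	fmt.Println(entriesPerDay(4, 2, 1000))
	// 12h chunks: 2 chunks/day, every second one straddles -> ~3 entries/day,
	// i.e. the 50% overhead mentioned above.
	fmt.Println(entriesPerDay(12, 6, 1000))
}
```

Under these assumptions both chunk sizes pay roughly one extra boundary entry per day, which matches the "only one start and end to each day" point.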
My comment was not intended as a criticism of bigger chunks, but as a broader comment on the DB design. A 50% overhead is worth talking about no matter what your starting point is. Another way to go would be to stop the double index writes, and look up the index for one extra bucket on queries.
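The "one extra bucket on queries" alternative could look something like the sketch below. The function name and the maxChunkAgeHrs bound are hypothetical, not Cortex's actual schema code: the idea is that if chunks are never longer than some cap, the query side can cover boundary-straddling chunks by also reading the bucket just before the query range, instead of the write side indexing those chunks twice.

```go
package main

import "fmt"

// bucketsForQuery returns the day-bucket indexes to consult for a query
// over [fromHrs, throughHrs]. Instead of writing a boundary-straddling
// chunk into two buckets, the query looks back far enough to catch any
// chunk that started in the previous bucket but overlaps the range.
// maxChunkAgeHrs is an assumed upper bound on chunk length.
func bucketsForQuery(fromHrs, throughHrs, maxChunkAgeHrs int) []int {
	const bucketHrs = 24
	first := (fromHrs - maxChunkAgeHrs) / bucketHrs
	if first < 0 {
		first = 0
	}
	last := throughHrs / bucketHrs
	out := []int{}
	for b := first; b <= last; b++ {
		out = append(out, b)
	}
	return out
}

func main() {
	// A query over hours 30..50 with up-to-12h chunks must also check
	// bucket 0, since a chunk starting at hour 18 could still overlap.
	fmt.Println(bucketsForQuery(30, 50, 12)) // [0 1 2]
}
```

The trade-off is one extra bucket read per query in exchange for roughly one fewer index write per series per day.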
Let's take this into a separate issue; I agree it's a valid concern, but it's unrelated to this PR.
Also, remove assumptions about marshalled chunk length. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
… to reduce amount of iteration we have to do through the chunk to find the right place. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
…memory. Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
I've moved the chunk code into
A bigchunk is a slice of prometheus/tsdb chunks. Each individual chunk is fixed at 120 samples; after that we add a new one. There is no upper bound on the number of samples in a bigchunk.
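The growable structure described above can be sketched as follows. The names (bigchunk, sample, maxSamplesPerSubChunk) are illustrative, not the actual Cortex types, and the real implementation appends to prometheus/tsdb XOR-encoded chunks rather than plain slices:

```go
package main

// maxSamplesPerSubChunk mirrors the 120-sample cap on each underlying
// tsdb chunk described in the PR.
const maxSamplesPerSubChunk = 120

type sample struct {
	t int64   // timestamp (ms)
	v float64 // value
}

// bigchunk is a growable chunk built from fixed-size sub-chunks.
type bigchunk struct {
	subChunks [][]sample
}

// Add appends a sample, starting a new sub-chunk whenever the current
// one reaches the cap. There is no upper bound on total samples.
func (b *bigchunk) Add(s sample) {
	n := len(b.subChunks)
	if n == 0 || len(b.subChunks[n-1]) >= maxSamplesPerSubChunk {
		b.subChunks = append(b.subChunks, make([]sample, 0, maxSamplesPerSubChunk))
		n++
	}
	b.subChunks[n-1] = append(b.subChunks[n-1], s)
}

// Len returns the total number of samples across all sub-chunks.
func (b *bigchunk) Len() int {
	total := 0
	for _, c := range b.subChunks {
		total += len(c)
	}
	return total
}

func main() {
	var b bigchunk
	for i := 0; i < 250; i++ {
		b.Add(sample{t: int64(i), v: float64(i)})
	}
	// 250 samples split into sub-chunks of 120, 120, and 10.
	_ = b.Len()
}
```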
As part of this PR I've removed a bunch of unused code in the chunk package.
Fixes #1045, fixes #766, fixes #300