Bigger chunks, faster queries. #1048

Merged
tomwilkie merged 11 commits into cortexproject:master from grafana:biggerchunk
Nov 19, 2018
Conversation

@tomwilkie
Contributor

@tomwilkie tomwilkie commented Oct 2, 2018

A bigchunk is a slice of prometheus/tsdb chunks. Each individual chunk is fixed at 120 samples; after that we add a new one. There is no upper bound on the number of samples in a bigchunk.
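The layout described above can be sketched roughly like this (a minimal illustration with hypothetical `sample` and `bigchunk` types; the real implementation wraps prometheus/tsdb XOR chunks rather than plain slices):

```go
package main

import "fmt"

const samplesPerChunk = 120 // each inner chunk is fixed at 120 samples

// sample is a hypothetical timestamp/value pair.
type sample struct {
	t int64
	v float64
}

// bigchunk models a chunk as a slice of smaller fixed-size chunks,
// with no upper bound on the total number of samples.
type bigchunk struct {
	chunks [][]sample
}

// add appends a sample, starting a new inner chunk once the current
// one reaches 120 samples.
func (b *bigchunk) add(s sample) {
	n := len(b.chunks)
	if n == 0 || len(b.chunks[n-1]) >= samplesPerChunk {
		b.chunks = append(b.chunks, make([]sample, 0, samplesPerChunk))
		n++
	}
	b.chunks[n-1] = append(b.chunks[n-1], s)
}

func main() {
	var b bigchunk
	for i := 0; i < 250; i++ {
		b.add(sample{t: int64(i), v: float64(i)})
	}
	// 250 samples split as 120 + 120 + 10
	fmt.Println(len(b.chunks), len(b.chunks[2]))
	// prints: 3 10
}
```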

As part of this PR I've removed a bunch of unused code in the chunk package.

Fixes #1045, fixes #766, fixes #300

@tomwilkie tomwilkie force-pushed the biggerchunk branch 4 times, most recently from a812491 to b448aa6 Compare October 2, 2018 16:43
@tomwilkie tomwilkie force-pushed the biggerchunk branch 3 times, most recently from 4cddb9a to bac7bbc Compare October 2, 2018 19:55
@tomwilkie tomwilkie changed the title [WIP] Bigchunks Bigger chunks, faster queries. Oct 2, 2018
@bboreham
Contributor

bboreham commented Oct 3, 2018

If pkg/prom1/storage/local/chunk/bigchunk.go is dealing with prom2 chunks, can it go in a directory reflecting that?
prom1 should be, as far as possible, the code from Prometheus v1 copied verbatim.

@tomwilkie
Contributor Author

prom1 should be, as far as possible, the code from Prometheus v1 copied verbatim.

I guess you want this so back-porting fixes from Prometheus v1 is easier? It's not really something I'm worried about, TBH: there is no development work happening on Prometheus v1, so we're more likely to find and fix issues in this package ourselves.

If pkg/prom1/storage/local/chunk/bigchunk.go is dealing with prom2 chunks, can it go in a directory reflecting that?

I started with it in another package, but as I've modelled this as another chunk encoding, there needed to be a bunch of references to it in the chunk package to make it work with the rest of our code. Therefore I just moved it into this package.

Given that, I was thinking of renaming the directory to chunkenc to remove the name clash with our chunk package. WDYT?

@bboreham
Contributor

bboreham commented Oct 3, 2018

I guess you want this so back porting fixes from Prometheus v1 is easier?

Also so we can see where bugs came from.

I'm going to have to think about this one. "Prom v1 verbatim" is pretty easy to justify; "Prom v2" similarly, but "Prom v2 data wedged into the code structure from v1" will take time.

@bboreham
Contributor

bboreham commented Oct 5, 2018

If every chunk in a long-running series is ~12 hours, then every second chunk is indexed in two day-buckets, so we have 50% extra index entries. I'm beginning to think that day buckets are too short.
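The arithmetic behind that 50% figure can be checked with a quick sketch (hypothetical helper and numbers, assuming 12-hour chunks whose boundaries do not line up with the 24-hour buckets):

```go
package main

import "fmt"

// indexEntries counts how many day-bucket index entries a run of
// back-to-back chunks produces: a chunk gets one entry per bucket
// it overlaps.
func indexEntries(offsetHours, chunkHours, bucketHours, numChunks int) int {
	entries := 0
	for i := 0; i < numChunks; i++ {
		start := offsetHours + i*chunkHours
		end := start + chunkHours - 1 // last hour the chunk covers
		entries += end/bucketHours - start/bucketHours + 1
	}
	return entries
}

func main() {
	// 12-hour chunks, 24-hour buckets, chunk boundaries offset from
	// midnight: every second chunk straddles a bucket boundary and is
	// indexed twice.
	n := 1000
	e := indexEntries(6, 12, 24, n)
	fmt.Printf("%d chunks -> %d entries (%.0f%% overhead)\n",
		n, e, 100*float64(e-n)/float64(n))
	// prints: 1000 chunks -> 1500 entries (50% overhead)
}
```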

@tomwilkie
Contributor Author

tomwilkie commented Oct 5, 2018 via email

@bboreham
Contributor

bboreham commented Oct 5, 2018

My comment was not intended as a criticism of bigger chunks, but as a broader comment on the DB design. A 50% overhead is worth talking about no matter what your starting point is.

Another way to go would be to stop the double index writes, and look up the index for one extra bucket on queries.
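The query-side half of that alternative could look something like this (a hypothetical helper, not Cortex code): write each chunk's index entry only under the bucket containing its start, and have queries fetch one extra bucket back so boundary-straddling chunks are still found.

```go
package main

import "fmt"

// bucketsToQuery returns the day-bucket numbers covering
// [fromHour, throughHour], plus one earlier bucket, so a chunk indexed
// only under its start bucket is still found when it spills past a
// bucket boundary.
func bucketsToQuery(fromHour, throughHour, bucketHours int) []int {
	first := fromHour/bucketHours - 1 // one extra bucket back
	last := throughHour / bucketHours
	buckets := make([]int, 0, last-first+1)
	for b := first; b <= last; b++ {
		buckets = append(buckets, b)
	}
	return buckets
}

func main() {
	// Query hours 30..40: bucket 1 (hours 24-47) plus bucket 0, in
	// case a chunk started in bucket 0 and runs into the query range.
	fmt.Println(bucketsToQuery(30, 40, 24))
	// prints: [0 1]
}
```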

@tomwilkie tomwilkie force-pushed the biggerchunk branch 2 times, most recently from d0eb6ba to b8c25f8 Compare October 18, 2018 14:21
@tomwilkie
Contributor Author

My comment was not intended as a criticism of bigger chunks, but as a broader comment on the DB design. A 50% overhead is worth talking about no matter what your starting point is.

Let's take this to a separate issue; I agree it's a valid concern, but it's unrelated to this PR.

Also, remove assumptions about marshalled chunk length.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
… to reduce amount of iteration we have to do through the chunk to find the right place.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
…memory.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
@tomwilkie
Contributor Author

I've moved the chunk code into pkg/chunk/encoding now.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
Contributor

@gouthamve gouthamve left a comment


👍 with comments.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
@tomwilkie tomwilkie merged commit e70d65e into cortexproject:master Nov 19, 2018
@tomwilkie tomwilkie deleted the biggerchunk branch November 19, 2018 13:11

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bigger chunks
Use Prom2 chunk code
Grow chunks in memory

3 participants