Optimized bstream reader used by XORChunk iterator #7390
bwplotka merged 7 commits into prometheus:master
Conversation
Signed-off-by: Marco Pracucci <marco@pracucci.com>
bwplotka
left a comment
I love it, I think I kind of understand what is happening, but quite impressed you found a way to squeeze performance out of this without changing compression 💪
I think it should also work once we improve compression (should work not only with XOR) so pretty amazing!
I still think improving compression can give large benefits, but it's amazing we improved the parse part as well. cc @simonpasquier who works on improving chunk encoding, can you take a look as well on this? 🤗
It's LGTM, just style comments.
```go
if b.streamOffset+8 <= len(b.stream) {
	// This is ugly, but significantly faster.
	b.buffer =
		((uint64(b.stream[b.streamOffset])) << 56) |
```

Benchmarked it. A few percent of iteration time reduction comes from this.
We could use binary.BigEndian.Uint64 :)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Thanks 💪
codesome
left a comment
Post merge LGTM! A few comments btw.
```go
// Copyright 2017 The Prometheus Authors
```

```go
v, err := r.readBitFast()
if err != nil {
	v, err = r.readBit()
}
```
Should we be explicit about whether we expect an error from readBitFast or not? (even in other places)
Discussed offline: this additional license in

(#7413)
* Fixed bstream test license
* Simplified bstreamReader.loadNextBuffer()
* Fixed date in license
Signed-off-by: Marco Pracucci <marco@pracucci.com>
I spent the last couple of days investigating query performance and I've noticed that, for several queries we see in production, a significant amount of time is spent iterating series and samples. In this analysis, I've noticed most of the time spent iterating samples is in the `XORChunk` iterator, and further benchmarks showed that the `bstream` reader introduces a significant bottleneck.

I've tried different approaches to optimise the `bstream` reader, and in this PR I'm proposing the optimisation I found to have the least downsides. The optimisation is split into two parts:
* Read the stream into an `uint64` buffer in order to optimise bit-wise operations (assuming most production architectures are 64 bit).
* Introduce `Fast()` versions of the read bit/bits functions, which get inlined by the compiler. These functions must be leaves and very small, which is why they may return an error in case the current buffer doesn't contain enough bits to satisfy the request. In case a `Fast()` function returns an error, it must be retried using the normal (but slower) one.

The following shows the results of a couple of benchmarks. I've measured similar query time reductions running benchmarks in Cortex too.
BenchmarkXORIterator
BenchmarkRangeQuery