Skip to content

Consider using Roaring batch iterators#6770

Closed
richardstartin wants to merge 3 commits intoapache:masterfrom
richardstartin:faster-roaring-iterators
Closed

Consider using Roaring batch iterators#6770
richardstartin wants to merge 3 commits intoapache:masterfrom
richardstartin:faster-roaring-iterators

Conversation

@richardstartin
Copy link
Copy Markdown
Member

Here is a patch which replaces the IntIterator implementation with one that wraps a BatchIterator - an iterator which extracts values from the bitmap into a buffer in batches, and can be a lot faster. This relies on a space time tradeoff, so probably requires heuristics or a configurable cardinality threshold to choose when the new kind of iterator is used. These iterators are also more expensive to clone, which seems to be done extensively in druid.

At b5df73e (with batch iterators) I get

Benchmark                      (bitmapAlgo)  (prob)   (size)  Mode  Cnt        Score        Error  Units
BitmapIterationBenchmark.iter        bitset     0.0  1000000  avgt    5        2.785 ±      0.122  ns/op
BitmapIterationBenchmark.iter        bitset   0.001  1000000  avgt    5     8496.144 ±    887.048  ns/op
BitmapIterationBenchmark.iter        bitset     0.1  1000000  avgt    5   481792.250 ±  10840.026  ns/op
BitmapIterationBenchmark.iter        bitset     0.5  1000000  avgt    5  1910323.042 ±  49924.335  ns/op
BitmapIterationBenchmark.iter        bitset    0.99  1000000  avgt    5  3680187.725 ±  61100.955  ns/op
BitmapIterationBenchmark.iter        bitset     1.0  1000000  avgt    5  3732561.471 ±  54140.556  ns/op
BitmapIterationBenchmark.iter       concise     0.0  1000000  avgt    5        2.781 ±      0.060  ns/op
BitmapIterationBenchmark.iter       concise   0.001  1000000  avgt    5     5615.190 ±     75.432  ns/op
BitmapIterationBenchmark.iter       concise     0.1  1000000  avgt    5  2361254.431 ± 294409.483  ns/op
BitmapIterationBenchmark.iter       concise     0.5  1000000  avgt    5  5561685.586 ±  95190.501  ns/op
BitmapIterationBenchmark.iter       concise    0.99  1000000  avgt    5  2097075.870 ±  63050.075  ns/op
BitmapIterationBenchmark.iter       concise     1.0  1000000  avgt    5  2288373.696 ±  90706.616  ns/op
BitmapIterationBenchmark.iter       roaring     0.0  1000000  avgt    5       78.431 ±      1.114  ns/op
BitmapIterationBenchmark.iter       roaring   0.001  1000000  avgt    5     3355.265 ±     50.972  ns/op
BitmapIterationBenchmark.iter       roaring     0.1  1000000  avgt    5   369259.712 ±   9960.458  ns/op
BitmapIterationBenchmark.iter       roaring     0.5  1000000  avgt    5  1098874.194 ± 702851.369  ns/op
BitmapIterationBenchmark.iter       roaring    0.99  1000000  avgt    5  2298007.861 ±  33151.084  ns/op
BitmapIterationBenchmark.iter       roaring     1.0  1000000  avgt    5  2291253.405 ±  31165.949  ns/op

At 7a09cde (without batch iterators) I get:

Benchmark                      (bitmapAlgo)  (prob)   (size)  Mode  Cnt        Score        Error  Units
BitmapIterationBenchmark.iter        bitset     0.0  1000000  avgt    5        2.804 ±      0.054  ns/op
BitmapIterationBenchmark.iter        bitset   0.001  1000000  avgt    5     8584.675 ±    149.012  ns/op
BitmapIterationBenchmark.iter        bitset     0.1  1000000  avgt    5   477834.206 ±  16173.440  ns/op
BitmapIterationBenchmark.iter        bitset     0.5  1000000  avgt    5  1916687.564 ±  66723.789  ns/op
BitmapIterationBenchmark.iter        bitset    0.99  1000000  avgt    5  3661377.107 ± 112077.640  ns/op
BitmapIterationBenchmark.iter        bitset     1.0  1000000  avgt    5  3997551.794 ±  88884.369  ns/op
BitmapIterationBenchmark.iter       concise     0.0  1000000  avgt    5        3.074 ±      0.818  ns/op
BitmapIterationBenchmark.iter       concise   0.001  1000000  avgt    5     6249.486 ±    521.458  ns/op
BitmapIterationBenchmark.iter       concise     0.1  1000000  avgt    5  2331668.373 ±  62614.776  ns/op
BitmapIterationBenchmark.iter       concise     0.5  1000000  avgt    5  5621965.934 ± 105912.792  ns/op
BitmapIterationBenchmark.iter       concise    0.99  1000000  avgt    5  2078971.999 ±  61276.180  ns/op
BitmapIterationBenchmark.iter       concise     1.0  1000000  avgt    5  2310745.781 ± 108732.741  ns/op
BitmapIterationBenchmark.iter       roaring     0.0  1000000  avgt    5        5.654 ±      0.318  ns/op
BitmapIterationBenchmark.iter       roaring   0.001  1000000  avgt    5     4211.167 ±    105.185  ns/op
BitmapIterationBenchmark.iter       roaring     0.1  1000000  avgt    5   431989.663 ±  18202.174  ns/op
BitmapIterationBenchmark.iter       roaring     0.5  1000000  avgt    5  1403242.519 ±  29345.563  ns/op
BitmapIterationBenchmark.iter       roaring    0.99  1000000  avgt    5  2569253.542 ±  29920.673  ns/op
BitmapIterationBenchmark.iter       roaring     1.0  1000000  avgt    5  2639932.161 ±  61587.416  ns/op

Note that using batch iterators via the IntIterator interface costs at least 2x throughput compared to using them via the BatchIterator interface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant