Skip to content

Two-stage filtering#3018

Merged
nishantmonu51 merged 2 commits intoapache:masterfrom
jon-wei:twostage_filter
Jun 22, 2016
Merged

Two-stage filtering#3018
nishantmonu51 merged 2 commits intoapache:masterfrom
jon-wei:twostage_filter

Conversation

@jon-wei
Copy link
Copy Markdown
Contributor

@jon-wei jon-wei commented May 25, 2016

Implements the proposal in #2742

This PR adds support for filtering without bitmap indexes, by adding a new filtering step during Cursor iteration in the QueryableIndexStorageAdapter.

The linked proposal contains details about the approach taken, with the exception that filter match result caching is not implemented (can be a later PR).


Benchmarks, using the benchmarks in #2875

Master:

GroupByBenchmark.queryMultiQueryableIndex                  N/A              4                N/A            100000       N/A           basic.A          N/A  avgt   50   420317.283 ±  9960.026  us/op
GroupByBenchmark.querySingleIncrementalIndex               N/A              4                N/A            100000       N/A           basic.A          N/A  avgt   50   237684.936 ±  4982.999  us/op
GroupByBenchmark.querySingleQueryableIndex                 N/A              4                N/A            100000       N/A           basic.A          N/A  avgt   50   192516.042 ±  1247.105  us/op
SearchBenchmark.queryMultiQueryableIndex                  1000              1                N/A            750000       N/A           basic.A          N/A  avgt   50    18569.254 ±   130.116  us/op
SearchBenchmark.querySingleIncrementalIndex               1000              1                N/A            750000       N/A           basic.A          N/A  avgt   50   252040.937 ±  5094.427  us/op
SearchBenchmark.querySingleQueryableIndex                 1000              1                N/A            750000       N/A           basic.A          N/A  avgt   50    18522.747 ±    96.640  us/op
SelectBenchmark.queryIncrementalIndex                      N/A              1               1000             25000       N/A           basic.A          N/A  avgt   50    59077.585 ±   775.733  us/op
SelectBenchmark.queryMultiQueryableIndex                   N/A              1               1000             25000       N/A           basic.A          N/A  avgt   50    84701.323 ±   591.074  us/op
SelectBenchmark.queryQueryableIndex                        N/A              1               1000             25000       N/A           basic.A          N/A  avgt   50    86754.784 ±   536.997  us/op
TimeseriesBenchmark.queryFilteredSingleQueryableIndex      N/A              1                N/A            750000       N/A           basic.A          N/A  avgt   50    15513.263 ±    34.824  us/op
TimeseriesBenchmark.queryMultiQueryableIndex               N/A              1                N/A            750000       N/A           basic.A          N/A  avgt   50   141221.807 ±   922.429  us/op
TimeseriesBenchmark.querySingleIncrementalIndex            N/A              1                N/A            750000       N/A           basic.A          N/A  avgt   50  1491135.435 ± 16168.599  us/op
TimeseriesBenchmark.querySingleQueryableIndex              N/A              1                N/A            750000       N/A           basic.A          N/A  avgt   50   142651.506 ±   734.796  us/op
TopNBenchmark.queryMultiQueryableIndex                     N/A              1                N/A            750000       N/A           basic.A           10  avgt   50   183568.518 ±   929.395  us/op
TopNBenchmark.querySingleIncrementalIndex                  N/A              1                N/A            750000       N/A           basic.A           10  avgt   50  1459638.382 ± 27925.835  us/op
TopNBenchmark.querySingleQueryableIndex                    N/A              1                N/A            750000       N/A           basic.A           10  avgt   50   177885.035 ±  2059.651  us/op

Patch:

GroupByBenchmark.queryMultiQueryableIndex                  N/A              4                N/A            100000           basic.A          N/A  avgt   50   425610.302 ± 12920.928  us/op
GroupByBenchmark.querySingleIncrementalIndex               N/A              4                N/A            100000           basic.A          N/A  avgt   50   233186.139 ±  1117.466  us/op
GroupByBenchmark.querySingleQueryableIndex                 N/A              4                N/A            100000           basic.A          N/A  avgt   50   185192.077 ±  1269.109  us/op
SearchBenchmark.queryMultiQueryableIndex                  1000              1                N/A            750000           basic.A          N/A  avgt   50    18409.140 ±    97.739  us/op
SearchBenchmark.querySingleIncrementalIndex               1000              1                N/A            750000           basic.A          N/A  avgt   50   259170.630 ±  5773.437  us/op
SearchBenchmark.querySingleQueryableIndex                 1000              1                N/A            750000           basic.A          N/A  avgt   50    18474.211 ±   143.386  us/op
SelectBenchmark.queryIncrementalIndex                      N/A              1               1000             25000           basic.A          N/A  avgt   50    60913.878 ±   569.650  us/op
SelectBenchmark.queryMultiQueryableIndex                   N/A              1               1000             25000           basic.A          N/A  avgt   50    85294.587 ±   451.770  us/op
SelectBenchmark.queryQueryableIndex                        N/A              1               1000             25000           basic.A          N/A  avgt   50    83608.035 ±   558.624  us/op
TimeseriesBenchmark.queryFilteredSingleQueryableIndex      N/A              1                N/A            750000           basic.A          N/A  avgt   50    15723.450 ±    54.988  us/op
TimeseriesBenchmark.queryMultiQueryableIndex               N/A              1                N/A            750000           basic.A          N/A  avgt   50   139903.759 ±   783.649  us/op
TimeseriesBenchmark.querySingleIncrementalIndex            N/A              1                N/A            750000           basic.A          N/A  avgt   50  1477342.648 ± 13718.615  us/op
TimeseriesBenchmark.querySingleQueryableIndex              N/A              1                N/A            750000           basic.A          N/A  avgt   50   138654.076 ±   640.646  us/op
TopNBenchmark.queryMultiQueryableIndex                     N/A              1                N/A            750000           basic.A           10  avgt   50   184123.006 ±   787.114  us/op
TopNBenchmark.querySingleIncrementalIndex                  N/A              1                N/A            750000           basic.A           10  avgt   50  1465099.858 ± 26925.094  us/op
TopNBenchmark.querySingleQueryableIndex                    N/A              1                N/A            750000           basic.A           10  avgt   50   176951.700 ±  1448.476  us/op

current results for reading 1,500,000 rows from a long column, no filters:

master
FilterPartitionBenchmark.longRead           1500000     basic  avgt   50  44587.905 ± 116.094  us/op

patch
FilterPartitionBenchmark.longRead           1500000     basic  avgt   50  44774.780 ± 117.027  us/op

Benchmark that applies a single selector filter to a 750,000 row segment:
Master

FilterPartitionBenchmark.readWithPreFilter                 N/A            N/A                N/A            750000     basic               N/A          N/A  avgt   50     5713.381 ±     6.405  us/op

Patch

FilterPartitionBenchmark.readWithPreFilter                 N/A            N/A                N/A            750000     basic               N/A          N/A  avgt   50     5998.602 ±   567.914  us/op

Benchmark only applicable to this patch:

Benchmark                                              (limit)  (numSegments)  (pagingThreshold)  (rowsPerSegment)  (schema)  (schemaAndQuery)  (threshold)  Mode  Cnt        Score       Error  Units
FilterPartitionBenchmark.longRead                          750000     basic  avgt   50   24258.797 ±   86.279  us/op
FilterPartitionBenchmark.readComplexOrFilter               750000     basic  avgt   50  209979.618 ± 7549.892  us/op
FilterPartitionBenchmark.readComplexOrFilterCNF            750000     basic  avgt   50  412001.016 ± 9765.903  us/op
FilterPartitionBenchmark.readOrFilter                      750000     basic  avgt   50   67674.482 ± 1301.258  us/op
FilterPartitionBenchmark.readOrFilterCNF                   750000     basic  avgt   50   76440.205 ± 1675.725  us/op
FilterPartitionBenchmark.readWithExFnPostFilter            750000     basic  avgt   50  135087.429 ±  795.120  us/op
FilterPartitionBenchmark.readWithExFnPreFilter             750000     basic  avgt   50    6146.368 ±  178.410  us/op
FilterPartitionBenchmark.readWithPostFilter                750000     basic  avgt   50   19275.322 ±  614.610  us/op
FilterPartitionBenchmark.readWithPreFilter                 750000     basic  avgt   50    5943.020 ±  492.261  us/op
FilterPartitionBenchmark.stringRead                        750000     basic  avgt   50   87951.983 ± 2488.246  us/op

@jon-wei jon-wei added this to the 0.9.2 milestone May 25, 2016
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems pretty expensive if the bitmap isn't roaring

are we planning on making roaring the default?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not planning on changing the default bitmap to roaring with this PR at least.

if anyone knows of a faster way to get a reverse bitmap iterator, please let me know

Copy link
Copy Markdown
Contributor

@gianm gianm Jun 1, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's a better way for concise.

Also this code is just moved from lower down in the same file, it was added as part of the descending select patch. It should only get used for descending queries.

@navis navis mentioned this pull request May 26, 2016
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added final

@jon-wei jon-wei force-pushed the twostage_filter branch from 0792211 to fb2c564 Compare May 31, 2016 21:25
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not from Hive?

@jon-wei jon-wei force-pushed the twostage_filter branch from fa1dc26 to 655aa7a Compare June 1, 2016 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants