Skip to content

Add filter selectivity estimation for auto search strategy#3848

Merged
gianm merged 11 commits intoapache:masterfrom
jihoonson:filter-selectivity-estimation
Feb 6, 2017
Merged

Add filter selectivity estimation for auto search strategy#3848
gianm merged 11 commits intoapache:masterfrom
jihoonson:filter-selectivity-estimation

Conversation

@jihoonson
Copy link
Copy Markdown
Contributor

@jihoonson jihoonson commented Jan 13, 2017

Currently, AutoStrategy of search query uses a cost-based planner which needs to compute the exact selectivity of the filter specified in the query. According to the result of JMC profiling, I figured out the bottleneck is calling the expensive union() operation of bitmaps. So, I propose an alternative way to estimate filter selectivity.

In this patch, I simply assumed dimension values are independent to calculate approximate selectivity. As a result, the calculated selectivity may not be exact.
However, I think it's worthwhile because the planning cost of AutoStrategy will be reduced, thereby that strategy can be used in more cases which will be helpful for users. In addition, we can improve the selectivity estimation algorithm if it is needed.

Here are some numbers.

master

concise bitmap

Benchmark                                  (limit)  (numSegments)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt       Score      Error  Units
SearchBenchmark.queryMultiQueryableIndex    750000              1            750000           basic.B  avgt   25  463864.923 ± 5935.667  us/op
SearchBenchmark.querySingleQueryableIndex   750000              1            750000           basic.B  avgt   25  471846.438 ± 5391.567  us/op

roaring bitmap

Benchmark                                  (limit)  (numSegments)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt       Score       Error  Units
SearchBenchmark.queryMultiQueryableIndex    750000              1            750000           basic.B  avgt   25  803411.999 ± 10526.744  us/op
SearchBenchmark.querySingleQueryableIndex   750000              1            750000           basic.B  avgt   25  807268.072 ± 10779.393  us/op

patch

concise bitmap

Benchmark                                  (limit)  (numSegments)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt       Score      Error  Units
SearchBenchmark.queryMultiQueryableIndex    750000              1            750000           basic.B  avgt   25  301427.773 ± 4794.298  us/op
SearchBenchmark.querySingleQueryableIndex   750000              1            750000           basic.B  avgt   25  285926.630 ± 2592.590  us/op

roaring bitmap

Benchmark                                  (limit)  (numSegments)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt       Score      Error  Units
SearchBenchmark.queryMultiQueryableIndex    750000              1            750000           basic.B  avgt   25  464101.372 ± 5595.434  us/op
SearchBenchmark.querySingleQueryableIndex   750000              1            750000           basic.B  avgt   25  466196.807 ± 3428.533  us/op

Also, you can see the overhead of selectivity estimation in the following graphs.
These graphs show the results of SearchBenchmark with the basic.C query.
- Concise bitmap
concise

- Roaring bitmap
roaring

I think the overhead of selectivity estimation is acceptable. If others agree, I'll change the default search strategy to auto.


This change is Reviewable

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting finding. Do you have numbers on how much time is taken by looking up bitmaps, vs. computing their size, vs. unioning them? I'm curious how much unioning cost exceeds the cost of just computing all the sizes.

If unioning is really dominant then your approach here sounds good. I'd just like to reduce duplicated code and watch out for behavior with multi-value dimensions.

boolean supportsBitmapIndex(BitmapIndexSelector selector);

/**
* Estimate selectivity of this filter. The estimated selectivity might be different from the exact value.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you expand on what is expected of the accuracy & speed of the "estimate", and what it might be used for? This will help Filter implementors know what they should be doing for this method.

*
* @param selector Object used to retrieve bitmap indexes
* @param totalNumRows total number of rows in a segment
* @return Selectivity ranging from 0 to 1.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"0 (filter selects no rows) to 1 (filter selects all rows)"

* Estimate selectivity of this filter. The estimated selectivity might be different from the exact value.
*
* @param selector Object used to retrieve bitmap indexes
* @param totalNumRows total number of rows in a segment
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totalNumRows shouldn't be necessary, it should be available through selector.getNumRows()

for (final String eachVal : values) {
matchedNumRows += selector.getBitmapIndex(dimension, eachVal).size();
}
return (double) matchedNumRows / totalNumRows;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be greater than 1 for multi-value dimensions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some of your other implementations will have a similar problem.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed multi-value dimensions. Thanks.

@Override
public double estimateSelectivity(BitmapIndexSelector selector, long totalNumRows)
{
if (extractionFn == null && likeMatcher.getSuffixMatch() == LikeDimFilter.LikeMatcher.SuffixMatch.MATCH_EMPTY) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of copy/pasted code here, is there a way to do better?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe have getBitmapIterator take some higher level args and even handle the fallback vs. optimized case internally. In the fallback case it could return an Iterators.singletonIterator of the one bitmap. Then getBitmapIndex unions them and estimateSelectivity just adds up their sizes.

The helpers in Filters.java, as well as specific filters like BoundFilter and SpatialFilter, maybe others too, could use a similar strategy to reduce duplicated code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh.. I forgot refactoring these copy/pasted codes. Sorry. I'll do soon.

@leventov leventov self-assigned this Jan 19, 2017
@jihoonson
Copy link
Copy Markdown
Contributor Author

@gianm, I don't have exact numbers comparing the time taken by bitmap lookup, getting size, and bitmap union, but I think the above SearchQuery benchmark result shows the performance comparison of summing bitmap sizes and bitmap union. When just summing bitmap sizes instead of unioning them, the performance improvement ratios were about 165% and 173% for concise and roaring bitmap types, respectively.

I think unioning bitmaps is the bottleneck quite obviously. Here is a JMC profile result of SearchBenchmark.

2017-01-20 8 46 25

As you can see, just unioning bitmaps contributes about 6% to the total execution time of a search query. MinMaxPriorityQueue.poll() takes most of time, and I/O (via mmap) contributes only 20% of the total cost of bitmap unions.

@jihoonson
Copy link
Copy Markdown
Contributor Author

@gianm, addressed comments. Thank you for your review.

basicQueries.put("B", queryBuilderB);
}

{ // basic.C
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make those blocks methods basicX(), and then call them from a single init block?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

numMatchedRows += bitmap.size();
}

// assume multi-value column if columnCapabilities is null
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be a comment before line 280

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.


// assume multi-value column if columnCapabilities is null
if (isMultiValueDimension) {
final double estimated = numMatchedRows * Filters.computeNonOverlapRatioFromBitmapSamples(bitmapList)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filters. not needed, we are in the same class

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

final int sampleNum = Math.min(SAMPLE_NUM_FOR_SELECTIVITY_ESTIMATION, bitmaps.size());
double nonOverlapRatioSum = 0.;
for (int i = 0; i < sampleNum; i++) {
final ImmutableBitmap b1 = bitmaps.get(random.nextInt(bitmaps.size()));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to use ThreadLocalRandom.current() in this loop

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

public ImmutableBitmap getBitmapIndex(BitmapIndexSelector selector)
{
if (extractionFn == null && likeMatcher.getSuffixMatch() == LikeDimFilter.LikeMatcher.SuffixMatch.MATCH_EMPTY) {
if (directPrefixMatchable()) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"directPrefixMatchable" is unclear to me. Maybe "dimensionEqualsPrefix"? Then the comments on lines 57 and 88 could be removed

@Override
public Iterator<ImmutableBitmap> iterator()
{
return new Iterator<ImmutableBitmap>()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please apply here the same improvement? https://github.com/druid-io/druid/pull/3754/files

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}
}

private static double computeNonOverlapRatioFromBitmapSamples(List<ImmutableBitmap> bitmaps)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this method. Could you please add an explanation? Ideally add some tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added java doc.

)
{
long numMatchedRows = 0;
final List<ImmutableBitmap> bitmapList = Lists.newArrayList(bitmaps);
Copy link
Copy Markdown
Member

@leventov leventov Jan 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to avoid materializing all bitmaps in memory at the same time. See #3642 (comment). Maybe you need to pass around IntList + a lazy int -> ImmutableBitmap function, and use it in computeNonOverlapRatioFromBitmapSamples().

cc @gianm

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks.

@jihoonson
Copy link
Copy Markdown
Contributor Author

@leventov, thanks for your review. I addressed comments.

{
return Filters.bitmapsFromIndexes(getBitmapIndexIterator(boundDimFilter, bitmapIndex), bitmapIndex);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to use specialized IntIterator. E. g. we have 100s of thousands of values in some dimensions, don't want to do boxing on this scale.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed. Thanks.

public Iterator<ImmutableBitmap> iterator()
public Iterator<Integer> iterator()
{
return new Iterator<ImmutableBitmap>()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaceable with IntIterators.fromTo()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

static double estimatePredicateSelectivity(
private static double estimateSelectivityOfBitmapList(
BitmapIndex bitmapIndex,
Iterable<Integer> bitmapIndexeIterable,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterable<ImmutableBitmap> was used only to facilitate laziness. List of integers shouldn't be lazy, so IntList could be used, and if so, ImmutableList.copyOf() is not needed in the body of this method.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to IntList.

long numMatchedRows = 0;
for (Integer index : bitmapIndexes) {
final ImmutableBitmap bitmap = bitmapIndex.getBitmap(index);
numMatchedRows += bitmap.size();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately Concise format doesn't store the size, so size() is not cheap. Roaring storing sizes of the blocks, but computing the total size of Roaring bitmap also requires some iteration and touching a lot of memory.

This problem also complicates implementation of #3878.

I think we should store bitmap sizes on segment format level and add ways to read bitmap size without creating bitmap itself.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, size() is not cheap. So, you mean frequent call of size() is expensive? I'll caching it.

if (isMultiValueDimension) {
final double estimated = numMatchedRows * Filters.computeNonOverlapRatioFromBitmapSamples(bitmapList)
/ totalNumRows;
final double estimated = numMatchedRows * computeNonOverlapRatioFromFirstNBitmapSamples(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that the math model of this expression and computeNonOverlapRatioFromRandomBitmapSamples() is wrong. Imagine there are 10 bitmaps each of 0.1 selectivity, and they all overlap 100%. computeNonOverlapRatioFromRandomBitmapSamples() will return 0.5, estimated will be 10 * 0.1 * 0.5 = 0.5, while it should be 0.1.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I'm such an idiot. It's my bad. I'll change this method to calculate the average non-overlap ratio of the right hand side bitmap of the union operation like below.
(The below is just an idea, and not optimized)

while (...) {
  overlapSize = size(unionedBitmap) + size(rightBitmap) - size(union(unionedBitmap, rightBitmap))
  unionedBitmap = union(unionedBitmap, rightBitmap)
  nonOverlapRatioOfRightBitmap = (size(rightBitmap) - overlapSize) / rightBitmap.size()
  nonOverlapRatioSum += nonOverlapRatioOfRightBitmap
}
return nonOverlapRatioSum / sampleNum;

And, estimateSelectivityOfBitmapList() calling this method will be changed like below.

double numMatchedRows = bitmapIndex.getBitmap(bitmapIndexes.get(0)).size();
for (int i = 1; i < bitmapIndexes.size(); i++) {
  final ImmutableBitmap bitmap = bitmapIndex.getBitmap(bitmapIndexes.get(i));
  numMatchedRows += bitmap.size() * nonOverlapRatio;
}
return numMatchedRows / totalNumRows

I'll do add some tests.

final ImmutableBitmap b1 = bitmaps.get(random.nextInt(bitmaps.size()));
final ImmutableBitmap b2 = bitmaps.get(random.nextInt(bitmaps.size()));
final ImmutableBitmap b1 = bitmapIndex.getBitmap(
bitmapIndexes.get(ThreadLocalRandom.current().nextInt(bitmapIndexes.size())));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to extract ThreadLocalRandom.current() as a local variable outside the loop

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If expression is wrapped after brace in Druid's style guide the pairing expression should be on another line, i. e. ); should be on a separate line

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'm still not familiar with Druid's style, and making mistakes a lot. I'll be more careful.

final ImmutableBitmap b1 = bitmapIndex.getBitmap(
bitmapIndexes.get(ThreadLocalRandom.current().nextInt(bitmapIndexes.size())));
final ImmutableBitmap b2 = bitmapIndex.getBitmap(
bitmapIndexes.get(ThreadLocalRandom.current().nextInt(bitmapIndexes.size())));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@@ -328,43 +457,84 @@ private static Iterable<ImmutableBitmap> makePredicateQualifyingBitmapIterable(
public Iterator<ImmutableBitmap> iterator()
{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaceable with Iterators.transform()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


@Override
public double estimateSelectivity(ColumnSelector columnSelector, BitmapIndexSelector indexSelector)
{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumption could be "checked" approximately the same way as in computeNonOverlapRatioFromFirstNBitmapSamples()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you tell me more detailed idea?

Copy link
Copy Markdown
Contributor Author

@jihoonson jihoonson Jan 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah,I got your point. But this is a very basic assumption for estimating selectivity in databases and there are many more improvements (https://scholar.google.co.kr/scholar?ion=1&espv=2&um=1&ie=UTF-8&lr&q=related:A7vzlp4zVKU7uM:scholar.google.com/). I think we need to improve selectivity estimation later when druid is ready for providing more statistics like histogram, but it looks good for now. What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that for somebody this assumption could be wrong, and some selectivity computed incorrectly, and the overall select query chooses suboptimal strategy and performs worse.

Ultimately in this PR, I want to ensure that there won't be performance regression

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this assumption might be wrong, maybe in many real applications. However, in that cases, users can simply use other search strategies like UseIndexStrategy or CursorOnlyStrategy. This is why we still use UseIndexStrategy as the default strategy.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's ensure there won't be regression if users don't touch any configurations, just update druid binary to the version which includes this patch. "No regression if user has to manually change some config" is a regression, IMO

Copy link
Copy Markdown
Contributor

@gianm gianm Jan 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leventov I believe that is the case already: this code only gets activated if the search strategy is "auto", which is not the default strategy. It also doesn't exist in the current release, so regression is impossible.

- Fix wrong non-overlap ratio computation and added unit tests.
- Change Iterable<Integer> to IntIterable
- Remove unnecessary Iterable<Integer>
@jihoonson
Copy link
Copy Markdown
Contributor Author

@leventov thanks for your detailed review. I addressed comments.

@jihoonson
Copy link
Copy Markdown
Contributor Author

The Travis failure is due to Kafka connection timeout and looks not related to this issue.

final ImmutableBitmap bitmap = bitmapIndex.getBitmap(index);
numMatchedRows += bitmap.size();
double numMatchedRows = bitmapIndexes.size() > 0 ? bitmapIndex.getBitmap(bitmapIndexes.get(0)).size() : 0;
final double nonOverlapRatio = isMultiValueDimension && bitmapIndexes.size() > 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please split this into if-else blocks? Chains of ternary operators in one expression are unreadable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

* @see #estimatePredicateSelectivity(ColumnSelector, String, BitmapIndexSelector, Predicate)
*/
private static double computeNonOverlapRatioFromRandomBitmapSamples(
static double computeNonOverlapRatioFromRandomBitmapSamples(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add @VisibleForTesting in such cases

Copy link
Copy Markdown
Member

@leventov leventov Jan 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this model as well. It computes mean of values, which are position-dependent (discount in time). Consider you have two bitmaps, one covers 10% of bits, another 100%. If you select them in one order you will get 0.0 from this method, in another order - 0.9.

Another problem with this estimation, is that is computes pairwise unions, i. e. generates a lot of garbage.

I suggest to not try compute selectivity with multi-value dimensions in this PR.

Later, I propose the following approach to be implemented:

  • Compute a sample union of N randomly selected bitmaps (not one-by-one, union of all of them together)
  • Based on 1) size of the sample union 2) sizes of all unified bitmaps 3) total number of rows 4) number and sizes of bitmaps not included in the sample union, compute the expected size of union of all bitmaps. A non-trivial math model is required to do this, and I don't expect it could be done without help or at least review by somebody who remembers statistics course from high school really well.
  • Reuse the computed sample union later in Filter/UseIndexesStrategy, when the union of all bitmaps is needed (and if needed).

Also for efficiency of the union size estimation, some parameters could be pre-computed when segment is created and stored in the segment format. For example, the sum of sizes of all bitmaps (which is bigger than the number of rows, for multi-value dimensions). This is related to #3882.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First in your example, non-overlap ratios will be 0.0 and 0.9. They cannot be negatives because bSize - overlapSize = unionedSize - preUnionedSize >= 0 always holds which means bSize >= overlapSize.

I agree on that there will be more efficient and good models for this. There have been a lot of studies on this issue, and we need to investigate them and choose one. However, I'm not sure about completely disabling this feature for multi-value dimensions. Users can just use other strategies until we improve this sufficiently. Isn't it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, it would be ok to disable this feature because it works well under very strict conditions. I'm open to this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for disabling this feature now not because the model yields very imprecise results (it's ok as long as there is no regression compared to the state prior to this PR), but because it generates a lot of garbage. The default bitmap algorithm is concise. Pairwise union of 100 bitmaps creates 100 distinct bitmaps, which immediately become garbage. Union of 100 bitmaps at once creates only one resulting bitmap.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. I disabled it.

final BitmapIndex bitmapIndex
)
{
return IntIteratorUtils.toIntList(getBitmapIndexIterator(boundDimFilter, bitmapIndex).iterator());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like RangeIntList(from, to) could be implemented and used here:

class RangeIntList extends AbstractIntList
  {
    final int start, end;
    final int size

    RangeIntList(int start, int end)
    {
      this.start = start;
      this.end = end;
      this.size = end - start;
    }
    @Override
    public int getInt(int index)
    {
      Preconditions.checkElementIndex(index, size);
      return start + index;
    }

    @Override
    public int size()
    {
      return size;
    }
  }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I added a new util class IntListUtils for this class, but am not sure about its location..

- Split a long ternary operation into if-else blocks
- Add IntListUtils.fromTo()
@jihoonson
Copy link
Copy Markdown
Contributor Author

I'm investigating test failures.

@jihoonson
Copy link
Copy Markdown
Contributor Author

jihoonson commented Jan 27, 2017

Fixed test failures.


public static IntList fromTo(int from, int to)
{
return new RangeIntList(from, to);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a check that from <= to

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also would like to do, but BoundFilter.getStartEndIndexes() sometimes returns a smaller to than from and it makes some tests failed. I added a comment.


public static IntList fromTo(int from, int to)
{
// TODO: check `from` is always smaller than or equal to `to`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's a good idea to skip the check because of a single use case. If to could be less than from on BoundFilter side, we can add a check on that side and return IntLists.EMPTY_LIST.

Also, in Druid we try to avoid adding TODOs, add issues instead, even for relatively small things. And too small things should probably be fixed in-place. See https://github.com/metamx/java-util/pull/50/files#r77869324

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I added the check and fixed test failure.

@jihoonson
Copy link
Copy Markdown
Contributor Author

@leventov, thank you for your detailed review.
@gianm, any comments?

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Feb 3, 2017

Taking another look now, @jihoonson

}
}

endIndex = startIndex > endIndex ? startIndex : endIndex;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Math.max(startIndex, endIndex)

@leventov leventov removed their assignment Feb 6, 2017
@gianm
Copy link
Copy Markdown
Contributor

gianm commented Feb 6, 2017

LGTM

@gianm gianm merged commit ddd8c9e into apache:master Feb 6, 2017
@gianm gianm added this to the 0.10.0 milestone Feb 6, 2017
@gianm gianm added the Feature label Feb 6, 2017
dimFilters.add(new SelectorDimFilter(dimName, "3", null));
dimFilters.add(new BoundDimFilter(dimName, "100", "10000", true, true, true, null, null));
dimFilters.add(new InDimFilter(dimName, dimUniformFilterVals, null));
dimFilters.add(new InDimFilter(dimName, dimUniformFilterVals, null));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jihoonson did you intentionally add three identical lines in a row?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks a mistake. Thanks for finding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants