fix topN filtering on multi-valued dimension bug #2255
binlijin wants to merge 1 commit into apache:master from binlijin:fix_topN_filter_multi_valued
👍 |
If I'm reading this correctly, the error comes from the topN returning ALL dimension values in a multi-value dimension instead of just the ones that match the filter. Is that correct? If so, it would make more sense to me to have the topN function as expected. From the contract on Can you comment a bit more on why the prior cardinality methods were insufficient?
This breaks the contract of getValueCardinality(), which should return the cardinality "after" filtering.
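To make the "cardinality after filtering" contract concrete, here is a minimal sketch of a filtered selector over a multi-valued dimension. The class `FilteredSelectorSketch` and all of its fields are hypothetical names invented for illustration. Only the method name `getValueCardinality()` comes from this thread, and the sketch is not the project's actual implementation. It assumes a whitelist-style filter and remaps the surviving dictionary ids into a dense range starting at 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the real FilteredDimensionSpec code.
public class FilteredSelectorSketch {
    // Maps an id in the unfiltered dictionary to a dense id in the filtered one.
    private final Map<Integer, Integer> forwardMap = new HashMap<>();
    // Values that survive the filter, indexed by their new dense id.
    private final List<String> filteredValues = new ArrayList<>();

    public FilteredSelectorSketch(List<String> dictionary, Set<String> whitelist) {
        for (int oldId = 0; oldId < dictionary.size(); oldId++) {
            String value = dictionary.get(oldId);
            if (whitelist.contains(value)) {
                forwardMap.put(oldId, filteredValues.size());
                filteredValues.add(value);
            }
        }
    }

    // Cardinality "after" filtering: only the values that passed the filter.
    public int getValueCardinality() {
        return filteredValues.size();
    }

    // Drops ids that did not pass the filter and remaps the rest, so every
    // returned id is strictly less than getValueCardinality().
    public List<Integer> getRow(List<Integer> rawRow) {
        List<Integer> out = new ArrayList<>();
        for (int oldId : rawRow) {
            Integer newId = forwardMap.get(oldId);
            if (newId != null) {
                out.add(newId);
            }
        }
        return out;
    }

    public String lookupName(int id) {
        return filteredValues.get(id);
    }
}
```

Under these assumptions, any consumer that sizes its structures from `getValueCardinality()` stays consistent as long as it only ever sees the remapped ids from `getRow()`.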
@binlijin I have to dig more into the original issue you faced, but the getValueCardinality() implementations are doing the right thing per the contract of those methods.
@himanshug, how does this pass unit tests? I'm surprised no UT caught this.
@fjy FilteredDimensionSpec is a new feature and is tested with groupBy queries in the unit tests. It looks like TopN can't handle the new DimensionSpec with the correct cardinality implementation for some reason.
@drcrallen @himanshug |
@himanshug |
Also, the reason we want to work with as small a cardinality as possible is that the topN algorithm allocates some buffers according to cardinality. If we return a higher cardinality, those buffers will be bigger than they need to be. For example, say you used a List filtered spec with only 1 item in the list: with the reduced cardinality, those buffers will be of size 1 only.
When I backported #2130 to our version and used topN with it, it threw an ArrayIndexOutOfBoundsException.
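The buffer-sizing argument and the reported exception can be illustrated with a small sketch. This is only one plausible way such an ArrayIndexOutOfBoundsException could arise. The thread does not confirm the actual cause, and `TopNBufferSketch`, `allocate`, and `countRows` are hypothetical names, not the project's API. The sketch assumes one aggregation slot per dimension value id, sized from the reported cardinality.

```java
// Hypothetical illustration only; not the real topN engine.
public class TopNBufferSketch {
    // One slot per dimension value id, sized from the reported cardinality.
    static long[] allocate(int cardinality) {
        return new long[cardinality];
    }

    // Counts occurrences of each id across multi-valued rows. Every id must be
    // strictly less than the cardinality, or the array access will fail.
    static long[] countRows(int cardinality, int[][] rows) {
        long[] counts = allocate(cardinality);
        for (int[] row : rows) {
            for (int id : row) {
                counts[id]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Filter keeps a single value, ids are remapped to 0: buffer of size 1 suffices.
        long[] ok = countRows(1, new int[][] {{0}, {0}});
        System.out.println("count for id 0: " + ok[0]);

        // Same reduced cardinality, but a row still carries an unfiltered id (3).
        try {
            countRows(1, new int[][] {{0, 3}});
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds: ids and cardinality disagree");
        }
    }
}
```

The point of the sketch is that the reduced cardinality is only safe when the ids handed to the aggregation step come from the same filtered id space. If the two disagree, either the buffer must grow back to the unfiltered size or the ids must be remapped.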