fix IncrementalIndex performance regression #12048
clintropolis merged 2 commits into apache:master from
Conversation
    if (timeAndMetricsColumnCapabilities.containsKey(columnName)) {
      return timeAndMetricsColumnCapabilities.get(columnName);
    }
    if (dimensionDescs.containsKey(columnName)) {
Looking through the code in IncrementalIndex, sometimes dimensionDescs is accessed with synchronization. Sometimes it is not. Assuming that it's okay to access it without synchronization, this looks good. Have we verified that we will already be in a synchronized block once we get to this method?
Good question, will have a look. Never mind, it doesn't change after the constructor. timeAndMetricsColumnCapabilities would also have this problem potentially?
btw, I didn't trust any of the users of dimensionDescs that weren't obviously safe, so I adjusted a couple of other locations as well before I did the measurement in my previous comment
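
For readers following the synchronization question in this thread, here is a minimal sketch of what a guarded lookup on a map like dimensionDescs could look like. It is not the code from this PR: the class name `GuardedDimensionLookup`, the `String` value type, and the choice of the map itself as the lock are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, not a Druid class.
class GuardedDimensionLookup {
    // Hypothetical stand-in for dimensionDescs; values are plain type names here.
    private final Map<String, String> dimensionDescs = new LinkedHashMap<>();

    // Writers take the lock, so a new dimension is published safely.
    void addDimension(String name, String type) {
        synchronized (dimensionDescs) {
            dimensionDescs.putIfAbsent(name, type);
        }
    }

    // Readers take the same lock, so they never observe the map mid-update.
    // One get() replaces the containsKey() + get() pair, which would otherwise
    // be two separate reads of shared state.
    String getType(String name) {
        synchronized (dimensionDescs) {
            return dimensionDescs.get(name);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedDimensionLookup index = new GuardedDimensionLookup();
        Thread writer = new Thread(() -> index.addDimension("country", "STRING"));
        writer.start();
        writer.join();
        System.out.println(index.getType("country"));
        System.out.println(index.getType("missing"));
    }
}
```

A map that is fully populated in the constructor and never modified afterwards does not need this guard for reads, which is the distinction the replies above are drawing.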


Description
This PR fixes a performance regression caused by #11853, where adding type information to the RowBasedColumnSelectorFactory of IncrementalIndex was done by using the getColumnCapabilities method the latter had, which built a hashmap of all the capabilities and then translated that into a RowSignature. This turned out to be dramatically expensive.

To fix this, IncrementalIndex now just implements ColumnInspector so that it can cut out the middle-man and serve as the ColumnInspector for the RowBasedColumnSelectorFactory.

Before:

After:

Zoomed into sink.add, before and after:

Task run times are shorter too, before and after:

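
The difference between the two approaches in the description can be sketched roughly as follows. This is a simplified illustration, not the Druid implementation: `SimpleIndex`, `snapshotAllTypes`, the string type names, and the nested `ColumnInspector` interface are hypothetical stand-ins, and the loop assumes the lookup sits on a hot path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ColumnInspectorSketch {
    /** Hypothetical stand-in for a column inspector: answers one column at a time. */
    interface ColumnInspector {
        String getColumnType(String columnName);
    }

    /** Hypothetical, heavily simplified stand-in for an incremental index. */
    static class SimpleIndex implements ColumnInspector {
        private final Map<String, String> timeAndMetricsTypes = new LinkedHashMap<>();
        private final Map<String, String> dimensionTypes = new LinkedHashMap<>();
        int snapshotsBuilt = 0;

        SimpleIndex() {
            timeAndMetricsTypes.put("__time", "LONG");
            timeAndMetricsTypes.put("count", "LONG");
            dimensionTypes.put("country", "STRING");
        }

        // Old-style path: copy every column's type into a fresh map per call.
        Map<String, String> snapshotAllTypes() {
            snapshotsBuilt++;
            Map<String, String> all = new LinkedHashMap<>(timeAndMetricsTypes);
            all.putAll(dimensionTypes);
            return all;
        }

        // New-style path: answer directly from the existing maps, no copying.
        @Override
        public String getColumnType(String columnName) {
            String type = timeAndMetricsTypes.get(columnName);
            return type != null ? type : dimensionTypes.get(columnName);
        }
    }

    public static void main(String[] args) {
        SimpleIndex index = new SimpleIndex();
        ColumnInspector viaSnapshot = name -> index.snapshotAllTypes().get(name);
        ColumnInspector direct = index;
        for (int row = 0; row < 1000; row++) {
            viaSnapshot.getColumnType("country"); // allocates a full map each time
            direct.getColumnType("country");      // at most two map lookups
        }
        System.out.println("snapshots built: " + index.snapshotsBuilt);
    }
}
```

Both inspectors return the same answers; the direct one avoids allocating and filling an intermediate map on every call.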
This PR has: