Fix dimension selectors with extractionFns on missing columns.#4717
Fix dimension selectors with extractionFns on missing columns.#4717leventov merged 7 commits intoapache:masterfrom
Conversation
This patch properly applies the requested extractionFn to missing columns. It's important when the extractionFn maps null to something other than null.
| } | ||
|
|
||
| if (selector.nameLookupPossibleInAdvance() | ||
| && selector.getValueCardinality() == 1 |
There was a problem hiding this comment.
Please extract this into an utility function "isNullSelector"
| import java.util.Objects; | ||
|
|
||
| public class NullDimensionSelector implements SingleValueHistoricalDimensionSelector, IdLookup | ||
| public class ConstantDimensionSelector implements SingleValueHistoricalDimensionSelector, IdLookup |
There was a problem hiding this comment.
I suggest to keep NullDimensionSelector. I think it's not a big deal, however it could make a difference for performance somewhere (reading a field vs. constant null)
There was a problem hiding this comment.
I thought about it, but then thought it wasn't worth the additional code since the only potentially hot methods affected are lookupId and lookupName, both of which are already incredibly cheap in ConstantDimensionSelector relative to a "real" dimension selector. And I didn't think it was worth having an entire extra file just for a probably very slight optimization for columns that don't exist.
I can add it if you still think it's worth it, but that was my reasoning.
There was a problem hiding this comment.
FYI #4623 allows to reduce code duplication in ColumnSelectorFactory, GenericColumn, etc. implementations, by leaving just makeColumnValueSelector() instead of makeLong/Float/Double/Object/ColumnSelector(). This will likely require to introduce NilColumnValueSelector, returning 0 from getLong/Float/Double, null from get(), Void from classOfObject(), and also implementing DimensionSelector exactly as currently NullDimensionSelector does.
There was a problem hiding this comment.
Good to know, but what do you think the implications are for this patch?
There was a problem hiding this comment.
That NullDimensionSelector will be revived and removing it now will only break git history for class.
But leaving this to you and this ask is not blocking for this PR.
There was a problem hiding this comment.
In that case, I will add it back in this PR, since it sounds like it will come back eventually anyway.
| } else { | ||
| final DimensionSelector selector = virtualColumn.makeDimensionSelector(dimensionSpec, factory); | ||
| return selector == null ? dimensionSpec.decorate(NullDimensionSelector.instance()) : selector; | ||
| if (selector == null) { |
There was a problem hiding this comment.
Shouldn't this logic be inside virtualColumn.makeDimensionSelector()? So that dimensionSpec.decorate() is applied just at one site?
There was a problem hiding this comment.
It's only applied if virtualColumn.makeDimensionSelector returns null. The contract of the method is that virtualColumn.makeDimensionSelector can return actual null if a selector can't be made for some reason (maybe the selector type isn't supported), and in this case we must decorate a null selector.
There was a problem hiding this comment.
I want dimensionSpec.decorate() to be consistently applied at one place. For example, virtualColumn.makeDimensionSelector() accepted ExtractionFn, and then dimensionSpec.decorate() applied here to either result of virtualColumn.makeDimensionSelector() call or ConstantDimensionSelector.of(null, dimensionSpec.getExtractionFn()).
There was a problem hiding this comment.
The idea behind the contract of virtualColumn.makeDimensionSelector is that it's responsible for applying decoration, and that may give it the power to potentially optimize something in some way (maybe it's aware of certain kinds of DimensionSpecs). Currently, no implementation in core actually uses this power, so it could be taken away if we want and only applied "outside". But the idea was that maybe someone would find a creative use for it.
What do you think?
There was a problem hiding this comment.
Ok, then returning to the original suggestion, why couldn't virtualColumn.makeDimensionSelector() return NullDimensionSelector itself (never returns null)?
There was a problem hiding this comment.
It could, but then we will still need to apply decoration in two places, since VirtualColumns.makeDimensionSelector might still be called with a column name that doesn't exist, and it would have to build and decorate its own column.
However, now that I think about it, we could get around that by forbidding users from calling VirtualColumns.makeDimensionSelector with a column that doesn't exist. No current callers do it anyway. So I can make that change, and make VirtualColumn.makeDimensionSelector non-nullable, if that sounds good.
| } | ||
| } | ||
|
|
||
| private static boolean isNullSelector(final DimensionSelector selector) |
There was a problem hiding this comment.
In this case the parameter should be marked @Nullable, but I think we should avoid this, because null and "a selector object indistinguishable from NullDimensionSelector" are different things. So suggested for isNullSelector() to accept only non-null objects, and leave == null check outside.
There was a problem hiding this comment.
Are you suggesting changing the call site to,
if (selector != null && !isNullSelector(selector)) {
// Do stuff
}Instead of,
if (!isNullSelector(selector)) {
// Do stuff
}I can make the change, but don't see why one is better than the other in this case… could you please explain your thinking?
There was a problem hiding this comment.
A simple principle that applies here is that we should minimize nullability of method parameters and return types. It's a source of bugs: selector could be null for a good reason, and could be null by mistake (e. g. uninitialized variable). And even if selector is null for a good reason, accepting it makes the utility method less specific, because there could be variable potential meanings and treatments of selector as null.
Most of this doesn't apply right here, but if this method is used elsewhere and made public, that is very probable, it will apply.
There was a problem hiding this comment.
Right now this method has exactly one caller, and since it's private in SearchQueryRunner.java it's not going to have more than that unless it's extracted somewhere else. I guess you're saying that someone might want to extract this method to a common utility class in the future, and just in case that happens, we should make the arg non-nullable so the future person is not tempted to create a common utility method with a nullable arg… since in turn that would tempt callers to not handle nulls appropriately.
I guess I can see that point. It seems too theoretical to make it a hard rule in Druid, but I'll do it in this case since we have already spent so much time talking about it. Although in general I think it's fine to be looser with guidelines like this for private helper functions vs. public helper functions, and stricter at the point where someone actually does make it public, if ever.
There was a problem hiding this comment.
Pushed new helper function.
| } else { | ||
| final DimensionSelector selector = virtualColumn.makeDimensionSelector(dimensionSpec, factory); | ||
| return selector == null ? dimensionSpec.decorate(NullDimensionSelector.instance()) : selector; | ||
| if (selector == null) { |
There was a problem hiding this comment.
I want dimensionSpec.decorate() to be consistently applied at one place. For example, virtualColumn.makeDimensionSelector() accepted ExtractionFn, and then dimensionSpec.decorate() applied here to either result of virtualColumn.makeDimensionSelector() call or ConstantDimensionSelector.of(null, dimensionSpec.getExtractionFn()).
… review comments.
|
@leventov pushed a new patch, please have a look. |
| @Override | ||
| public int lookupId(String name) | ||
| { | ||
| if (value == null) { |
| return valueIds; | ||
| } | ||
|
|
||
| public static DimensionSelector constantSelector(final String value) |
There was a problem hiding this comment.
Please annotate the param as @Nullable
| } | ||
| } | ||
|
|
||
| public static DimensionSelector constantSelector(final String value, final ExtractionFn extractionFn) |
There was a problem hiding this comment.
Please annotate both params as @Nullable
This patch properly applies the requested extractionFn to missing columns.
It's important when the extractionFn maps null to something other than null.