Skip to content

Fix dimension selectors with extractionFns on missing columns.#4717

Merged
leventov merged 7 commits intoapache:masterfrom
gianm:fix-null-selector-extractionFn
Aug 25, 2017
Merged

Fix dimension selectors with extractionFns on missing columns.#4717
leventov merged 7 commits intoapache:masterfrom
gianm:fix-null-selector-extractionFn

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Aug 24, 2017

This patch properly applies the requested extractionFn to missing columns.
It's important when the extractionFn maps null to something other than null.

This patch properly applies the requested extractionFn to missing columns.
It's important when the extractionFn maps null to something other than null.
@gianm gianm added the Bug label Aug 24, 2017
@gianm gianm added this to the 0.11.0 milestone Aug 24, 2017
}

if (selector.nameLookupPossibleInAdvance()
&& selector.getValueCardinality() == 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extract this into an utility function "isNullSelector"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

import java.util.Objects;

public class NullDimensionSelector implements SingleValueHistoricalDimensionSelector, IdLookup
public class ConstantDimensionSelector implements SingleValueHistoricalDimensionSelector, IdLookup
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to keep NullDimensionSelector. I think it's not a big deal, however it could make a difference for performance somewhere (reading a field vs. constant null)

Copy link
Copy Markdown
Contributor Author

@gianm gianm Aug 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, but then thought it wasn't worth the additional code since the only potentially hot methods affected are lookupId and lookupName, both of which are already incredibly cheap in ConstantDimensionSelector relative to a "real" dimension selector. And I didn't think it was worth having an entire extra file just for a probably very slight optimization for columns that don't exist.

I can add it if you still think it's worth it, but that was my reasoning.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI #4623 allows to reduce code duplication in ColumnSelectorFactory, GenericColumn, etc. implementations, by leaving just makeColumnValueSelector() instead of makeLong/Float/Double/Object/ColumnSelector(). This will likely require to introduce NilColumnValueSelector, returning 0 from getLong/Float/Double, null from get(), Void from classOfObject(), and also implementing DimensionSelector exactly as currently NullDimensionSelector does.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, but what do you think the implications are for this patch?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That NullDimensionSelector will be revived and removing it now will only break git history for class.

But leaving this to you and this ask is not blocking for this PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I will add it back in this PR, since it sounds like it will come back eventually anyway.

} else {
final DimensionSelector selector = virtualColumn.makeDimensionSelector(dimensionSpec, factory);
return selector == null ? dimensionSpec.decorate(NullDimensionSelector.instance()) : selector;
if (selector == null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this logic be inside virtualColumn.makeDimensionSelector()? So that dimensionSpec.decorate() is applied just at one site?

Copy link
Copy Markdown
Contributor Author

@gianm gianm Aug 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only applied if virtualColumn.makeDimensionSelector returns null. The contract of the method is that virtualColumn.makeDimensionSelector can return actual null if a selector can't be made for some reason (maybe the selector type isn't supported), and in this case we must decorate a null selector.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want dimensionSpec.decorate() to be consistently applied at one place. For example, virtualColumn.makeDimensionSelector() accepted ExtractionFn, and then dimensionSpec.decorate() applied here to either result of virtualColumn.makeDimensionSelector() call or ConstantDimensionSelector.of(null, dimensionSpec.getExtractionFn()).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind the contract of virtualColumn.makeDimensionSelector is that it's responsible for applying decoration, and that may give it the power to potentially optimize something in some way (maybe it's aware of certain kinds of DimensionSpecs). Currently, no implementation in core actually uses this power, so it could be taken away if we want and only applied "outside". But the idea was that maybe someone would find a creative use for it.

What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, then returning to the original suggestion, why couldn't virtualColumn.makeDimensionSelector() return NullDimensionSelector itself (never returns null)?

Copy link
Copy Markdown
Contributor Author

@gianm gianm Aug 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but then we will still need to apply decoration in two places, since VirtualColumns.makeDimensionSelector might still be called with a column name that doesn't exist, and it would have to build and decorate its own column.

However, now that I think about it, we could get around that by forbidding users from calling VirtualColumns.makeDimensionSelector with a column that doesn't exist. No current callers do it anyway. So I can make that change, and make VirtualColumn.makeDimensionSelector non-nullable, if that sounds good.

}
}

private static boolean isNullSelector(final DimensionSelector selector)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case the parameter should be marked @Nullable, but I think we should avoid this, because null and "a selector object indistinguishable from NullDimensionSelector" are different things. So suggested for isNullSelector() to accept only non-null objects, and leave == null check outside.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting changing the call site to,

if (selector != null && !isNullSelector(selector)) {
  // Do stuff
}

Instead of,

if (!isNullSelector(selector)) {
  // Do stuff
}

I can make the change, but don't see why one is better than the other in this case… could you please explain your thinking?

Copy link
Copy Markdown
Member

@leventov leventov Aug 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple principle that applies here is that we should minimize nullability of method parameters and return types. It's a source of bugs: selector could be null for a good reason, and could be null by mistake (e. g. uninitialized variable). And even if selector is null for a good reason, accepting it makes the utility method less specific, because there could be variable potential meanings and treatments of selector as null.

Most of this doesn't apply right here, but if this method is used elsewhere and made public, that is very probable, it will apply.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now this method has exactly one caller, and since it's private in SearchQueryRunner.java it's not going to have more than that unless it's extracted somewhere else. I guess you're saying that someone might want to extract this method to a common utility class in the future, and just in case that happens, we should make the arg non-nullable so the future person is not tempted to create a common utility method with a nullable arg… since in turn that would tempt callers to not handle nulls appropriately.

I guess I can see that point. It seems too theoretical to make it a hard rule in Druid, but I'll do it in this case since we have already spent so much time talking about it. Although in general I think it's fine to be looser with guidelines like this for private helper functions vs. public helper functions, and stricter at the point where someone actually does make it public, if ever.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed new helper function.

} else {
final DimensionSelector selector = virtualColumn.makeDimensionSelector(dimensionSpec, factory);
return selector == null ? dimensionSpec.decorate(NullDimensionSelector.instance()) : selector;
if (selector == null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want dimensionSpec.decorate() to be consistently applied at one place. For example, virtualColumn.makeDimensionSelector() accepted ExtractionFn, and then dimensionSpec.decorate() applied here to either result of virtualColumn.makeDimensionSelector() call or ConstantDimensionSelector.of(null, dimensionSpec.getExtractionFn()).

@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented Aug 25, 2017

@leventov pushed a new patch, please have a look.

@Override
public int lookupId(String name)
{
if (value == null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch is not needed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

return valueIds;
}

public static DimensionSelector constantSelector(final String value)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please annotate the param as @Nullable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

}
}

public static DimensionSelector constantSelector(final String value, final ExtractionFn extractionFn)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please annotate both params as @Nullable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@leventov leventov merged commit 43488df into apache:master Aug 25, 2017
@gianm gianm deleted the fix-null-selector-extractionFn branch September 23, 2022 19:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants