Skip to content

Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism#4886

Merged
himanshug merged 17 commits intoapache:masterfrom
metamx:refactor-column-selector-factory
Oct 14, 2017
Merged

Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism#4886
himanshug merged 17 commits intoapache:masterfrom
metamx:refactor-column-selector-factory

Conversation

@leventov
Copy link
Copy Markdown
Member

@leventov leventov commented Oct 2, 2017

Fixes #4800. Removes a lot of long/float/double/object code repetition across the project.

@leventov
Copy link
Copy Markdown
Member Author

leventov commented Oct 2, 2017

This PR fails on tests in SchemaEvolutionTest. According to them, numeric aggregation (e. g. longSum) on string dimension column should result to zeros, but currently with this PR it leads to exception. @himanshug @gianm what do you think it should do?

@himanshug
Copy link
Copy Markdown
Contributor

@leventov IIRC, yes Druid tries to be as flexible as much possible and longSum works on string columns if they are parsable to long values or else assumed zeros. typically aggregators try to accommodate different column types if possible. i think we should try and preserve the behavior.

@leventov
Copy link
Copy Markdown
Member Author

leventov commented Oct 2, 2017

@himanshug in the long term/general, what do you think about Druid to be "weakly typed", a-la MySQL, vs "strongly typed", a-la PostgreSQL?

@leventov
Copy link
Copy Markdown
Member Author

leventov commented Oct 2, 2017

#4888

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Oct 2, 2017

@leventov, re:

This PR fails on tests in SchemaEvolutionTest. According to them, numeric aggregation (e. g. longSum) on string dimension column should result to zeros, but currently with this PR it leads to exception. @himanshug @gianm what do you think it should do?

I commented on #4888 (comment) in more detail but wanted to add here that SchemaEvolutionTest's purpose is to make sure you can change types of columns for newer segments and have the query system handle that elegantly. It isn't flawless right now (for example you would probably expect a "sum" aggregator of a string column to parse the strings, rather than return zero) but it works well for dimensions.

in the long term/general, what do you think about Druid to be "weakly typed", a-la MySQL, vs "strongly typed", a-la PostgreSQL?

I see it as being even weaker than MySQL on a table level. MySQL has a table-wide schema and I don't see Druid as ever requiring that.

@leventov
Copy link
Copy Markdown
Member Author

leventov commented Oct 5, 2017

Could someone please review this PR?

@himanshug
Copy link
Copy Markdown
Contributor

i'll try and review it tomorrow.

@leventov
Copy link
Copy Markdown
Member Author

leventov commented Oct 5, 2017

@himanshug thank you

catch (Exception e) {
throw new ParseException(e, "Unable to parse metrics[%s], value[%s]", metric, metricValue);
if (metricValueString.charAt(0) == '+') {
if (metricValueString.length() > 1) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could do if (metricValueString.length() > 1 && metricValueString.charAt(0) == '+') {.. and also eliminate the other if condition checking for metricValueString.isEmpty()

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

}
}
return s;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this impl better than String.replace(char, '') ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'' is impossible in Java :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newChar is char, '' is not a char

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah... dint realize that. i'm surprised to know that there is no standard util to remove a character from a string :)


@Nullable
ObjectColumnSelector makeObjectColumnSelector(String columnName);
ColumnValueSelector makeColumnValueSelector(String columnName);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is definitely going to be incompatible for aggregator extensions. we need to mark ColumnSelectorFactory with @PublicApi

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have custom impls of ColumnSelectorFactory? This interface is not extended in any extensions in extensions-core or contrib

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No-one is expected to extend ColumnSelectorFactory but it is used in custom extensions in implementations of AggregatorFactory.factorize/factorizedBuffered which would be calling removed methods like ColumnSelectorFactory.makeFloatColumnSelector(..) etc. That is why ColumnSelectorFactory needs to have @PublicApi and not @ExtensionPoint annotation.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added @PublicApi to this interface and some others. So the next released version of Druid needs to be 0.12, I think.

private Union union;

public SketchAggregator(ObjectColumnSelector selector, int size)
public SketchAggregator(ColumnValueSelector selector, int size)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the type not BaseObjectColumnValueSelector as done in TimestampAggregator ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed

return new SketchAggregator(selector, size);
}
ColumnValueSelector selector = metricFactory.makeColumnValueSelector(fieldName);
return new SketchAggregator(selector, size);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null checks are removed, so in the new world it is guaranteed that for absent columns we get a non-null selector which would just return nulls/zeros from getXXX() methods ?
and I see you adjusted SketchAggregator to lazily create union so there shouldn't be any perf impacts there.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null checks are removed, so in the new world it is guaranteed that for absent columns we get a non-null selector which would just return nulls/zeros from getXXX() methods ?

Yes, added some javadoc about this

ObjectColumnSelector selector = metricFactory.makeObjectColumnSelector(fieldName);
if (selector == null) {
ColumnValueSelector<?> selector = metricFactory.makeColumnValueSelector(fieldName);
if (selector instanceof NilColumnValueSelector) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, this is the new way for checking absent column ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@Override
public void inspectRuntimeShape(RuntimeShapeInspector inspector)
{
// Usually AggregateCombiner has nothing to inspect
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this not the default impl in AggregateCombiner then ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

{
throw new UnsupportedOperationException("DimensionSelector cannot be operated as numeric ColumnValueSelector");
// This is controversial, see https://github.com/druid-io/druid/issues/4888
return 0.0f;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i thought we were gonna try and parse if possible.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I preserved current behaviour. We can try to parse in a separate PR. Actually I tried to change this to parsing and it broke existing tests in SchemaEvolutionTest

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good then

{
throw new UnsupportedOperationException("DimensionSelector cannot be operated as numeric ColumnValueSelector");
// This is controversial, see https://github.com/druid-io/druid/issues/4888
return 0.0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

{
throw new UnsupportedOperationException("DimensionSelector cannot be operated as object ColumnValueSelector");
// This is controversial, see https://github.com/druid-io/druid/issues/4888
return 0L;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@leventov
Copy link
Copy Markdown
Member Author

@himanshug addressed comments

@himanshug himanshug merged commit dc7cb11 into apache:master Oct 14, 2017
@himanshug
Copy link
Copy Markdown
Contributor

@leventov given that this is incompatible and we haven't released 0.11.0 , is it possible to backport this to 0.11.0 ?

@leventov
Copy link
Copy Markdown
Member Author

@himanshug thanks for review and merge, but this PR is tagged Design Review so it needed another one review from a committer

@leventov
Copy link
Copy Markdown
Member Author

Anyway IMO it's too big and risky for backport at this point. I like the idea of making the next version 0.12 more, moreover I'm pretty sure there going to be more PRs that would break compatibility

@leventov
Copy link
Copy Markdown
Member Author

E. g. somebody may want to implement #4888

@himanshug
Copy link
Copy Markdown
Contributor

ok, next release gets to be 0.12.0 then. My apologies, I dint realize that you tagged it for additional reviews.

@leventov leventov deleted the refactor-column-selector-factory branch October 18, 2017 18:24
@gianm gianm added this to the 0.12.0 milestone Jan 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants