Refactor ColumnSelectorFactory; Rely on ColumnValueSelector's polymorphism#4886
|
This PR fails on tests in |
|
@leventov IIRC, yes, Druid tries to be as flexible as possible and |
|
@himanshug in the long term and in general, what do you think about Druid being "weakly typed", a la MySQL, vs. "strongly typed", a la PostgreSQL? |
|
@leventov, re:
I commented on #4888 (comment) in more detail but wanted to add here that SchemaEvolutionTest's purpose is to make sure you can change types of columns for newer segments and have the query system handle that elegantly. It isn't flawless right now (for example you would probably expect a "sum" aggregator of a string column to parse the strings, rather than return zero) but it works well for dimensions.
I see it as being even weaker than MySQL on a table level. MySQL has a table-wide schema and I don't see Druid as ever requiring that. |
|
Could someone please review this PR? |
|
I'll try to review it tomorrow. |
|
@himanshug thank you |
| catch (Exception e) { | ||
| throw new ParseException(e, "Unable to parse metrics[%s], value[%s]", metric, metricValue); | ||
| if (metricValueString.charAt(0) == '+') { | ||
| if (metricValueString.length() > 1) { |
There was a problem hiding this comment.
You could write if (metricValueString.length() > 1 && metricValueString.charAt(0) == '+') { ... } and also eliminate the other if condition that checks metricValueString.isEmpty().
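A minimal sketch of the combined condition suggested here, not the PR's actual code. The class and method names (`LeadingPlusSketch`, `stripLeadingPlus`) are hypothetical, and the sketch assumes the goal is only to drop a single leading '+' before the value is parsed.

```java
final class LeadingPlusSketch
{
  // Hypothetical helper: drops one leading '+' when at least one more character follows it.
  static String stripLeadingPlus(String metricValueString)
  {
    // length() > 1 already implies the string is non-empty, and && short-circuits,
    // so charAt(0) is never evaluated on an empty string and no isEmpty() check is needed.
    if (metricValueString.length() > 1 && metricValueString.charAt(0) == '+') {
      return metricValueString.substring(1);
    }
    return metricValueString;
  }

  public static void main(String[] args)
  {
    System.out.println(stripLeadingPlus("+42"));
    System.out.println(stripLeadingPlus("42"));
  }
}
```

With this shape, a lone "+" and an empty string both pass through unchanged and are left for the parser to reject.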
| } | ||
| } | ||
| return s; | ||
| } |
There was a problem hiding this comment.
why is this impl better than String.replace(char, '') ?
There was a problem hiding this comment.
'' is impossible in Java :)
There was a problem hiding this comment.
newChar is char, '' is not a char
There was a problem hiding this comment.
Ah... I didn't realize that. I'm surprised to learn that there is no standard util to remove a character from a string :)
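For illustration, a hand-written character remover could look like the sketch below. It is not the implementation from this PR; `RemoveCharSketch` and `removeChar` are hypothetical names. Note that while String.replace(char, char) needs two real chars, the String.replace(CharSequence, CharSequence) overload does accept an empty replacement, e.g. s.replace(",", "").

```java
final class RemoveCharSketch
{
  // Hypothetical helper: returns s without any occurrence of c.
  static String removeChar(String s, char c)
  {
    int first = s.indexOf(c);
    if (first < 0) {
      // Nothing to remove, so return the same instance without allocating.
      return s;
    }
    StringBuilder sb = new StringBuilder(s.length() - 1);
    // Copy the prefix before the first occurrence in one call.
    sb.append(s, 0, first);
    for (int i = first + 1; i < s.length(); i++) {
      char ch = s.charAt(i);
      if (ch != c) {
        sb.append(ch);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args)
  {
    System.out.println(removeChar("1,234,567", ','));
    System.out.println("1,234,567".replace(",", ""));
  }
}
```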
|
|
||
| @Nullable | ||
| ObjectColumnSelector makeObjectColumnSelector(String columnName); | ||
| ColumnValueSelector makeColumnValueSelector(String columnName); |
There was a problem hiding this comment.
this is definitely going to be incompatible for aggregator extensions. we need to mark ColumnSelectorFactory with @PublicApi
There was a problem hiding this comment.
Do you have custom impls of ColumnSelectorFactory? This interface is not extended in any extensions in extensions-core or contrib
There was a problem hiding this comment.
No one is expected to extend ColumnSelectorFactory, but it is used in custom extensions, in implementations of AggregatorFactory.factorize/factorizeBuffered, which would be calling removed methods like ColumnSelectorFactory.makeFloatColumnSelector(..) etc. That is why ColumnSelectorFactory needs the @PublicApi annotation and not @ExtensionPoint.
There was a problem hiding this comment.
Added @PublicApi to this interface and some others. So the next released version of Druid needs to be 0.12, I think.
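To make the compatibility concern concrete, here is a rough sketch of how extension code consumes the factory without implementing it. The nested interfaces are simplified stand-ins that only borrow the names from this thread (the real Druid interfaces have more methods), and `LongSumSketch` is a hypothetical aggregator.

```java
final class FactorizeSketch
{
  // Simplified stand-in for the selector type discussed in this PR.
  interface ColumnValueSelector
  {
    long getLong();
  }

  // Simplified stand-in: extensions call this interface, they do not implement it.
  interface ColumnSelectorFactory
  {
    ColumnValueSelector makeColumnValueSelector(String columnName);
  }

  // Hypothetical extension-side aggregator.
  static final class LongSumSketch
  {
    private final ColumnValueSelector selector;
    private long sum;

    LongSumSketch(ColumnValueSelector selector)
    {
      this.selector = selector;
    }

    void aggregate()
    {
      sum += selector.getLong();
    }

    long get()
    {
      return sum;
    }
  }

  // Mirrors the shape of an AggregatorFactory.factorize() implementation:
  // any rename or removal of a factory method breaks code like this at compile time.
  static LongSumSketch factorize(ColumnSelectorFactory metricFactory, String fieldName)
  {
    return new LongSumSketch(metricFactory.makeColumnValueSelector(fieldName));
  }
}
```

Because the extension is a caller rather than an implementor, an annotation describing a stable calling surface fits better than one describing a stable surface for subclassing.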
| private Union union; | ||
|
|
||
| public SketchAggregator(ObjectColumnSelector selector, int size) | ||
| public SketchAggregator(ColumnValueSelector selector, int size) |
There was a problem hiding this comment.
Why is the type not BaseObjectColumnValueSelector, as done in TimestampAggregator?
| return new SketchAggregator(selector, size); | ||
| } | ||
| ColumnValueSelector selector = metricFactory.makeColumnValueSelector(fieldName); | ||
| return new SketchAggregator(selector, size); |
There was a problem hiding this comment.
Null checks are removed, so in the new world is it guaranteed that for absent columns we get a non-null selector which just returns nulls/zeros from its getXXX() methods?
I also see you adjusted SketchAggregator to create the union lazily, so there shouldn't be any perf impact there.
There was a problem hiding this comment.
null checks are removed, so in the new world it is guaranteed that for absent columns we get a non-null selector which would just return nulls/zeros from getXXX() methods ?
Yes, added some javadoc about this
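The guarantee described here is essentially the null object pattern. The sketch below shows the idea under stated assumptions: the nested types are simplified stand-ins that reuse names from this thread, and the map-backed `makeColumnValueSelector` is a hypothetical factory, not Druid's.

```java
import java.util.Map;

final class NilSelectorSketch
{
  // Simplified stand-in for the polymorphic selector; the real interface differs.
  interface ColumnValueSelector<T>
  {
    long getLong();

    double getDouble();

    T getObject();
  }

  // Stand-in for the selector handed out for absent columns: zeros and null.
  static final class NilColumnValueSelector implements ColumnValueSelector<Object>
  {
    @Override
    public long getLong()
    {
      return 0L;
    }

    @Override
    public double getDouble()
    {
      return 0.0;
    }

    @Override
    public Object getObject()
    {
      return null;
    }
  }

  // Hypothetical factory method: never returns null, even for unknown columns.
  static ColumnValueSelector<?> makeColumnValueSelector(
      Map<String, ColumnValueSelector<?>> columns,
      String columnName
  )
  {
    ColumnValueSelector<?> selector = columns.get(columnName);
    if (selector == null) {
      return new NilColumnValueSelector();
    }
    return selector;
  }
}
```

Callers that only read values need no special casing, and callers that must distinguish an absent column can still test the returned selector's type.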
| ObjectColumnSelector selector = metricFactory.makeObjectColumnSelector(fieldName); | ||
| if (selector == null) { | ||
| ColumnValueSelector<?> selector = metricFactory.makeColumnValueSelector(fieldName); | ||
| if (selector instanceof NilColumnValueSelector) { |
There was a problem hiding this comment.
So, is this the new way to check for an absent column?
| @Override | ||
| public void inspectRuntimeShape(RuntimeShapeInspector inspector) | ||
| { | ||
| // Usually AggregateCombiner has nothing to inspect |
There was a problem hiding this comment.
Why is this not the default impl in AggregateCombiner then?
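What this question proposes would look roughly like a Java 8 default method, as sketched below. The thread does not show whether the PR adopted it. The nested interfaces are simplified stand-ins named after the types in the diff, and `LongSumCombiner` and `fold` are hypothetical.

```java
final class DefaultInspectSketch
{
  // Simplified stand-in for the inspector passed to inspectRuntimeShape().
  interface RuntimeShapeInspector
  {
    void visit(String fieldName, Object value);
  }

  // Simplified stand-in for AggregateCombiner with a no-op default.
  interface AggregateCombiner
  {
    void fold(long value);

    // Usually there is nothing to inspect, so implementations inherit this no-op
    // and override it only when they hold fields worth reporting.
    default void inspectRuntimeShape(RuntimeShapeInspector inspector)
    {
    }
  }

  // Hypothetical combiner that relies on the inherited default.
  static final class LongSumCombiner implements AggregateCombiner
  {
    long sum;

    @Override
    public void fold(long value)
    {
      sum += value;
    }
  }
}
```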
| { | ||
| throw new UnsupportedOperationException("DimensionSelector cannot be operated as numeric ColumnValueSelector"); | ||
| // This is controversial, see https://github.com/druid-io/druid/issues/4888 | ||
| return 0.0f; |
There was a problem hiding this comment.
I thought we were going to try to parse if possible.
There was a problem hiding this comment.
I preserved the current behaviour. We can try to parse in a separate PR. Actually, I tried to change this to parsing and it broke existing tests in SchemaEvolutionTest.
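The two behaviours under discussion can be contrasted in a small sketch. Both methods are hypothetical illustrations, not Druid code: the first mirrors the preserved "return zero" behaviour, the second assumes one possible reading of "parse if possible", with zero as the fallback.

```java
final class StringAsFloatSketch
{
  // Preserved behaviour per this thread: a string dimension read as a float yields zero.
  static float getFloatAsZero(String dimensionValue)
  {
    return 0.0f;
  }

  // Hypothetical alternative: parse when possible, otherwise fall back to zero.
  static float getFloatParsed(String dimensionValue)
  {
    if (dimensionValue == null) {
      return 0.0f;
    }
    try {
      return Float.parseFloat(dimensionValue);
    }
    catch (NumberFormatException e) {
      // Not a number, so keep the zero fallback rather than failing the query.
      return 0.0f;
    }
  }

  public static void main(String[] args)
  {
    System.out.println(getFloatAsZero("1.5"));
    System.out.println(getFloatParsed("1.5"));
  }
}
```

Any test that asserts a zero result for a numeric-looking string would pass with the first method and fail with the second, which is consistent with the test breakage mentioned above.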
| { | ||
| throw new UnsupportedOperationException("DimensionSelector cannot be operated as numeric ColumnValueSelector"); | ||
| // This is controversial, see https://github.com/druid-io/druid/issues/4888 | ||
| return 0.0; |
| { | ||
| throw new UnsupportedOperationException("DimensionSelector cannot be operated as object ColumnValueSelector"); | ||
| // This is controversial, see https://github.com/druid-io/druid/issues/4888 | ||
| return 0L; |
|
@himanshug addressed comments |
|
@leventov given that this is incompatible and we haven't released 0.11.0, is it possible to backport this to 0.11.0? |
|
@himanshug thanks for review and merge, but this PR is tagged |
|
Anyway, IMO it's too big and risky for a backport at this point. I prefer the idea of making the next version 0.12; moreover, I'm pretty sure there are going to be more PRs that break compatibility. |
|
E.g., somebody may want to implement #4888. |
|
OK, the next release gets to be 0.12.0 then. My apologies, I didn't realize that you had tagged it for additional reviews. |
Fixes #4800. Removes a lot of long/float/double/object code repetition across the project.