Windowed min aggregates null-s as 0#15371
Conversation
| * calling. "Polymorphism" of ObjectColumnSelector should be used only when operating on {@link ColumnValueSelector} | ||
| * objects. | ||
| */ | ||
| @Deprecated |
There was a problem hiding this comment.
Just to confirm this is not deprecated anymore ?
There was a problem hiding this comment.
I don't understand why it was marked like that
- the implemented interface methods can be deprecated - but they aren't
- it did provided a working implementation (no idea why...)
- it has the
@Deprecatedannotation for >5 years - should we really carry it forward?- is anyone committed to fix this deprecation?
- what are benefits of doing so?
- the apidoc's were a bit contradicting; but the `is convenient for implementation of "object-sourcing" seemed the most accurate.
- the class itself is not
@Deprecatedso anyone can just extend it and use it
because of the above I've just removed them...
I think instead of @Deprecated: if the implementation should reject calls to getLong and friends it should just throw an Exception in the default implementation explaining the situation - but even in that case it can't really be deprecated as this class is not the one declaring those methods.
There was a problem hiding this comment.
i think the main reason it was marked deprecated is so that intellij makes it pretty obvious if you're accidentally calling it on an object selector, its not really possible to remove it without reworking the column selector interfaces to split them up
| T value = getObject(); | ||
| if (value == null) { | ||
| return 0; | ||
| throw DruidException.defensive(COLUMN_IS_NULL_ERROR_MESSAGE); |
There was a problem hiding this comment.
Is it possible to add an unit test as mentioned in the issue
There was a problem hiding this comment.
I've extended windowed_long_null.sqlTest a bit to cover one there too - but the drill related cases were also been fixed by this change
somu-imply
left a comment
There was a problem hiding this comment.
Thanks for the changes, lets just rebase with master and I'll merge this
| public final boolean isNull() | ||
| { | ||
| return false; | ||
| return getObject() == null; |
There was a problem hiding this comment.
while I guess this is harmless, nothing calling getObject should ever be using isNull, instead they should be checking that getObject returns null
if this change is fixing something then whatever it is fixing is probably wrong
There was a problem hiding this comment.
that said, i guess we could change the contract of isNull to always work, its just also sort of pointless since getObject can just return the null (while getLong/getDouble/getFloat cannot since primitives), so why call two methods when you only need to call one
There was a problem hiding this comment.
I think this method should be either implemented correctly or throw an exception that its not supported - that will make sure that there won't be correctness issues if somehow it gets called
it is called from places like this - how should we fix that?
I don't understand clearly what you mean by "changing the contract of isNull" ; I believe that's defined here; and returning false have kinda violated it
There was a problem hiding this comment.
I think this method should be either implemented correctly or throw an exception that its not supported - that will make sure that there won't be correctness issues if somehow it gets called
That is what i was getting at, if getLong/getFloat/getDouble are going to throw exceptions, then this should too, since it should only be called if you are planning on calling one of those methods.
it is called from places like this - how should we fix that?
NullableNumericAggregator is only for things that are going to be calling getLong/getFloat/getDouble, which are the only things that are supposed to call isNull, because those methods return primitives which cannot be null. That aggregator and the things it wraps should never be using a selector derived from this one, since it is designed to aggregate primitive numeric values.
I don't understand clearly what you mean by "changing the contract of isNull" ; I believe that's defined here; and returning false have kinda violated it
This does not violate it, because no one should ever call getLong/getFloat/getDouble on that value selector, which means they should also not be using isNull, and should instead just call getObject and see if the result is null. Calling isNull if you intend to call getObject is violating the current contract.
There was a problem hiding this comment.
That is what i was getting at, if getLong/getFloat/getDouble are going to throw exceptions, then this should too, since it should only be called if you are planning on calling one of those methods.
I think it should throw an exception because null can't be represented as a primitive; it could highlight that you should call isNull first - I can revert that change; but I think those default values may only help to hide bugs....
[...] NullableNumericAggregator is only for things that are going to be calling [...]
that sounds rather odd - there are no interfaces enforcing that contract; it only requires that the selector implement BaseNullableColumnValueSelector ; and because of that object which implement that should work with it
This ObjectColumnSelector class provides getLong and friends based on getObject which is exactly how its services are used here
This does not violate it, because no one should ever call
"no one should ever call" : that can't be enforced; it
providing a return false implementation is wrong - and a violation of the contract in this case.
Calling isNull if you intend to call getObject is violating the contract.
now you lost me...then why implementColumnValueSelectorat all? or not just throw exceptions from all of them?
this seems like some kind of design issue...
I think a possible alternate way to fix this could be to not implement isNull in ObjectColumnSelector and force classes extending it to provide their own implementation - what do you think ?
There was a problem hiding this comment.
The current design is not great because it cannot enforce how callers use stuff, I agree completely and have some ambition of reworking how this stuff works to drop all of this polymorphism if possible, but its also a gigantic task since ColumnValueSelector is use in a ton of places, so it will take some time.
ObjectColumnSelector picks up BaseNullableColumnValueSelector because BaseLongColumnValueSelector etc implement it. It seems controversial to me that they provide default implementations of the get primitive methods at all because it seems like it is easy to misuse.
The javadocs of BaseNullableColumnValueSelector explain the current contract of isNull, and specify that things calling getObject should not call isNull because it is not for them.
I actually kind of think maybe ObjectColumnSelector should throw exceptions by default for getLong/getDouble/getFloat/isNull, because it isn't really intended to be used by things which should also maybe be long or float selectors, but i have no idea how many things that might break. For ColumnValueSelector` that do need to act as all types of selectors potentially, such as the example you linked:
This ObjectColumnSelector class provides getLong and friends based on getObject which is exactly how its services are used here
They should probably implement all of the methods themselves, because that is not that common of a pattern and only happens in some special cases. Usually ColumnValueSelector are specialized for the type they are selecting over.
There was a problem hiding this comment.
I think
ObjectColumnSelector and instead should implement ColumnValueSelector directly.
As far as I can tell, everything else using ObjectColumnSelector is truly something that will only call getObject like complex types of various sorts.
There was a problem hiding this comment.
I believe there are two ways to provide a ColumnValueSelector for Object -t:
- one which doesn't want to be treated like something which could answer a call like
getLong()orisNull()as it wants to enforce that people will use thegetObject()call - another one which provides the
getLongand other methods based on a backinggetObjectwhich could be even a primitive
The old ObjectColumnSelector was kinda tring to do both of the above; in the latest changeset I've modified it to simply throw an Exception for every getX call to prevent incorrect usage - I needed to make some changes to some tests as they were asserting on the isNull method (which was implemented by return false).
I've added ta second class which does the 2nd kind of thing named ObjectBasedColumnSelector - and its used in the windowing stuff and a few other places; I don't see much value of copy-pasting that implementation to every place as if you have an array of objects. If someone might want to call getLong() on it there is not really a alternate ways...if this ends up being slow or something - and identified as a performance bottleneck...we could roll ArrayBasedTypeColumnSelector classes via a strategy pattern and done...but right now I doubt that around windowing this would be the 1st bottleneck :D
| * calling. "Polymorphism" of ObjectColumnSelector should be used only when operating on {@link ColumnValueSelector} | ||
| * objects. | ||
| */ | ||
| @Deprecated |
There was a problem hiding this comment.
i think the main reason it was marked deprecated is so that intellij makes it pretty obvious if you're accidentally calling it on an object selector, its not really possible to remove it without reworking the column selector interfaces to split them up
…IndexerV4Test,AutoTypeColumnIndexerTest,ConcurrentGrouperTest,DoubleAnyAggregationTest,FastLineIteratorTest,FloatAnyAggregationTest,LongAnyAggregationTest,NestedDataColumnIndexerV4Test,ParallelCombinerTest
| * | ||
| * This abstract class provides a {@link ColumnValueSelector} based on the methods of {@link BaseObjectColumnValueSelector}. | ||
| */ | ||
| public abstract class ObjectBasedColumnSelector<T> implements ColumnValueSelector<T> |
There was a problem hiding this comment.
I think we need a better name for this because this isn't strictly an object selector like ObjectColumnSelector is, and instead since it also implements BaseLongColumnValueSelector, BaseDoubleColumnValueSelector,
BaseFloatColumnValueSelector and BaseNullableColumnValueSelector, so its sort of a multi-type selector. It is also important to call the contract of this class - that it doesn't do any real type coercion and only satisfies the numeric selectors if the value is already a Number.
| @Override | ||
| public final boolean isNull() | ||
| { | ||
| return getObject() == null; |
There was a problem hiding this comment.
I think this should also probably be checking that the result of getObject() is instanceof Number. The things that are going to call isNull() will do so in service of calling it prior to calling getLong/getFloat/getDouble, but if Object is not a Number then this will result in a class cast exception and there is no restriction on this class that the things it provides are always instances of a Number.
Things that are calling getObject should already know not to be checking isNull for null checks, so I don't think that would cause any problems if isNull returns true but getObject returns a non-null value.
There was a problem hiding this comment.
I'm not sure about the added value of such a check as the classcastexception will tell the story pretty nicely where/why the issue happened - but I've added some ifs
| * DoubleColumnSelector} solely in order to make {@link #isNull()} method final. | ||
| * Restricts selector usage to only allow {@link #getObject()}. | ||
| */ | ||
| public abstract class ObjectColumnSelector<T> implements ColumnValueSelector<T> |
There was a problem hiding this comment.
I think might have gone a bit overboard in removing the javadoc contract of this class, some of that stuff still seems relevant about only extending it
|
@clintropolis @soumyava should this have the bug fix label? |
|
Nevermind - it was just my use of ComplexColumn that was throwing it off. |
|
Hmmm…..this PR might be causing a regression from c353ccf#diff-1aa22bcd62ddd4457f128832de7d4a008b954d015d8b3be6a03e3b3471aad6e3R57 |
|
I feel like its a bit odd to have a Could you share a testcase to get a clearer understanding? |
|
Why is it odd for a complex column to support SUM? Could we get IncrementalIndex.makeMetricColumnValueSelector() to use the actual |
|
@kgyrtkirk |
|
I think it would be better to not provide a sketch and literal values at the same time from the same aggregator. I also suspect that these aggregators will not work well with the windowed codepaths - and in case of a windowed I was not able to uncover why the need for I think if there is a real need to have pure aliases to an existing aggregator - we should find an alternate way which doesn't duplicate class/types/etc. I feel like its very unfortunate that this was documented ; but not ensured to work with a test... I've spent some time trying to uncover what your expectations/options might be in this case - If you could write a testcase to get a better understanding - I think that will most likely speed up our communication speed to fix this. |
PR apache#15371 eliminated ObjectColumnSelector's built-in implementations of numeric methods, which had been marked deprecated. Some complex types, like SpectatorHistogram, can be successfully coerced to number. An alternative approach would be to restore (and un-deprecate) the built-in implementations of numeric methods in ObjectColumnSelector.
PR apache#15371 eliminated ObjectColumnSelector's built-in implementations of numeric methods, which had been marked deprecated. However, some complex types, like SpectatorHistogram, can be successfully coerced to number. The documentation for spectator histograms encourages taking advantage of this by aggregating complex columns with doubleSum and longSum. Currently, this doesn't work properly for IncrementalIndex, where the behavior relied on those deprecated ObjectColumnSelector methods. This patch fixes the behavior by making two changes: 1) SimpleXYZAggregatorFactory (XYZ = type; base class for simple numeric aggregators; all of these extend NullableNumericAggregatorFactory) use getObject for STRING and COMPLEX. Previously, getObject was only used for STRING. 2) NullableNumericAggregatorFactory (base class for simple numeric aggregators) has a new protected method "useGetObject". This allows the base class to correctly check for null (using getObject or isNull). The patch also adds a test for SpectatorHistogram + doubleSum + IncrementalIndex.
PR apache#15371 eliminated ObjectColumnSelector's built-in implementations of numeric methods, which had been marked deprecated. However, some complex types, like SpectatorHistogram, can be successfully coerced to number. The documentation for spectator histograms encourages taking advantage of this by aggregating complex columns with doubleSum and longSum. Currently, this doesn't work properly for IncrementalIndex, where the behavior relied on those deprecated ObjectColumnSelector methods. This patch fixes the behavior by making two changes: 1) SimpleXYZAggregatorFactory (XYZ = type; base class for simple numeric aggregators; all of these extend NullableNumericAggregatorFactory) use getObject for STRING and COMPLEX. Previously, getObject was only used for STRING. 2) NullableNumericAggregatorFactory (base class for simple numeric aggregators) has a new protected method "useGetObject". This allows the base class to correctly check for null (using getObject or isNull). The patch also adds a test for SpectatorHistogram + doubleSum + IncrementalIndex.
PR apache#15371 eliminated ObjectColumnSelector's built-in implementations of numeric methods, which had been marked deprecated. However, some complex types, like SpectatorHistogram, can be successfully coerced to number. The documentation for spectator histograms encourages taking advantage of this by aggregating complex columns with doubleSum and longSum. Currently, this doesn't work properly for IncrementalIndex, where the behavior relied on those deprecated ObjectColumnSelector methods. This patch fixes the behavior by making two changes: 1) SimpleXYZAggregatorFactory (XYZ = type; base class for simple numeric aggregators; all of these extend NullableNumericAggregatorFactory) use getObject for STRING and COMPLEX. Previously, getObject was only used for STRING. 2) NullableNumericAggregatorFactory (base class for simple numeric aggregators) has a new protected method "useGetObject". This allows the base class to correctly check for null (using getObject or isNull). The patch also adds a test for SpectatorHistogram + doubleSum + IncrementalIndex.
* Coerce COMPLEX to number in numeric aggregators. PR #15371 eliminated ObjectColumnSelector's built-in implementations of numeric methods, which had been marked deprecated. However, some complex types, like SpectatorHistogram, can be successfully coerced to number. The documentation for spectator histograms encourages taking advantage of this by aggregating complex columns with doubleSum and longSum. Currently, this doesn't work properly for IncrementalIndex, where the behavior relied on those deprecated ObjectColumnSelector methods. This patch fixes the behavior by making two changes: 1) SimpleXYZAggregatorFactory (XYZ = type; base class for simple numeric aggregators; all of these extend NullableNumericAggregatorFactory) use getObject for STRING and COMPLEX. Previously, getObject was only used for STRING. 2) NullableNumericAggregatorFactory (base class for simple numeric aggregators) has a new protected method "useGetObject". This allows the base class to correctly check for null (using getObject or isNull). The patch also adds a test for SpectatorHistogram + doubleSum + IncrementalIndex. * Fix tests. * Remove the special ColumnValueSelector. * Add test.
…6564) * Coerce COMPLEX to number in numeric aggregators. PR apache#15371 eliminated ObjectColumnSelector's built-in implementations of numeric methods, which had been marked deprecated. However, some complex types, like SpectatorHistogram, can be successfully coerced to number. The documentation for spectator histograms encourages taking advantage of this by aggregating complex columns with doubleSum and longSum. Currently, this doesn't work properly for IncrementalIndex, where the behavior relied on those deprecated ObjectColumnSelector methods. This patch fixes the behavior by making two changes: 1) SimpleXYZAggregatorFactory (XYZ = type; base class for simple numeric aggregators; all of these extend NullableNumericAggregatorFactory) use getObject for STRING and COMPLEX. Previously, getObject was only used for STRING. 2) NullableNumericAggregatorFactory (base class for simple numeric aggregators) has a new protected method "useGetObject". This allows the base class to correctly check for null (using getObject or isNull). The patch also adds a test for SpectatorHistogram + doubleSum + IncrementalIndex. * Fix tests. * Remove the special ColumnValueSelector. * Add test.
Correct the implementation of
ObjectColumnSelector#isNullFixes #15370
This PR has: