Frame writers: Coerce numeric and array types in certain cases.#16994
gianm merged 9 commits into apache:master
This mirrors similar logic in numeric aggregators. (The same method from AggregatorUtil is even used to determine when to apply the logic.) The idea is that when an underlying selector is STRING or COMPLEX typed, we should call getObject and cast the result to number, rather than using the primitive numeric accessor methods. This fixes an issue where a column would be read as all-zeroes when its SQL type is numeric, and its physical type is string. This can happen when evolving a column's type from string to number.
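A minimal sketch of the wrapping idea, for illustration only. `Selector`, `stringColumn`, `objectToLong`, and `coerceToLong` are hypothetical names invented here, not Druid's interfaces, and the coercion is deliberately simplified. It shows why the primitive accessor on a string-typed column can yield zeroes while the `getObject`-then-coerce path recovers the value.

```java
final class ObjectToLongSketch {
    /** Hypothetical stand-in for a column value selector; not Druid's actual interface. */
    interface Selector {
        long getLong();
        Object getObject();
    }

    /** Models a string-typed column: the primitive accessor has no number to return, so it yields zero. */
    static Selector stringColumn(final String value) {
        return new Selector() {
            @Override
            public long getLong() { return 0L; }

            @Override
            public Object getObject() { return value; }
        };
    }

    /** Wraps a selector so that numeric reads go through getObject() plus coercion. */
    static Selector objectToLong(final Selector delegate) {
        return new Selector() {
            @Override
            public long getLong() {
                final Long v = coerceToLong(delegate.getObject());
                return v == null ? 0L : v;
            }

            @Override
            public Object getObject() { return coerceToLong(delegate.getObject()); }
        };
    }

    /** Simplified coercion used only for this sketch. */
    static Long coerceToLong(final Object obj) {
        if (obj instanceof Number) {
            return ((Number) obj).longValue();
        }
        if (obj instanceof String) {
            try {
                return Long.parseLong(((String) obj).trim());
            } catch (NumberFormatException e) {
                return null; // unparseable strings become null
            }
        }
        return null;
    }

    public static void main(String[] args) {
        final Selector raw = stringColumn("42");
        System.out.println(raw.getLong());               // 0
        System.out.println(objectToLong(raw).getLong()); // 42
    }
}
```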
| public double getDouble() | ||
| { | ||
| final Number n = computeIfNeeded(); | ||
| return n == null ? NullHandling.ZERO_DOUBLE : n.floatValue(); |
should this call n.doubleValue()?
I figured since this is the "to float" selector, it should use float precision even if the caller is requesting a double.
Fwiw, the caller really shouldn't be requesting a double anyway (since it's a "to float" selector). I have tests for this method but I don't expect it to be called.
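To make the precision point concrete, here is a small standalone sketch (the class and method names are made up for illustration). Going through `floatValue()` rounds the value to float precision before it is widened back to a double, whereas `doubleValue()` preserves it.

```java
final class FloatPrecisionSketch {
    /** Mirrors the reviewed line: the value is rounded to float, then widened to double. */
    static double viaFloat(final Number n) {
        return n.floatValue();
    }

    /** The alternative raised in review: keeps full double precision. */
    static double viaDouble(final Number n) {
        return n.doubleValue();
    }

    public static void main(String[] args) {
        final Number n = 0.1d;
        System.out.println(viaFloat(n));  // 0.10000000149011612
        System.out.println(viaDouble(n)); // 0.1
    }
}
```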
| return ((Number) obj).doubleValue(); | ||
| } else { | ||
| final ExprEval<?> eval = ExprEval.bestEffortOf(obj); | ||
| return eval.isNumericNull() ? null : eval.asDouble(); |
Should we throw a parse exception here?
Going through the ExprEval code, it seems that if a string expression eval is returned and it's not a valid double, then a null is returned.
Shouldn't we throw an exception and force the user to add a cast to their SQL statement?
After discussion with @clintropolis, the automatic type conversion is used all over the processing stack and the ingestion stack. Hence, throwing a parse exception may not be correct.
Yeah, this sort of coercion behavior is standard in the query stack when reading from selectors that don't match the desired type. It's how we accomplish schema evolution.
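As a rough illustration of the lenient behavior discussed in this thread: the sketch below is not the ExprEval code path, and `toDoubleOrNull` and `coerceAll` are hypothetical helpers. Values that cannot be read as a double become null instead of raising an error, so rows written under an older string schema stay readable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LenientCoercionSketch {
    /** Returns the value as a Double, or null if it cannot be interpreted as one. */
    static Double toDoubleOrNull(final Object obj) {
        if (obj == null) {
            return null;
        }
        if (obj instanceof Number) {
            return ((Number) obj).doubleValue();
        }
        try {
            return Double.parseDouble(obj.toString());
        } catch (NumberFormatException e) {
            return null; // lenient: no exception reaches the caller
        }
    }

    /** Coerces every value in a column-like list, keeping one output entry per input. */
    static List<Double> coerceAll(final List<?> values) {
        final List<Double> out = new ArrayList<>();
        for (Object v : values) {
            out.add(toDoubleOrNull(v));
        }
        return out;
    }

    public static void main(String[] args) {
        // Mixed physical values, as might appear after a string-to-number schema change.
        System.out.println(coerceAll(Arrays.asList("1.5", 2L, "abc", null))); // [1.5, 2.0, null, null]
    }
}
```

The trade-off raised in review still applies: a strict variant would surface bad data earlier, at the cost of failing queries over columns whose type has evolved.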
| switch (desiredType.getType()) { | ||
| case LONG: | ||
| return new ObjectToLongColumnValueSelector(selector, rowIdSupplier); |
it might be a little bit nicer to get the column capabilities from the selector factory, so we know the 'input' column type, and pass that into these ObjectToType wrapper selectors so that they can use ExprEval.ofType instead of ExprEval.bestEffortOf. That should be a bit more efficient, since the latter is mostly reserved these days for uses where we do not know the input type
That makes sense, I'll change it.
Updated PR to:
| { | ||
| final Object obj = selector.getObject(); | ||
| if (obj == null) { | ||
| return ExprEval.of(null); |
nit: could use ExprEval.ofType(selectorType, null) so it doesn't become an ExpressionType.STRING, but also probably doesn't matter
It still does become a string if we do that, b/c of how the code flow works through ExprEval.ofType. Maybe it shouldn't? But it does.
if type is null it becomes a string, but if the value is null it is an ExprEval of that type with a null value
the object from the row adapter.
I had to expand the scope of this patch to fix a failing test case in
I just pushed a new approach that no longer modifies
When merging this patch, please use the PR title + description for the commit message. Letting GitHub use the list of commits for the message is going to be too confusing, since the approach changed a bunch of times.
| } else if (type == ValueType.ARRAY) { | ||
| final TypeSignature<ValueType> elementType = desiredType.getElementType(); | ||
| if (obj instanceof List) { |
you could use ExprEval.bestEffortArray here i suppose, though ExprEval.ofType if desiredType is an array, as well as ExprEval.bestEffortOf should have pretty similar logic.
See #16994 (comment), I pushed a new commit that replaces this method with ExprEval.ofType.
| * @param desiredType desired type | ||
| */ | ||
| @Nullable | ||
| public static Object bestEffortCoerce( |
this method seems odd to me, it is basically like ExprEval.ofType which separates handling based on the expected ExpressionType (instead of ColumnType like is used here), but then uses ExprEval.bestEffortOf inside of the cases (which uses a much larger set of instanceof checks than ExprEval.ofType).
What sets this method apart from the others?
I pushed a new commit that replaces this method with ExprEval.ofType. It seems to do what we need. I didn't try it at first, because I misunderstood what it does: I didn't realize it also included coercion. I added javadoc to ExprEval.ofType explaining that it does do coercion.
| /** | ||
| * Create an eval of the provided type. Coerces the provided object to the desired type. | ||
| * | ||
| * @param type type, or null to be equivalent to {@link #bestEffortOf(Object)} | ||
| * @param value object to be coerced to the type | ||
| */ |
This patch adds TypeCastSelectors, which is used when writing frames to perform two coercions:

- When a numeric type is desired and the underlying type is non-numeric or unknown, the underlying selector is wrapped, getObject is called, and the result is coerced using ExprEval.ofType. This differs from the prior behavior, where the primitive methods like getLong, getDouble, etc., would be called directly. This fixes an issue where a column would be read as all-zeroes when its SQL type is numeric and its physical type is string, which can happen when evolving a column's type from string to number.
- When an array type is desired, the underlying selector is wrapped, getObject is called, and the result is coerced to Object[]. This coercion replaces some earlier logic from "Numeric array support for columnar frames" (#15917).
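For the array case, a minimal sketch of what coercing a `getObject` result to `Object[]` might look like. This is not the code in the patch: `coerceToArray` is a hypothetical helper, and both the set of input shapes it handles and its treatment of a scalar as a single-element array are assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.List;

final class ArrayCoercionSketch {
    /** Normalizes several possible array representations to Object[]; null stays null. */
    static Object[] coerceToArray(final Object obj) {
        if (obj == null) {
            return null;
        }
        if (obj instanceof Object[]) {
            return (Object[]) obj;
        }
        if (obj instanceof List) {
            return ((List<?>) obj).toArray();
        }
        if (obj instanceof long[]) {
            final long[] longs = (long[]) obj;
            final Object[] boxed = new Object[longs.length];
            for (int i = 0; i < longs.length; i++) {
                boxed[i] = longs[i]; // autoboxes each element to Long
            }
            return boxed;
        }
        // Assumption of this sketch: a scalar becomes a single-element array.
        return new Object[]{obj};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(coerceToArray(Arrays.asList(1L, 2L)))); // [1, 2]
        System.out.println(Arrays.toString(coerceToArray(new long[]{3L, 4L})));    // [3, 4]
    }
}
```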