support for array expressions in TransformSpec with ExpressionTransform#8744
support for array expressions in TransformSpec with ExpressionTransform#8744gianm merged 13 commits intoapache:masterfrom
Conversation
changes: * added array expression support to transformSpec * removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning * hijacked index task test to test changes
| } | ||
|
|
||
| @PublicApi | ||
| public void verify(List<String> usedCols) |
There was a problem hiding this comment.
This verify method is useful for making sure that people's transforms, dimensions, metrics, etc are derived from fields they specified (to help detect errors & typos). But I see why you removed it -- info about transforms isn't available at this point in the code. It probably makes sense to add this functionality back in somehow, in a smarter way, once the dust settles on #8823.
/cc @jihoonson
| return row.getRaw(column); | ||
| Object raw = row.getRaw(column); | ||
| if (raw instanceof List) { | ||
| return ExpressionSelectors.coerceListDimToStringArray((List) raw); |
There was a problem hiding this comment.
Is it fair to assume that all Lists are Lists of Strings?
What if the expression selector returns a long array?
There was a problem hiding this comment.
Since the only multi-value columns we support are strings it is currently a safe assumption to make. I believe ExpressionTransform would either need an ValueType for it's input type, or it need to be inferred from the input data, for us to assume anything else. Would you like me to add a ValueType parameter to this method at least that we can hard code to string for now?
There was a problem hiding this comment.
This isn't a multi-value column, though, is it? It looks like it's something from an input row and it could be an array of anything. I was thinking that since expressions support numeric arrays, it'd be good to keep them that way if found in the input rows.
There was a problem hiding this comment.
Fair, I was thinking backwards when I wrote this comment. I'll see what I can do, the tricky bit seems to be converting the list to the appropriate array type.
| */ | ||
| public static Object coerceEvalToSelectorObject(ExprEval eval) | ||
| { | ||
| if (eval.isArray()) { |
There was a problem hiding this comment.
Is it fair to cast all arrays to lists of strings?
Do we still get the behavior we want if we cast them to lists of whatever they really are? (I'm thinking this could make us more future-proof to situations where we support numeric arrays at the storage layer.)
There was a problem hiding this comment.
i think if we just preserved the base type and covert to a list of that instead of converting it all to a list of strings, that it would still function more or less correctly because the values would be translated to strings further down the chain when reading into the row. Alternatively, an output ValueType could be specified to determine how to translate the evaluated expression? But I think just converting to a list as is might be better if it works ok.
There was a problem hiding this comment.
If converting to a list of the base type works, let's do that, since it's nicer.
My goal here (and w/ the other comment) is to avoid making too many places that encode a lists-should-always-be-strings assumption. We might want to expand that later.
| if (arrayVal.length > 0) { | ||
| return arrayVal; | ||
| if (val != null && val.size() > 0) { | ||
| Object firstElement = val.get(0); |
There was a problem hiding this comment.
Expressions support lists that contain nulls, right? What happens if the first element is null? Also, what happens if the first element is an Integer but later ones are Doubles? i.e. JSON [1, 2.0].
It may be better to examine all elements using some kind of binary type conversion rules (like long + int = long, etc).
There was a problem hiding this comment.
Updated to examine list elements and follow java auto conversion rules
|
This pull request introduces 1 alert when merging e8d7dde into a066cc5 - view on LGTM.com new alerts:
|
gianm
left a comment
There was a problem hiding this comment.
LGTM, appreciate the adjustments.
…rm (apache#8744) * transformSpec + array expressions changes: * added array expression support to transformSpec * removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning * hijacked index task test to test changes * remove docs about being unsupported * re-arrange test assert * unused imports * imports * fix tests * preserve types * suppress warning, fixes, add test * formatting * cleanup * better list to array type conversion and tests * fix oops
Fixes #8733 (and likely #8613, but untested in this PR)
Description
This PR adds array expression support to
transformSpec, filling in one of the remaining functionality gaps left from #7525.Additionally, this PR removes
ParseSpec.verifysince its' only use afaict was preventing transform expr that did not replace their input from functioning, because the implementations would complain about dimensions that were not mapped to a column, discovered when I was trying to add a test for the newly added array expression support ofExpressionTransform.Tagged with label release notes since in additional to fixing odd behavior with array expressions at indexing time,
ParseSpec.verifywas annotated@PublicApi, though since the only callers were the constructors of theParseSpecthat implemented the method, this shouldn't cause anything other than perhaps a complication issue for private extensions due to needing to remove@Overrideon theverifymethod.This PR has: