Skip to content

support for array expressions in TransformSpec with ExpressionTransform#8744

Merged
gianm merged 13 commits intoapache:masterfrom
clintropolis:transform-spec-multi-value
Nov 13, 2019
Merged

support for array expressions in TransformSpec with ExpressionTransform#8744
gianm merged 13 commits intoapache:masterfrom
clintropolis:transform-spec-multi-value

Conversation

@clintropolis
Copy link
Copy Markdown
Member

Fixes #8733 (and likely #8613, but untested in this PR)

Description

This PR adds array expression support to transformSpec, filling in one of the remaining functionality gaps left from #7525.

Additionally, this PR removes ParseSpec.verify since its' only use afaict was preventing transform expr that did not replace their input from functioning, because the implementations would complain about dimensions that were not mapped to a column, discovered when I was trying to add a test for the newly added array expression support of ExpressionTransform.

Tagged with label release notes since in additional to fixing odd behavior with array expressions at indexing time, ParseSpec.verify was annotated @PublicApi, though since the only callers were the constructors of the ParseSpec that implemented the method, this shouldn't cause anything other than perhaps a complication issue for private extensions due to needing to remove @Override on the verify method.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added unit tests or modified existing tests to cover new code paths.

changes:
* added array expression support to transformSpec
* removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning
* hijacked index task test to test changes
}

@PublicApi
public void verify(List<String> usedCols)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This verify method is useful for making sure that people's transforms, dimensions, metrics, etc are derived from fields they specified (to help detect errors & typos). But I see why you removed it -- info about transforms isn't available at this point in the code. It probably makes sense to add this functionality back in somehow, in a smarter way, once the dust settles on #8823.

/cc @jihoonson

return row.getRaw(column);
Object raw = row.getRaw(column);
if (raw instanceof List) {
return ExpressionSelectors.coerceListDimToStringArray((List) raw);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fair to assume that all Lists are Lists of Strings?

What if the expression selector returns a long array?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the only multi-value columns we support are strings it is currently a safe assumption to make. I believe ExpressionTransform would either need an ValueType for it's input type, or it need to be inferred from the input data, for us to assume anything else. Would you like me to add a ValueType parameter to this method at least that we can hard code to string for now?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a multi-value column, though, is it? It looks like it's something from an input row and it could be an array of anything. I was thinking that since expressions support numeric arrays, it'd be good to keep them that way if found in the input rows.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, I was thinking backwards when I wrote this comment. I'll see what I can do, the tricky bit seems to be converting the list to the appropriate array type.

*/
public static Object coerceEvalToSelectorObject(ExprEval eval)
{
if (eval.isArray()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fair to cast all arrays to lists of strings?

Do we still get the behavior we want if we cast them to lists of whatever they really are? (I'm thinking this could make us more future-proof to situations where we support numeric arrays at the storage layer.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think if we just preserved the base type and covert to a list of that instead of converting it all to a list of strings, that it would still function more or less correctly because the values would be translated to strings further down the chain when reading into the row. Alternatively, an output ValueType could be specified to determine how to translate the evaluated expression? But I think just converting to a list as is might be better if it works ok.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If converting to a list of the base type works, let's do that, since it's nicer.

My goal here (and w/ the other comment) is to avoid making too many places that encode a lists-should-always-be-strings assumption. We might want to expand that later.

if (arrayVal.length > 0) {
return arrayVal;
if (val != null && val.size() > 0) {
Object firstElement = val.get(0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expressions support lists that contain nulls, right? What happens if the first element is null? Also, what happens if the first element is an Integer but later ones are Doubles? i.e. JSON [1, 2.0].

It may be better to examine all elements using some kind of binary type conversion rules (like long + int = long, etc).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to examine list elements and follow java auto conversion rules

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Nov 13, 2019

This pull request introduces 1 alert when merging e8d7dde into a066cc5 - view on LGTM.com

new alerts:

  • 1 for Unused format argument

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, appreciate the adjustments.

@gianm gianm merged commit cc54b2a into apache:master Nov 13, 2019
@clintropolis clintropolis deleted the transform-spec-multi-value branch November 13, 2019 20:29
clintropolis added a commit to implydata/druid-public that referenced this pull request Nov 15, 2019
…rm (apache#8744)

* transformSpec + array expressions

changes:
* added array expression support to transformSpec
* removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning
* hijacked index task test to test changes

* remove docs about being unsupported

* re-arrange test assert

* unused imports

* imports

* fix tests

* preserve types

* suppress warning, fixes, add test

* formatting

* cleanup

* better list to array type conversion and tests

* fix oops
@jon-wei jon-wei added this to the 0.17.0 milestone Dec 17, 2019
@jon-wei jon-wei mentioned this pull request Jan 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cell contains [LJava.lang.String@ instead of data when constructing array via transformSpec during ingestion

3 participants