Skip to content

use object[] instead of string[] for vector expressions to be consistent with vector object selectors#13209

Merged
clintropolis merged 2 commits intoapache:masterfrom
clintropolis:better-vector-expr-string-processing
Oct 12, 2022
Merged

use object[] instead of string[] for vector expressions to be consistent with vector object selectors#13209
clintropolis merged 2 commits intoapache:masterfrom
clintropolis:better-vector-expr-string-processing

Conversation

@clintropolis
Copy link
Copy Markdown
Member

Description

Changes vectorized string expressions (such as concat) to deal in Object[] instead of String[], to be consistent with the behavior of VectorObjectSelector, as well as with a similar change I did recently to array expressions in #12914.

This fixes a class of bugs such as in the added test example, which would fail with an error of the form:

    Caused by: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Ljava.lang.String;
	at org.apache.druid.math.expr.IdentifierExpr$4.evalVector(IdentifierExpr.java:198) ~[classes/:?]
	at org.apache.druid.math.expr.vector.StringOutMultiStringInVectorProcessor.evalVector(StringOutMultiStringInVectorProcessor.java:58) ~[classes/:?]
	at org.apache.druid.segment.virtual.ExpressionVectorObjectSelector.getObjectVector(ExpressionVectorObjectSelector.java:49) ~[classes/:?]

before the changes in this patch.


This PR has:
  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Copy link
Copy Markdown
Contributor

@cheddar cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks fine. I left a few comments where I think we could maybe do things to be a bit more defensive. In the worst case, though, these changes won't break new things, they just might not fix all of the things that they wanted to (i.e. in some cases, the exception might've just moved to be whack-a-moled a different day).

private static final class StringLookupVectorInputBindings implements Expr.VectorInputBinding
{
private final String[] currentValue = new String[1];
private final Object[] currentValue = new Object[1];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitish: The naming of this private static class doesn't really line up with its implementation anymore?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is still used exclusively for 'deferred' single string expressions, so maybe is ok?

{
// in sql compatible mode, nulls are handled by super class and never make it here...
return leftVal + rightVal;
return leftVal + (String) rightVal;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could call .toString() and avoid ClassCastExceptions here? Or, if it's important to do the cast, we should really do an instanceof check first so that we can generate a better error message than just ClassCastException

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as #13209 (comment) about why is safe

return;
} else if (rightNull) {
final boolean bool = Evals.asBoolean(leftInput[i]);
final boolean bool = Evals.asBoolean((String) leftInput[i]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too, why is it safe to cast? do we need to check the type first? Or maybe check the type of the array that we are gotten, so we can amortize the cost of the check?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this safe because (for better or for worse) the base classes explicitly wrap their input processors with CastToTypeVectorProcessor.cast (which is a no-op if the processor is already the correct type) to ensure that it is being used with the correct type. I don't remember exactly why I did this, presumably because its important for vector processing that everything is the right type so we don't get ugly class cast exceptions, but I can look into if this is actually necessary or not.

@clintropolis clintropolis merged commit 59e2afc into apache:master Oct 12, 2022
@clintropolis clintropolis deleted the better-vector-expr-string-processing branch October 12, 2022 09:53
clintropolis added a commit to clintropolis/druid that referenced this pull request Oct 28, 2022
…ent with vector object selectors (apache#13209)

* use object[] instead of string[] for vector expressions to be consistent with vector object selectors

* simplify
@kfaraz kfaraz added this to the 24.0.1 milestone Nov 1, 2022
kfaraz pushed a commit that referenced this pull request Nov 1, 2022
* use object[] instead of string[] for vector expressions to be consistent with vector object selectors (#13209)

* fix issue with nested column null value index incorrectly matching non-null values (#13211)

* fix json_value sql planning with decimal type, fix vectorized expression math null value handling in default mode (#13214)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants