Skip to content

Support string type in math expression#2836

Merged
dclim merged 3 commits intoapache:masterfrom
navis:support-string-expr
Nov 3, 2016
Merged

Support string type in math expression#2836
dclim merged 3 commits intoapache:masterfrom
navis:support-string-expr

Conversation

@navis
Copy link
Copy Markdown
Contributor

@navis navis commented Apr 14, 2016

Wanted to do something like __time > timestamp("2010-03-12T11:27:00) with math expr but currently it does not support that. This is the first try to support string type in math expr.

@navis navis force-pushed the support-string-expr branch 2 times, most recently from efe86e9 to 7d9feab Compare April 20, 2016 11:06
@navis navis force-pushed the support-string-expr branch from 7d9feab to 10cbe62 Compare May 19, 2016 00:58
@navis
Copy link
Copy Markdown
Contributor Author

navis commented May 19, 2016

rebased and refactored to be looked less awful

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Oct 25, 2016

@himanshug can you take a look?

Copy link
Copy Markdown
Contributor

@fjy fjy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a high level look and left high level comments

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as opposed to checking for type everywhere, can we use polymorphic deserialization instead? As in, we have different types of ExprEvals that are automatically instantiated based on Expr type?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with how this all works, it makes sense to have a type() method rather than polymorphism, since with polymorphism the value types would need to know how to operate on each other and that could get messy.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we call this class EvaluatedExpr as opposed to ExprEval?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep this? It's trivial but makes so many conflicts in following PRs. We can discuss anytime later.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, not a big deal

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you just return value here?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this class should use polymorphic deserialization to avoid so many type checks everywhere

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can also avoid passing in type everywhere

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you throw IAE instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

escapeString?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's generated name by antlr

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does the type have to change here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expression recognizes this as long type and make long type result, which is different from arithmetic post aggregator which regards all numbers as double.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cause Ut failures?

@navis navis force-pushed the support-string-expr branch from c2bd31d to f421015 Compare October 26, 2016 05:23
@navis
Copy link
Copy Markdown
Contributor Author

navis commented Oct 26, 2016

@fjy did you mean something like this? (f421015)

dclim
dclim previously requested changes Oct 31, 2016
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for 'case' function which is not included in this PR. it seemed I've separated the part into another PR. I'll remove it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we need asLong() and asDouble() in addition to longValue() and doubleValue()? These aren't currently being used. Can StringExprEval just override longValue() and doubleValue() (and probably intValue() and numericValue() as well)? This would also avoid the ClassCastException you'd get right now calling say longValue() on a StringExprEval.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also having a method called asDouble() and one called doubleValue() is confusing to me as I'm not sure when I'd use each one, so I suggest consolidating them if possible.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If StringExprEval overrode longValue() and doubleValue() these wouldn't be necessary

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have three or four different version of cast function and this seemed the oldest one. I'll address that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to just call x.asString() which has the same logic

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, that is the difference of 'stringValue()' with 'asString()'. if user have checked the type of expr is X, then can use 'Xvalue()' to access the value without needing to check type again. Still would be better to remove them?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navis IMO if there is no significant performance hit, it feels better to consolidate them as it is confusing to me which one I should use without looking through the code, but I'm open for other suggestions

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same idea, do we need stringValue() and asString()? Can we just use this logic (using String.valueOf()) in stringValue() instead of the typecast or is there performance implications?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has some historical reason for this but ok, I'll remove some methods.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Druid favors JodaTime over java.util.Date. Any reason we can't implement this method using JodaTime?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navis awesome

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well check that either 1 or 2 arguments were supplied instead of > 0

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IllegalArgumentException for consistency, needs 2 arguments

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is different behavior from SQL NVL which doesn't treat empty strings as null. If this behavior is required, would it make sense to make an additional function to handle the nullOrEmpty case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some following PR will make use of dimension as input of expression(, which is the main purpose of this PR). Then it's not that clear whether empty string should be considered as null or not.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navis I was wrong, seems that Oracle does treat empty strings as null (http://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements005.htm). I'm good with this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to see tests for the rest of the functions in Function.java if possible, or at least for the ones that were added (cast, unix_timestamp, nvl)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@dclim
Copy link
Copy Markdown
Contributor

dclim commented Nov 1, 2016

Also please add documentation for the new functions

@navis
Copy link
Copy Markdown
Contributor Author

navis commented Nov 2, 2016

Committed addressed patch first. I think I should squash these to rebase on #3630, which is really painful.

@gianm gianm added the Feature label Nov 2, 2016
@gianm gianm added this to the 0.9.3 milestone Nov 2, 2016
@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 2, 2016

On @dclim's "please add documentation" comment, IMO, we don't need this right now.

None of the expression stuff is documented currently. I raised #3634 for this earlier today. We need to start those docs and pay down that debt before 0.9.3, but I think it doesn't need to be in this PR since the API is still in flux. I think it makes sense to work out the API details through further PRs and then write the docs.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 2, 2016

@navis, sorry about conflicts from #3630!

@dclim
Copy link
Copy Markdown
Contributor

dclim commented Nov 2, 2016

@gianm that sounds fine to me, but FYI there is a documentation page (but I don't think it's reachable from anywhere right now!): http://druid.io/docs/0.9.1.1/misc/math-expr.html

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 2, 2016

@dclim oops, I missed that. Nevermind, in that case I agree this PR should have some additions there. But let's keep it unlinked until we're happy with the API.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 2, 2016

btw, that conflict between #3630 and here makes me wonder how we should deal with null strings in the expression language. My first instinct (not having thought about it too much) is that we should take the attitude that Druid doesn't actually generate any nulls from its columns. I'm intending to do something similar in the druid-sql I'm working on, see https://groups.google.com/d/msg/druid-development/3npt9Qxpjr0/IeJQP0YXBQAJ. i.e. if the dimension selector or an extractionFn returns null then the expression system could treat that as "".

@navis navis force-pushed the support-string-expr branch 2 times, most recently from 86ed3ea to c53bdcc Compare November 2, 2016 02:41
@navis navis force-pushed the support-string-expr branch 3 times, most recently from f09cf16 to 048797e Compare November 2, 2016 02:53
@navis
Copy link
Copy Markdown
Contributor Author

navis commented Nov 2, 2016

@gianm nevermind it (and it's my fault not checked related issues) and yes, null handling is really confusing and hard-to-fix issue in druid. in expression world, it's more confusing because null was not there in birth.
@dclim updated math-expr

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good to me, just a couple of minor comments for changes, and some thoughts on null handling we can follow up in #3645

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with how this all works, it makes sense to have a type() method rather than polymorphism, since with polymorphism the value types would need to know how to operate on each other and that could get messy.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be x.type() == ExprType.STRING || y.type() == ExprType.STRING?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my bad. good catch.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll be very funny when someone forgets to override at least one of these 😄

Comment thread docs/content/misc/math-expr.md Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be unix_timestamp

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think this means all nulls, even nulls of columns we want to treat as numbers (e.g. missing columns), are getting represented as StringExprEvals. I guess that works but seems strange. It might make sense for null to have its own datatype.

I guess I don't have strong feelings on this now for this PR but we should consider null handling before finalizing the expressions API.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised issue in #3645 to consider this further.

@dclim
Copy link
Copy Markdown
Contributor

dclim commented Nov 2, 2016

@navis the point of the IAE and ISE exceptions is that it allows you to use formatted strings, so rather than doing:

throw new IAE("first argument should be string type but got " + value.type() + " type");

You can do:

throw new IAE("first argument should be string type but got %s type", value.type());

which is preferred in Druid.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stringValue() is only being used in unit tests now, if not necessary for a follow-on PR it can be removed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should just say needs 1 or 2 arguments, instead of 'at least'

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@dclim
Copy link
Copy Markdown
Contributor

dclim commented Nov 2, 2016

I'm good with this after the last few comments and Gian's comments have been addressed.

navis added 2 commits November 3, 2016 08:41
addressed comments

addressed comments

Addressed comments
@navis navis force-pushed the support-string-expr branch from 048797e to 7c93ccb Compare November 3, 2016 00:40
@navis navis force-pushed the support-string-expr branch from 7c93ccb to ef40017 Compare November 3, 2016 01:13
@gianm gianm closed this Nov 3, 2016
@gianm gianm reopened this Nov 3, 2016
Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM

@dclim
Copy link
Copy Markdown
Contributor

dclim commented Nov 3, 2016

👍

@dclim dclim merged commit e10def3 into apache:master Nov 3, 2016
fundead pushed a commit to fundead/druid that referenced this pull request Dec 7, 2016
* Support string type in math expression

addressed comments

addressed comments

Addressed comments

* Updated math function document

* Addressed comments
dgolitsyn pushed a commit to metamx/druid that referenced this pull request Feb 14, 2017
* Support string type in math expression

addressed comments

addressed comments

Addressed comments

* Updated math function document

* Addressed comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants