Skip to content

Math expressional parameters for aggregator#2783

Merged
himanshug merged 3 commits intoapache:masterfrom
navis:expression-paramed-aggregator
Oct 19, 2016
Merged

Math expressional parameters for aggregator#2783
himanshug merged 3 commits intoapache:masterfrom
navis:expression-paramed-aggregator

Conversation

@navis
Copy link
Copy Markdown
Contributor

@navis navis commented Apr 4, 2016

Someone wanted druid to support expressional parameter for aggregator something like sum(a + b / c). I don't know why it's needed either but this is it. Will make propose in dev mailing list soon after.

Based on great work(#2090) of @himanshug.

@navis navis force-pushed the expression-paramed-aggregator branch 2 times, most recently from 44280df to 1c5721f Compare April 5, 2016 02:57
@navis navis force-pushed the expression-paramed-aggregator branch 3 times, most recently from 037e526 to 303d7f7 Compare April 12, 2016 15:38
@navis navis changed the title [WIP] Math expressional parameters for aggregator Math expressional parameters for aggregator Apr 14, 2016
@julianhyde
Copy link
Copy Markdown

Do you allow quoting of column names? Are column names case-sensitive? I suggest you make the syntax as close to SQL as possible. That would also mean supporting quoting, and also using 'iif' rather than 'if'.

@navis
Copy link
Copy Markdown
Contributor Author

navis commented Apr 14, 2016

@julianhyde Currently it does not allow quoting and identifiers are case-sensitive. I've used double quote for string literal in #2836 but it would be better to change it single quote. I'll handle your comments in #2836.

@julianhyde
Copy link
Copy Markdown

@navis Thanks! We know that this expression language is going to be popular, and then we're going to wish that it as close to SQL as we can manage, otherwise we'll spend years trying to undo the differences.

@xvrl
Copy link
Copy Markdown
Member

xvrl commented Apr 14, 2016

@julianhyde we made the decision back in Druid 0.7 to have column names be case-sensitive to keep things simple implementation-wise. As far as I know SQL doesn't prescribe things to be case-insensitive for column names. For instance PostgreSQL is case-sensitive, but has it's SQL layer fold column names to lower-case, if not quoted. I think it would be worthwhile to have a similar behavior for Druid where any case-folding is left to the SQL layer doing the query translation.

@navis
Copy link
Copy Markdown
Contributor Author

navis commented Apr 15, 2016

@xvrl didn't changed case-sensitivity of identifiers and have no intention to do that, but possibly make an option for it.

@julianhyde
Copy link
Copy Markdown

It's totally valid and workable to have case-sensitive identifiers and leave their case intact (i.e. not convert them to lower case as PostgreSQL does, or to upper case as Oracle does).

The only other thing is spaces in column names. If you allow space (and other punctuation) in column names then you need a way to use these in expressions. You might use double quotes (as Oracle and PostgreSQL), brackets (as SQL Server) or back-quotes (as MySQL).

@navis navis force-pushed the expression-paramed-aggregator branch from a0dfa51 to 8b645c3 Compare April 20, 2016 02:15
@navis
Copy link
Copy Markdown
Contributor Author

navis commented Apr 20, 2016

I think this is duplicate of #1965. @himanshug could you review this?

@navis navis force-pushed the expression-paramed-aggregator branch from 8b645c3 to 7beac62 Compare April 20, 2016 02:34
@himanshug
Copy link
Copy Markdown
Contributor

@navis yes, i planned to add expression language to aggregators and post-aggregators in #1965 , was waiting for expression thingie to get merged.
i see you have multiple PRs in flight on same , i will review these as soon as i can.

@navis navis force-pushed the expression-paramed-aggregator branch 2 times, most recently from f972078 to 83f93fe Compare May 4, 2016 03:42
@navis navis force-pushed the expression-paramed-aggregator branch 2 times, most recently from 6a27b92 to d1367e8 Compare May 19, 2016 06:11
This was referenced May 24, 2016
@navis navis force-pushed the expression-paramed-aggregator branch from d1367e8 to 071eec7 Compare July 20, 2016 03:20
@fjy fjy added this to the 0.9.3 milestone Sep 13, 2016
@fjy fjy added the Feature label Sep 13, 2016
@navis navis force-pushed the expression-paramed-aggregator branch from 071eec7 to a71bd01 Compare September 21, 2016 02:24
@navis navis force-pushed the expression-paramed-aggregator branch from a71bd01 to 2df3d6c Compare September 21, 2016 02:25
@navis
Copy link
Copy Markdown
Contributor Author

navis commented Sep 21, 2016

Squashed and merged with #2820, caused by hard-maintenance for long abandoned time.

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@navis, thanks for the patch, looks great modulo some comments.

@himanshug, you started the original math expression, are you interested in taking a look at this patch? This would also be the first time the math expression stuff is actually exposed as an external API - do you have any concerns?

We will need documentation for expressions at some point, but I think it's ok to merge this without that, and leave it as an undocumented feature for now in master. Ideally before 0.9.3 we will write up some docs, firm up the API, and make this a documented feature.

Copy link
Copy Markdown
Contributor

@gianm gianm Sep 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this would be nicer as a method on Expr like List<String> requiredBindings(). instanceof generally feels brittle and better to avoid if possible.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaced with visitor, which seemed more elegant

Copy link
Copy Markdown
Contributor

@gianm gianm Sep 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest naming this objectBindings

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. no one will use druid with javafx.

Copy link
Copy Markdown
Contributor

@gianm gianm Sep 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest naming this supplierBindings

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest calling this "expression" to be consistent with the aggregators' "fieldExpression"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to verify here that exactly one of fieldName / fieldExpression is set, but not both.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great method :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else throw exception?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, there was double type here and looks like I removed them altogether by mistake. This would be rewritten with #2836, as commented above.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the intent is missing/mistyped columns will be bound as null. Does the math expression library handle that well? what happens exactly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll throw NPE in evaluation. MathExpressionSelector should be rewritten with following PR #2836, anyway

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With that, null is handled as null string and translated to 0 whenever it's needed to be numeric type.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null -> druid-defaults (empty string or 0) sounds good to me

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw UnsupportedOperationException preferred to returning null IMO.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All others returns null. It's named DUMMY_COLUMN_SELECTOR_FACTORY

Copy link
Copy Markdown
Contributor

@gianm gianm Sep 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could take Expr to avoid parsing overhead for each segment's query runner.

@himanshug
Copy link
Copy Markdown
Contributor

did a quick scan, overall looks fine to me... can give a more careful look next week.
i'll be happy to see the expressions in to action finally.

@erikdubbelboer
Copy link
Copy Markdown
Contributor

Will there also be some documentation updates for this?

@navis navis force-pushed the expression-paramed-aggregator branch from 2df3d6c to a450350 Compare October 10, 2016 06:16
@gianm
Copy link
Copy Markdown
Contributor

gianm commented Oct 14, 2016

@erikdubbelboer yes, although likely in a future PR (the expression stuff this is based on was never documented).

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM after rebase to address conflicts from #3551.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come this NumericColumnSelector reads everything as floats? Should there be a check for long type as well?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment, should this check for long type and call getLongMetric() in that case?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar suggestion to above could work here too.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, that we should be supporting long in all the places

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on the choice for this ordering?

Should infinite > numeric? That seems more natural to me

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is Long.compare / Double.compare, this code only gets used if you say "ordering" : "numericFirst". So I think that's okay.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this could probably do something like,

Object val = row.getRaw(columnName);
return val instanceof Number ? val : row.get().getFloatMetric(columnName);

Most of the time, val should be an instanceof number. The getFloatMetric should only trigger if it's a string we're trying to cast to a number.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jon-wei @gianm Yep, you guys can find similar code in IncrementalIndex, too. Actually it's fixed in #2836 (of our branch) but I'll address that in here.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Oct 17, 2016

Still 👍 on this, although consider https://github.com/druid-io/druid/pull/2783/files#r83581162.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why we can't keep these private anymore?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to access fields when traversing the expression.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static here is probably not needed as enums are static classes anyways

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's copy from ArithmeticPostAggregator$Ordering. Would it be better to separate this into independent class?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ha, I knew I saw this somewhere before :)

Yeah, I think we could have a set of NumberComparators utilities sort of like we have StringComparators. Separate PR would be fine by me though.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two are repeated everywhere, is it appropriate to change method signature to take arguments (Expression expr, List requiredBindings) ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not necessary.

@navis navis force-pushed the expression-paramed-aggregator branch from a450350 to 60b2ea7 Compare October 18, 2016 12:30
@gianm
Copy link
Copy Markdown
Contributor

gianm commented Oct 18, 2016

👍 to changes, thx @navis.

@jon-wei
Copy link
Copy Markdown
Contributor

jon-wei commented Oct 18, 2016

new changes LGTM, 👍

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Oct 19, 2016

thx @jon-wei for taking a look. @himanshug, any further comments?

@himanshug himanshug merged commit 8b7ff44 into apache:master Oct 19, 2016
fundead pushed a commit to fundead/druid that referenced this pull request Dec 7, 2016
* Supports expression-paramed aggregator (squashed and rebased on master) also includes math post aggregator (was apache#2820)

* Addressed comments

* addressed comments
dgolitsyn pushed a commit to metamx/druid that referenced this pull request Feb 14, 2017
* Supports expression-paramed aggregator (squashed and rebased on master) also includes math post aggregator (was apache#2820)

* Addressed comments

* addressed comments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants