Skip to content

Remove makeMathExpressionSelector from ColumnSelectorFactory.#3815

Merged
fjy merged 4 commits intoapache:masterfrom
gianm:expression-csf-split
Jan 6, 2017
Merged

Remove makeMathExpressionSelector from ColumnSelectorFactory.#3815
fjy merged 4 commits intoapache:masterfrom
gianm:expression-csf-split

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Jan 3, 2017

Half of the second part of https://groups.google.com/forum/#!msg/druid-development/_Sd78s7yU5U/RJ8qLOJ5DwAJ, skipping the aggregator api changes.

@gianm gianm added this to the 0.10.0 milestone Jan 3, 2017
// Unknown ValueType. Try making an Object selector and see if that gives us anything useful.
final ObjectColumnSelector selector = columnSelectorFactory.makeObjectColumnSelector(columnName);
final Class clazz = selector == null ? null : selector.classOfObject();
if (selector == null || (clazz != Object.class && Number.class.isAssignableFrom(clazz))) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be !Number.class.isAssignableFrom(clazz)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, you're right, will fix.

};
}

@Override
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please label this method and the interface method @Nullable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

final Supplier<Number> supplier;

if (nativeType == ValueType.FLOAT) {
final FloatColumnSelector selector = columnSelectorFactory.makeFloatColumnSelector(columnName);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think FloatColumnSelector could just extend Supplier<Float> and then "unsafe" cast is done here instead of wrapping?

(This also requires renaming of float get() to avoid name clash)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could but I'm not sure that change is warranted.

Is your concern performance or is it the amount of code in this method? If it's the latter then how about adding a method like FloatColumnSelector.asSupplier(floatColumnSelector)?

If it's performance, I guess I'm not super worried about performance in the Expressions related code since it's already doing a lot of sub-optimal stuff anyway (like interpreting expressions on top of boxed numbers). Changing that might involve no longer using Suppliers anyway.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was about performance, but given that proper optimization might involve moving away from Supplier<Number>, the change that I suggested will become pointless, so let's not make it.

}
};
} else if (nativeType == ValueType.LONG) {
final LongColumnSelector selector = columnSelectorFactory.makeLongColumnSelector(columnName);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as about FloatColumnSelector a few lines above

return new ExpressionObjectSelector(createBindings(columnSelectorFactory, expression), expression);
}

private static Expr.ObjectBinding createBindings(ColumnSelectorFactory columnSelectorFactory, Expr expression)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider splitting this 50-line method into some smaller methods

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't feel to me like there's much going on in this method; there are a lot of lines but most of them aren't doing much. I guess I could break out the nativeType == null branch into a method like createNumberSupplierFromObjectColumnSelector but I'm not sure that would actually make it easier to understand.

Do you think it would?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be easier to understand this method if all three blocks (float, long, object) are extracted as static methods

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, I'll trust your opinion on readability more than mine since you didn't write the code.

@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented Jan 4, 2017

@leventov, I pushed some updates you suggested, please let me know what you think about the others. Thanks for the review.

// We know there are no numbers here. Use a null supplier.
supplier = null;
} else {
if (selector != null && (clazz == Object.class || Number.class.isAssignableFrom(clazz))) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a test for this condition? Also clazz.isAssignableFrom(Number.class) feels clearer than clazz == Object.class

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I'll add a test.

clazz.isAssignableFrom(Number.class) is not exactly what I'm trying to go for, though. This code path might also be parsing Strings (see tryParse). I guess this could be clazz.isAssignableFrom(Number.class) || clazz.isAssignableFrom(String.class).

supplier = null;
}
} else {
// Unhandleable ValueType (possibly STRING or COMPLEX).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this attempt to handle String/Complex by going through the nativeType == null tryParse block above (e.g., for columns that contain numbers as strings)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, and we could actually also do an impl based on DimensionSelector, but I avoided that on purpose because that would lead to expressions and field accesses behaving differently for aggregators: longSum over fieldName "x" and expression "x" would behave differently if "x" were a string column. There's a test case in SchemaEvolutionTest that verifies this and would fail if the behavior were different.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could go in the direction of having aggregators work over string columns though?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, okay, let's leave that area in this PR as-is for consistency then

@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented Jan 5, 2017

@leventov, I broke up that big method and added tests.

Copy link
Copy Markdown
Member

@leventov leventov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jon-wei
Copy link
Copy Markdown
Contributor

jon-wei commented Jan 6, 2017

👍

1 similar comment
@fjy
Copy link
Copy Markdown
Contributor

fjy commented Jan 6, 2017

👍

@fjy fjy merged commit 3c63cff into apache:master Jan 6, 2017
@gianm gianm deleted the expression-csf-split branch January 8, 2017 00:45
dgolitsyn pushed a commit to metamx/druid that referenced this pull request Feb 14, 2017
…#3815)

* Remove makeMathExpressionSelector from ColumnSelectorFactory.

* Add @nullable annotations in places, fix Number.class check.

* Break up createBindings, add tests.

* Add null check.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants