Skip to content

ExpressionSelectors: Add optimized selectors.#5048

Merged
fjy merged 2 commits intoapache:masterfrom
gianm:expression-selector-optimize
Nov 14, 2017
Merged

ExpressionSelectors: Add optimized selectors.#5048
fjy merged 2 commits intoapache:masterfrom
gianm:expression-selector-optimize

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Nov 6, 2017

Adds optimized selectors for certain cases.

  • SingleLongInputCaching selector for expressions on the __time column,
    using a similar optimization to SingleScanTimeDimSelector
  • SingleStringInputDimensionSelector for expressions on string columns
    that return strings, using a similar optimization to ExtractionFn
    based DimensionSelectors.
  • SingleStringInputCaching selector for expressions on string columns
    that return primitives.

Also, in the SQL planner, prefer expressions for time operations
rather than extractionFns.

Benchmark:

patch

Benchmark                                                  (rowsPerSegment)  Mode  Cnt   Score   Error  Units
ExpressionSelectorBenchmark.strlenUsingExpressionAsLong             1000000  avgt   30  16.996 ± 0.236  ms/op
ExpressionSelectorBenchmark.strlenUsingExpressionAsString           1000000  avgt   30  11.705 ± 0.161  ms/op
ExpressionSelectorBenchmark.strlenUsingExtractionFn                 1000000  avgt   30   7.018 ± 1.145  ms/op
ExpressionSelectorBenchmark.timeFloorUsingCursor                    1000000  avgt   30  20.294 ± 0.236  ms/op
ExpressionSelectorBenchmark.timeFloorUsingExpression                1000000  avgt   30  14.346 ± 0.237  ms/op
ExpressionSelectorBenchmark.timeFloorUsingExtractionFn              1000000  avgt   30  11.728 ± 0.247  ms/op

master

Benchmark                                                  (rowsPerSegment)  Mode  Cnt    Score   Error  Units
ExpressionSelectorBenchmark.strlenUsingExpressionAsLong             1000000  avgt   30  107.439 ± 2.332  ms/op
ExpressionSelectorBenchmark.strlenUsingExpressionAsString           1000000  avgt   30  132.240 ± 2.391  ms/op
ExpressionSelectorBenchmark.strlenUsingExtractionFn                 1000000  avgt   30    7.113 ± 1.274  ms/op
ExpressionSelectorBenchmark.timeFloorUsingCursor                    1000000  avgt   30   20.291 ± 0.379  ms/op
ExpressionSelectorBenchmark.timeFloorUsingExpression                1000000  avgt   30  115.135 ± 2.363  ms/op
ExpressionSelectorBenchmark.timeFloorUsingExtractionFn              1000000  avgt   30   11.292 ± 0.121  ms/op

- SingleLongInputCaching selector for expressions on the __time column,
  using a similar optimization to SingleScanTimeDimSelector
- SingleStringInputDimensionSelector for expressions on string columns
  that return strings, using a similar optimization to ExtractionFn
  based DimensionSelectors.
- SingleStringInputCaching selector for expressions on string columns
  that return primitives.

Also, in the SQL planner, prefer expressions for time operations
rather than extractionFns.
final ExprEval exprEval = baseSelector.getObject();
return exprEval.value();
// No need for null check on getObject() since baseSelector impls will never return null.
return baseSelector.getObject().value();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add //noinspection ... comment for IntelliJ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added //noinspection ConstantConditions

}

@VisibleForTesting
@Nonnull
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to annotate the package with @EverythingIsNonnullByDefault instead

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started trying to do this but ran into a few too many things to change as a result (mostly to get rid of false suggestions of "remove null check, X cannot be null"). So I ended up leaving it as-is.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to resolve them, I think

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true, but I didn't see much point in adding @EverythingIsNonnullByDefault without fixing the issues that causes with inspections, given that its whole point is to help with inspections in the first place.

@Nullable
static Supplier<Object> supplierFromObjectSelector(final BaseObjectColumnValueSelector<?> selector)
{
if (selector == null) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColumnSelectorFactory.makeColumnValueSelector() always returns non-null now, I think we shouldn't accept nulls here

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced this with selector instanceof NilColumnValueSelector since this seems to be how to do it now.

final Supplier<Object> inputSupplier = ExpressionSelectors.supplierFromDimensionSelector(selector);
this.bindings = name -> inputSupplier.get();

if (selector.getValueCardinality() <= CACHE_SIZE) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it doesn't consider -1 which is "unknown" cardinality

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a check for this a couple of lines up… cardinality >= 0 is a precondition for using this optimized implementation. I moved the check to be closer to this block so they are more visually connected. I also added a note to the javadoc about the precondition.


public ExprEval compute(final int id)
{
ExprEval value = m.getAndMoveToFirst(id);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference: vigna/fastutil#86

@Override
public IndexedInts getRow()
{
// Treat any non-single-valued row as a row containing a single null value, to ensure consistency with
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make this comment a javadoc comment of the method and make ExpressionSelectors.supplierFromDimensionSelector a reference.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

} else {
// Can't handle non-singly-valued rows in expressions.
// Treat them as nulls until we think of something better to do.
return new SingleIndexedInt(0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be cached, SingleIndexedInt.of() and cache 0-127 :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this.

if (nullAdjustment == 0) {
return row;
} else {
return new SingleIndexedInt(row.get(0) + nullAdjustment);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add MutableSingleIndexedInt and cache the object

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this offers a measurable benefit and I'm also not sure if the return value from getRow ever needs to persist beyond the next movement of the selector. So I left this as-is.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

private final Expr expression;
private final SingleInputBindings bindings = new SingleInputBindings();

// 0 if selector has null as a value; 1 if it doesn't.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make a javadoc comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@fjy fjy merged commit 77df5e0 into apache:master Nov 14, 2017
@jon-wei jon-wei added this to the 0.12.0 milestone Jan 5, 2018
@gianm gianm deleted the expression-selector-optimize branch September 23, 2022 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants