Skip to content

Expressions: Optimization for string expressions on the __time column.#5109

Closed
gianm wants to merge 2 commits intoapache:masterfrom
gianm:expr-long-string-cache
Closed

Expressions: Optimization for string expressions on the __time column.#5109
gianm wants to merge 2 commits intoapache:masterfrom
gianm:expr-long-string-cache

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Nov 21, 2017

WIP: Still need to run benchmarks

@drcrallen
Copy link
Copy Markdown
Contributor

@gianm can you color this with a couple of use cases that highlight the need for optimization?

);

final List<?> results = Sequences.toList(
Sequences.map(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write cursors.map(...)

null
);

final List<?> results = Sequences.toList(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is always called with either new ArrayList<>() or Lists.newArrayList() as the second argument. The second parameter should be removed. Also, I suggest to add a default method Sequence.toList().

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However I did it myself in #5175.

}

@Override
public void inspectRuntimeShape(final RuntimeShapeInspector inspector)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I try to put this method the last or one of the last, because it's not the business logic.

@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented Dec 18, 2017

@drcrallen,

@gianm can you color this with a couple of use cases that highlight the need for optimization?

It's an expression oriented equivalent of the optimization that SingleScanTimeDimSelector does for extractionFns on the __time column. It helps because it takes advantage of the fact that the time column tends to have a lot of runs, and with the optimization, we only compute the function once per distinct value rather than computing it repeatedly.

@leventov
Copy link
Copy Markdown
Member

I think it should be precomputed for column and stored along with the column, if such optimization makes sense for a column. Along with min, max, average for columns, and other types of O(1) information that could be useful during querying. It should be accounted here: https://github.com/druid-io/druid/projects/1

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Aug 13, 2018

@gianm we should finish this off

@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented Nov 11, 2018

It turns out this patch is not necessary, since #5048 accomplished the same goal. String expressions on the __time column work by first building a base primitive selector, and #5048 optimizes those. That optimization ends up being inherited by the dimension selector.

#6599 extends the optimization to apply to Long columns that are not __time.

@gianm gianm closed this Nov 11, 2018
@gianm gianm deleted the expr-long-string-cache branch November 11, 2018 04:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants