@Dandandan
Contributor

Which issue does this PR close?

Closes #799

Rationale for this change

Speeds up `IN` list expressions with one or two items by converting them to a normal boolean expression (a chain of equality comparisons joined with `OR`).
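
To make the idea concrete, here is a minimal sketch of such a rewrite on a toy expression tree. It is an illustration only, not the code in this PR: the `Expr` enum, `MAX_REWRITE_LEN`, and `rewrite_small_in_list` are hypothetical names and do not correspond to DataFusion's own types or optimizer API. The sketch rewrites only the top-level node and assumes that a negated list (`NOT IN`) would become a chain of `!=` comparisons joined with `AND`.

```rust
// Toy expression tree for illustration; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    NotEq(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    InList { expr: Box<Expr>, list: Vec<Expr>, negated: bool },
}

// Hypothetical threshold: the PR description mentions one or two items.
const MAX_REWRITE_LEN: usize = 2;

/// Rewrites `x IN (a, b)` to `x = a OR x = b`, and
/// `x NOT IN (a, b)` to `x != a AND x != b`.
/// Empty lists and lists longer than the threshold are returned unchanged.
fn rewrite_small_in_list(e: Expr) -> Expr {
    match e {
        Expr::InList { expr, list, negated }
            if !list.is_empty() && list.len() <= MAX_REWRITE_LEN =>
        {
            list.into_iter()
                // One comparison per list item.
                .map(|item| {
                    if negated {
                        Expr::NotEq(expr.clone(), Box::new(item))
                    } else {
                        Expr::Eq(expr.clone(), Box::new(item))
                    }
                })
                // Fold the comparisons into a left-deep OR (or AND) chain.
                .reduce(|acc, cmp| {
                    if negated {
                        Expr::And(Box::new(acc), Box::new(cmp))
                    } else {
                        Expr::Or(Box::new(acc), Box::new(cmp))
                    }
                })
                .expect("guard ensures the list is non-empty")
        }
        other => other,
    }
}
```

A real optimizer rule would also need to walk the whole plan and recurse into child expressions; the sketch leaves that out to keep the rewrite itself visible.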

What changes are included in this PR?

Are there any user-facing changes?

@Dandandan Dandandan changed the title from "Simplify inlist 799" to "Add optimizer rule to replace inlist with or chain for small list" Aug 1, 2021
@Dandandan Dandandan changed the title from "Add optimizer rule to replace inlist with or chain for small list" to "Add optimizer rule to replace inlist with or chain for small expression list" Aug 1, 2021
@Dandandan Dandandan marked this pull request as draft August 1, 2021 16:50
@Dandandan
Contributor Author

It seems this is not faster (yet). I will do some more research.

@jhorstmann
Contributor

Cool, this is interesting to me: in our query engine we started from the opposite approach, initially rewriting all IN expressions into comparisons joined with OR, and we are now introducing a specialized kernel for IN. I would be very interested in finding out where the threshold for this optimization is. The benefit is probably bigger for dictionary-encoded data and numbers, since string comparison itself involves some branches.
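
The two strategies being compared can be sketched roughly as follows, for a plain numeric column. This is a simplified illustration under stated assumptions, not either engine's actual kernel: the function names are hypothetical, the column is a plain `&[i64]` slice rather than an Arrow array, and null handling is ignored. The OR-chain version makes one pass over the column per list item, while the set version builds a `HashSet` once and probes it in a single pass, which is why a threshold on list length is the natural thing to measure.

```rust
use std::collections::HashSet;

/// Evaluates `col IN (list)` as equality masks combined with OR:
/// one simple pass over the column for every list item.
fn in_list_as_or_chain(col: &[i64], list: &[i64]) -> Vec<bool> {
    let mut mask = vec![false; col.len()];
    for &item in list {
        for (m, &v) in mask.iter_mut().zip(col) {
            *m |= v == item;
        }
    }
    mask
}

/// Evaluates `col IN (list)` with a dedicated membership test:
/// build a hash set once, then probe it once per row.
fn in_list_as_set_lookup(col: &[i64], list: &[i64]) -> Vec<bool> {
    let set: HashSet<i64> = list.iter().copied().collect();
    col.iter().map(|v| set.contains(v)).collect()
}
```

Both functions return the same mask for the same inputs; which one is faster for a given list length and data type is exactly the open question in this thread and would have to be benchmarked.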

@alamb
Contributor

alamb commented Aug 3, 2021

Possibly also related: #813, as a different performance approach.

@Dandandan
Contributor Author

I might tune this later so that it applies only to empty or single-item lists, for which it really should be an improvement, and do some more profiling.
It could also be worse right now because of a slower kernel (how fast is the `or` kernel at the moment?).

@alamb
Contributor

alamb commented Oct 26, 2021

Marking PRs that haven't had activity in over a month as 'stale-pr' to help me filter the list. Please remove the label or let me know if "stale" is not the correct designation.

@alamb alamb added the stale-pr label Oct 26, 2021
@alamb
Contributor

alamb commented Nov 2, 2021

Closing a seemingly stale PR -- please reopen if that was a mistake.

@alamb alamb closed this Nov 2, 2021
unkloud pushed a commit to unkloud/datafusion that referenced this pull request Mar 23, 2025
* Enable shuffle in benchmarks

* format

* Revert remove SPARK_GENERATE_BENCHMARK_FILES=1