[Opt](Iceberg) Only initialize one split if the statement can push down count #34775
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
// iceberg use integer to store date,
// we need transform it to string
value = DateTimeUtil.daysToIsoDate((Integer) obj);
for (CombinedScanTask taskGrp : combinedScanTasks) {
I didn't find a good way to end the loop early inside forEach, so I replaced forEach with a plain for loop. :(
partitionPathSet.add(structLike.toString());
// End loop early as one split is enough if the statement can push down count
if (canPushCount) {
    break;
This is what I want to do: end the entire loop early to avoid creating a lot of useless splits if the statement can push down count.
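To make the idea concrete, here is a minimal, self-contained sketch of the early exit. It is not the code from this PR: `SplitPlanningSketch`, `collectSplits`, and the use of plain strings in place of Iceberg's `CombinedScanTask` and file scan tasks are all hypothetical stand-ins, and the labeled `break` is just one way to leave both nested loops at once.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the names here are not from the Doris code base.
public class SplitPlanningSketch {

    // taskGroups stands in for the combined scan tasks; each inner list stands in
    // for the file tasks of one group, and each string for one split.
    static List<String> collectSplits(List<List<String>> taskGroups, boolean canPushCount) {
        List<String> splits = new ArrayList<>();
        // A lambda passed to forEach cannot break out of the iteration,
        // so plain for loops with a label are used instead.
        allGroups:
        for (List<String> group : taskGroups) {
            for (String fileTask : group) {
                splits.add(fileTask);
                // One split is enough when count can be pushed down,
                // so stop planning the remaining groups and files.
                if (canPushCount) {
                    break allGroups;
                }
            }
        }
        return splits;
    }
}
```

With two groups `[a, b]` and `[c]`, calling `collectSplits` with `canPushCount` set to `true` keeps only the first split, while `false` keeps all three.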
run buildall
@wuwenchi Could you give some suggestions about this change? Thanks.
TPC-H: Total hot run time: 39914 ms
TPC-DS: Total hot run time: 187944 ms
morningman left a comment
If you want to do this optimization, why not just create a dummy IcebergSplit? Then we don't even need to call TableScanUtil.planTasks.
@morningman Thanks for your suggestion! You are right, creating a dummy IcebergSplit is a better approach than this PR. But I found that the BE needs a real Iceberg split for some of its logic, so we would have to add some odd checks on the BE side to make it accept the dummy IcebergSplit. I just submitted a new PR, #34928. Please take a look if you have time.
Proposed changes
#22923 added a good optimization for Iceberg count queries. I think we can end the get-splits loop early, since one split is enough if the statement can push down count. This can reduce the query time if the Iceberg table has many splits.
Issue Number: close #xxx