[fix](planner) query should be cancelled if limit reached #44338

morningman · 2024-11-20T08:41:36Z

What problem does this PR solve?

Problem Summary:
When a SQL statement contains a `limit` clause and FE has already obtained more rows than the limit,
FE should send a cancel command to BE so that BE stops reading more data.
However, this mechanism is broken in the current code and never takes effect.
For external table queries in particular, this can cause a lot of unnecessary network I/O.

1. `isBlockQuery`

   In the old optimizer, `isBlockQuery` is set to true if the query contains a `sort` or `agg` node, and to false otherwise.
   In the new optimizer, this value is always true.

   This logic is wrong in both optimizers, yet the reach limit logic is triggered only when `isBlockQuery = false`.

2. Calling problem of the reach limit logic

   The reach limit check is performed only when `eos = true` in the rowBatch returned by BE.
   This is wrong: for a `limit N` query, each BE applies its own limit of N, but FE can trigger
   the reach limit logic as soon as the total number of rows returned by all BEs exceeds N.
   So the check should not be restricted to batches with `eos = true`.
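
The second problem can be illustrated with a minimal sketch. It is written in C++ only to match the BE snippet quoted later in this thread; the real FE code is Java, and `RowBatch`, `LimitTracker`, and `on_batch` are hypothetical names, not Doris APIs. The sketch counts rows across all BEs and evaluates the limit on every batch rather than only on batches with `eos = true`. It treats reaching the limit as sufficient (`>=`), which is an assumption; the description says "exceeds".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one result batch that a BE returns to FE.
struct RowBatch {
    int64_t num_rows;
    bool eos;  // true only on the last batch from that BE
};

// Hypothetical FE-side tracker that counts rows received from all BEs.
class LimitTracker {
public:
    // In this sketch, limit_rows <= 0 means "no limit".
    explicit LimitTracker(int64_t limit_rows) : limit_rows_(limit_rows) {}

    // Returns true if the query should be cancelled after this batch.
    bool on_batch(const RowBatch& batch) {
        assert(batch.num_rows >= 0);
        received_rows_ += batch.num_rows;
        // Evaluated on every batch; batch.eos is deliberately not consulted.
        return limit_rows_ > 0 && received_rows_ >= limit_rows_;
    }

    int64_t received_rows() const { return received_rows_; }

private:
    int64_t limit_rows_;
    int64_t received_rows_ = 0;
};
```

With `limit 10` and two BEs that each return 6 rows, the second batch brings the total to 12 and the check fires, although neither batch has `eos = true`.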

The PR mainly changes:

1. Remove `isBlockQuery`

   `isBlockQuery` is used only in the reach limit logic and is not needed, so it is removed completely.

2. Modify where the reach limit check happens

   The reach limit logic is now checked whenever the number of rows obtained by FE is greater than the limit.

3. Fix the wrong `limitRows` in `QueryProcessor`

   `limitRows` should be taken from the first fragment, not the last.

4. In the scanner scheduler on the BE side, if the scanner has a limit, ignore the per-round scan bytes threshold.
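
The scanner scheduler change on the BE side can be sketched as follows. This is a simplified model, not the actual scanner scheduler code: `run_scan_round`, `RoundResult`, and the parameters are hypothetical, and blocks are represented only by their sizes in bytes. Without a limit, a round keeps reading until the bytes threshold is met; with a limit, the round returns after the first block instead of waiting for the threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of what one scan round read.
struct RoundResult {
    size_t blocks_read = 0;
    int64_t bytes_read = 0;
};

// Reads blocks (modelled by their byte sizes) until the per-round byte
// threshold is met. If the scanner has a limit, the round ends after the
// first block and the threshold is ignored.
inline RoundResult run_scan_round(const std::vector<int64_t>& block_bytes,
                                  int64_t bytes_threshold,
                                  bool scanner_has_limit) {
    assert(bytes_threshold > 0);
    RoundResult r;
    while (r.blocks_read < block_bytes.size() && r.bytes_read < bytes_threshold) {
        r.bytes_read += block_bytes[r.blocks_read];
        ++r.blocks_read;
        if (scanner_has_limit) {
            break;  // do not wait for the byte threshold
        }
    }
    return r;
}
```

For four blocks of 100 bytes each and a threshold of 250 bytes, a scanner without a limit reads three blocks in the round, while a scanner with a limit reads one.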

Release note

Fix: a query should be cancelled once its limit is reached.

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test
  Tested limit SQL on single-BE and multi-BE clusters.
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

doris-robot · 2024-11-20T08:41:41Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

morningman · 2024-11-20T10:52:25Z

run buildall

github-actions · 2024-11-20T11:20:12Z

PR approved by at least one committer and no changes requested.

github-actions · 2024-11-20T11:20:14Z

PR approved by anyone and no changes requested.

morningman · 2024-11-20T14:05:35Z

run buildall

morningman · 2024-11-21T02:51:55Z

run buildall

morningman · 2024-11-21T06:15:06Z

run buildall

github-actions · 2024-11-22T05:57:06Z

PR approved by at least one committer and no changes requested.

kaka11chen

LGTM

morningman · 2024-12-05T05:34:44Z

run buildall

doris-robot · 2024-12-05T06:21:46Z

TeamCity be ut coverage result:
Function Coverage: 38.46% (10006/26014)
Line Coverage: 29.52% (83909/284254)
Region Coverage: 28.62% (43113/150662)
Branch Coverage: 25.21% (21914/86918)
Coverage Report: http://coverage.selectdb-in.cc/coverage/d12d3f9c1de7196760196e04aa2ec57f9bc16cfd_d12d3f9c1de7196760196e04aa2ec57f9bc16cfd/report/index.html

morningman · 2024-12-05T08:56:11Z

run buildall

HappenLee · 2024-12-05T10:56:26Z

be/src/vec/exec/scan/scanner_scheduler.cpp

                    ctx->inc_block_usage(free_block->allocated_bytes());
                    scan_task->cached_blocks.emplace_back(std::move(free_block), free_block_bytes);
                }
+                if (limit < ctx->batch_size()) {


What if limit == -1? For a query without a limit, this condition will be true. Would it be better to do the check in the while loop?
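
The concern above can be made concrete with a small sketch. It assumes, as the comment implies, that `limit == -1` means the query has no limit; `naive_has_small_limit` and `has_small_limit` are hypothetical helpers, not functions from the Doris code base. The naive form mirrors the condition in the diff; the guarded form is one possible way to exclude the no-limit case, not necessarily what the PR ended up doing.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the condition in the diff: also true for limit == -1 (no limit).
inline bool naive_has_small_limit(int64_t limit, int64_t batch_size) {
    assert(batch_size > 0);
    return limit < batch_size;
}

// Guarded variant: a negative limit is treated as "no limit" (assumption).
inline bool has_small_limit(int64_t limit, int64_t batch_size) {
    assert(batch_size > 0);
    return limit >= 0 && limit < batch_size;
}
```

With a batch size of 4096, the naive check is true for `limit == -1`, while the guarded check is true only for limits from 0 to 4095.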

morningman · 2024-12-05T17:40:33Z

run buildall

doris-robot · 2024-12-05T18:39:55Z

TeamCity be ut coverage result:
Function Coverage: 38.48% (10007/26004)
Line Coverage: 29.52% (83926/284308)
Region Coverage: 28.62% (43134/150700)
Branch Coverage: 25.21% (21926/86958)
Coverage Report: http://coverage.selectdb-in.cc/coverage/670d90eb0b356074f8f23293671f7bee4a30547b_670d90eb0b356074f8f23293671f7bee4a30547b/report/index.html

HappenLee

LGTM

github-actions · 2024-12-06T03:30:18Z

PR approved by at least one committer and no changes requested.

kaka11chen

LGTM

morningman marked this pull request as draft November 20, 2024 08:41

morningman force-pushed the reach_limit_erro branch from a1a7aee to ee3851c Compare November 20, 2024 08:42

morningman added dev/2.1.x dev/3.0.x labels Nov 20, 2024

morningman marked this pull request as ready for review November 20, 2024 10:52

924060929 previously approved these changes Nov 20, 2024

github-actions bot added the approved label Nov 20, 2024

github-actions bot added the reviewed label Nov 20, 2024

morningman dismissed 924060929’s stale review via 241dae2 November 20, 2024 14:05

github-actions bot removed the approved label Nov 20, 2024

morningman force-pushed the reach_limit_erro branch from 241dae2 to b6e3041 Compare November 21, 2024 02:34

924060929 previously approved these changes Nov 22, 2024

github-actions bot added the approved label Nov 22, 2024

kaka11chen previously approved these changes Nov 29, 2024

morningman dismissed stale reviews from kaka11chen and 924060929 via d12d3f9 December 4, 2024 06:58

morningman force-pushed the reach_limit_erro branch from d886471 to d12d3f9 Compare December 4, 2024 06:58

github-actions bot removed the approved label Dec 4, 2024

morningman force-pushed the reach_limit_erro branch from 11438bc to a7bca98 Compare December 5, 2024 08:56

HappenLee reviewed Dec 5, 2024

morningman added 3 commits December 6, 2024 01:39

[fix](planner) query should be cancelled if limit reached

c7fc0b7

2 3 1 4 bytes limit

batch size

c3c6c5c

fix

670d90e

morningman dismissed stale reviews from 924060929 and morrySnow via 670d90e December 5, 2024 17:40

morningman force-pushed the reach_limit_erro branch from a7bca98 to 670d90e Compare December 5, 2024 17:40

github-actions bot removed the approved label Dec 5, 2024

HappenLee approved these changes Dec 6, 2024

github-actions bot added the approved label Dec 6, 2024

kaka11chen approved these changes Dec 10, 2024

morningman merged commit 5aee0cc into apache:master Dec 10, 2024
17 of 19 checks passed

github-actions bot added dev/3.0.x-conflict dev/2.1.x-conflict labels Dec 10, 2024

morningman mentioned this pull request Dec 10, 2024

[fix](planner) query should be cancelled if limit reached (#44338) #45222

Merged

morningman mentioned this pull request Dec 10, 2024

[fix](planner) query should be cancelled if limit reached (#44338) #45223

Merged

morningman added a commit that referenced this pull request Dec 10, 2024

[fix](planner) query should be cancelled if limit reached (#44338) (#…

49d1671

…45223) bp #44338

morningman added dev/3.0.4-merged and removed dev/3.0.x dev/3.0.x-conflict labels Dec 10, 2024

morningman added a commit that referenced this pull request Dec 10, 2024

[fix](planner) query should be cancelled if limit reached (#44338) (#…

e29d125

…45222) cherry-pick #44338

morningman added dev/2.1.8-merged and removed dev/2.1.x dev/2.1.x-conflict labels Dec 10, 2024

GoGoWen pushed a commit to GoGoWen/incubator-doris that referenced this pull request Sep 26, 2025

[fix](planner) query should be cancelled if limit reached (apache#44338…

24268af

…) (apache#45222) cherry-pick apache#44338
