[test](mv) Insert into more data when first insert into to make sure using sync mv #43010

seawinde · 2024-10-31T09:26:47Z

Proposed changes

Root Cause Analysis:
Currently, statistics reported by BE (Backend) nodes take priority over statistics collected by ANALYZE statements. On the first INSERT INTO, the system waits for row-count reports from all tablets before it updates the table statistics.
Subsequent INSERT INTO operations cannot obtain the status of all tablets, so the table keeps the statistics recorded after the first INSERT INTO. The estimated cost of the query plan on the original table is therefore too low, and the optimizer chooses the original table's plan instead of the materialized view.

Conclusion:
The test case is changed to load a larger dataset in the first INSERT INTO, which makes it more likely that the materialized view is used. With more data in the first load, the statistics recorded at that point better reflect the actual data distribution and size, so cost estimation leads to a more accurate plan selection.
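To make the mechanism above concrete, here is a toy model in Python. It is a hypothetical sketch, not Doris code: the class `ToyTableStats`, the function `simulate`, the tablet count and the batch sizes are all assumptions chosen for illustration, and it models only the row-count estimate, not the optimizer's cost formula. It encodes two points from the analysis: BE-reported row counts take priority over ANALYZE results, and the BE-reported count is refreshed only when every tablet reports, which in this model happens on the first INSERT INTO only.

```python
from typing import List, Optional


class ToyTableStats:
    """Hypothetical toy model of table row-count statistics (not Doris code)."""

    def __init__(self, num_tablets: int) -> None:
        self.num_tablets = num_tablets
        self.actual_rows = 0
        # Row count from BE tablet reports; stays None until every tablet has reported.
        self.reported_row_count: Optional[int] = None
        # Row count collected by an ANALYZE statement, if one has run.
        self.analyzed_row_count: Optional[int] = None

    def insert(self, rows: int, reported_tablets: int) -> None:
        """Load `rows` rows; `reported_tablets` is how many tablets reported afterwards."""
        self.actual_rows += rows
        # Assumption: the BE-reported count is refreshed only when all tablets report.
        if reported_tablets == self.num_tablets:
            self.reported_row_count = self.actual_rows

    def analyze(self) -> None:
        """Simulate ANALYZE collecting the true row count."""
        self.analyzed_row_count = self.actual_rows

    def estimated_row_count(self) -> int:
        # BE-reported statistics take priority over ANALYZE results.
        if self.reported_row_count is not None:
            return self.reported_row_count
        if self.analyzed_row_count is not None:
            return self.analyzed_row_count
        return 0


def simulate(first_batch: int, later_batches: List[int], num_tablets: int = 3) -> ToyTableStats:
    stats = ToyTableStats(num_tablets)
    # First INSERT INTO: every tablet reports, so the BE-reported count is set.
    stats.insert(first_batch, reported_tablets=num_tablets)
    for batch in later_batches:
        # Later INSERT INTO: not all tablets report, so the count is not refreshed.
        stats.insert(batch, reported_tablets=num_tablets - 1)
    # ANALYZE sees the true count, but its result is shadowed by the BE-reported one.
    stats.analyze()
    return stats


small = simulate(first_batch=3, later_batches=[1000, 1000])
large = simulate(first_batch=2000, later_batches=[1000, 1000])
print(small.estimated_row_count(), small.actual_rows)  # 3 2003
print(large.estimated_row_count(), large.actual_rows)  # 2000 4000
```

With a small first batch the estimate stays at 3 rows while the table actually holds 2003, the kind of underestimate that can make the original table's plan look cheaper than the materialized view. A larger first batch keeps the estimate much closer to the real size, which is the reasoning behind loading more data in the test's first INSERT INTO.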


doris-robot · 2024-10-31T09:26:52Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

seawinde · 2024-10-31T09:28:21Z

run buildall

github-actions · 2024-10-31T09:30:23Z

PR approved by at least one committer and no changes requested.

github-actions · 2024-10-31T09:30:25Z

PR approved by anyone and no changes requested.

…o make sure using sync mv (#43055) Cherry-picked from #43010 Co-authored-by: seawinde <149132972+seawinde@users.noreply.github.com>

[test](mv) Insert into more data when first insert into to make sure using sync mv

e9f824e

morrySnow added dev/2.1.x dev/3.0.x labels Oct 31, 2024

morrySnow approved these changes Oct 31, 2024


github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 31, 2024

github-actions bot added the reviewed label Oct 31, 2024

englefly approved these changes Oct 31, 2024


morrySnow merged commit 55fde45 into apache:master Nov 1, 2024

github-actions bot mentioned this pull request Nov 1, 2024

branch-3.0: [test](mv) Insert into more data when first insert into to make sure using sync mv #43055

Merged

morrySnow added dev/3.0.3-merged and removed dev/3.0.x labels Nov 19, 2024

seawinde mentioned this pull request Dec 6, 2024

Pick some pr to 21 #43010 #43030 #43785 #44779 #44786 #44857 #45129

Merged


yiguolei pushed a commit that referenced this pull request Dec 9, 2024

Pick some pr to 21 #43010 #43030 #43785 #44779 #44786 #44857 (#45129)

1662e47

yiguolei added dev/2.1.8-merged and removed dev/2.1.x labels Dec 9, 2024


5 participants
