@seawinde
Contributor

Proposed changes

Root Cause Analysis:
Currently, the statistics reported by BE (Backend) nodes take priority over those collected by ANALYZE statements. After the first INSERT INTO, the system waits until all tablets have reported their row counts before it updates the table statistics.
After subsequent INSERT INTO operations, the system cannot obtain reports from all tablets, so it keeps using the statistics recorded after the first INSERT INTO. Because that first load was small, the stale row count understates the table size and lowers the estimated cost of the plan that scans the original table. The optimizer therefore selects that plan instead of the one that uses the materialized view.
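The stale-statistics behavior described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not Doris code: the functions `reported_row_count` and `effective_row_count`, the tablet count, and every row count are hypothetical, and the real FE statistics logic is considerably more involved.

```python
def reported_row_count(tablet_reports, num_tablets, previous):
    """Row count taken from BE tablet reports (hypothetical rule).

    The count is refreshed only when every tablet has reported;
    otherwise the previously recorded value is kept.
    """
    if len(tablet_reports) == num_tablets:
        return sum(tablet_reports.values())
    return previous


def effective_row_count(reported, analyzed):
    """BE-reported statistics take priority over ANALYZE results."""
    return reported if reported is not None else analyzed


NUM_TABLETS = 4  # hypothetical tablet count

# First INSERT INTO: all tablets report, so the statistics are updated.
first_reports = {tablet: 5 for tablet in range(NUM_TABLETS)}  # 20 rows
recorded = reported_row_count(first_reports, NUM_TABLETS, previous=None)

# Subsequent INSERT INTO: only some tablets have reported new counts,
# so the value recorded after the first INSERT INTO is kept.
later_reports = {0: 50_000, 1: 50_000}
recorded = reported_row_count(later_reports, NUM_TABLETS, previous=recorded)

# Even if ANALYZE has seen the real size, the BE-reported value wins.
rows_for_optimizer = effective_row_count(recorded, analyzed=200_000)
print(rows_for_optimizer)  # 20
```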

Conclusion:
The test case should be modified to load a larger dataset in the first INSERT INTO, so that the statistics recorded at that point better reflect the actual data size and distribution. The estimated cost of the original table's plan then rises, which increases the likelihood that the optimizer chooses the materialized view.
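To see why a larger first load helps, here is a toy cost comparison in Python. The cost functions, the `compression` and `overhead` parameters, and the row counts are all made up for illustration and do not reflect Doris's actual cost model; the only point is that a fixed overhead on the materialized-view plan dominates when the recorded row count is tiny and stops mattering once it is large.

```python
def base_plan_cost(rows):
    """Hypothetical cost: proportional to rows scanned in the base table."""
    return float(rows)


def mv_plan_cost(rows, compression=0.1, overhead=500.0):
    """Hypothetical cost of reading a pre-aggregated sync MV.

    `compression` (MV rows per base row) and `overhead` (fixed cost of
    the rewritten plan) are made-up parameters for illustration only.
    """
    return rows * compression + overhead


def choose_plan(recorded_rows):
    """Pick the cheaper plan given the row count recorded in statistics."""
    base = base_plan_cost(recorded_rows)
    mv = mv_plan_cost(recorded_rows)
    return "materialized view" if mv < base else "base table"


# Row count recorded after a small vs. a larger first INSERT INTO.
for first_load in (20, 100_000):
    print(first_load, "->", choose_plan(first_load))
```

With these made-up numbers, a recorded count of 20 rows favors the base table, while 100,000 rows favors the materialized view.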

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@seawinde
Contributor Author

run buildall

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

github-actions bot added the approved label ("Indicates a PR has been approved by one committer.") on Oct 31, 2024
@github-actions
Contributor

PR approved by anyone and no changes requested.

morrySnow merged commit 55fde45 into apache:master on Nov 1, 2024
github-actions bot pushed a commit that referenced this pull request Nov 1, 2024
…using sync mv (#43010)

dataroaring pushed a commit that referenced this pull request Nov 14, 2024
…o make sure using sync mv (#43055)

Cherry-picked from #43010

Co-authored-by: seawinde <149132972+seawinde@users.noreply.github.com>
seawinde added a commit to seawinde/doris that referenced this pull request Dec 6, 2024
…using sync mv (apache#43010)

Labels

approved ("Indicates a PR has been approved by one committer."), dev/2.1.8-merged, dev/3.0.3-merged, reviewed

5 participants