Skip to content

*: refactor cost model formulas and constants#10581

Merged
eurekaka merged 3 commits into
pingcap:masterfrom
eurekaka:cost_model
Aug 7, 2019
Merged

*: refactor cost model formulas and constants#10581
eurekaka merged 3 commits into
pingcap:masterfrom
eurekaka:cost_model

Conversation

@eurekaka
Copy link
Copy Markdown
Contributor

@eurekaka eurekaka commented May 23, 2019

What problem does this PR solve?

Our current cost model is too naive to pick out the physical plans we prefer in some scenarios, for example:

  • cost such as sorting in index lookup operator, or inner cost of index join operator, is not reflected in cost computing at all;
  • some cost computings are wrong or not that accurate because we are using wrong input row count estimation (e.g, cost computing of TopN operator);

Besides, cost computings for different operators are not uniform now: some operators consider memory cost, others do not; some operators consider operator parallelism, others do not;

What is changed and how it works?

This PR tries to

    1. refine cost model to catch up with the current executor implementations
    1. and uniform the dimensions we consider in cost computing for all operators, i.e, CPU cost, memory cost, network cost, scan cost, and operator parallelism.

Check List

Tests

  • Unit test: some UT results are updated
  • Integration test: some integration tests are updated

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change

Side effects

  • Possible performance regression

Related changes

  • Need to cherry-pick to the release branch: we may need this in release-3.0
  • Need to be included in the release note

@eurekaka eurekaka added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner labels May 23, 2019
@zhouqiang-cl
Copy link
Copy Markdown
Contributor

/rebuild

@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2019

Codecov Report

Merging #10581 into master will decrease coverage by 0.1794%.
The diff coverage is 96.2643%.

@@               Coverage Diff                @@
##             master     #10581        +/-   ##
================================================
- Coverage   81.4101%   81.2307%   -0.1795%     
================================================
  Files           426        426                
  Lines         92513      92028       -485     
================================================
- Hits          75315      74755       -560     
- Misses        11826      11904        +78     
+ Partials       5372       5369         -3

@zhouqiang-cl
Copy link
Copy Markdown
Contributor

/bench

Comment thread cmd/explaintest/r/tpch.result Outdated
Comment thread executor/builder.go Outdated
@eurekaka eurekaka force-pushed the cost_model branch 2 times, most recently from 5c4dfa9 to b97ea8e Compare June 17, 2019 09:56
@eurekaka
Copy link
Copy Markdown
Contributor Author

/rebuild

@eurekaka
Copy link
Copy Markdown
Contributor Author

/run-all-tests

@eurekaka
Copy link
Copy Markdown
Contributor Author

/run-common-test tidb-test=pr/840

1 similar comment
@eurekaka
Copy link
Copy Markdown
Contributor Author

/run-common-test tidb-test=pr/840

@eurekaka
Copy link
Copy Markdown
Contributor Author

/run-all-tests tidb-test=pr/840

@eurekaka eurekaka marked this pull request as ready for review June 24, 2019 05:52
@eurekaka
Copy link
Copy Markdown
Contributor Author

/run-all-tests tidb-test=pr/840

1 similar comment
@eurekaka
Copy link
Copy Markdown
Contributor Author

/run-all-tests tidb-test=pr/840

@eurekaka eurekaka requested review from alivxxx, winoros and zz-jason June 24, 2019 08:20
Comment thread planner/core/task.go Outdated
Comment thread planner/core/task.go Outdated
Comment thread statistics/table.go Outdated
@eurekaka eurekaka requested review from alivxxx and winoros July 2, 2019 10:41
@eurekaka eurekaka changed the title planner: refactor cost model formulas and constants *: refactor cost model formulas and constants Jul 18, 2019
Comment thread planner/core/task.go Outdated
Comment thread planner/core/task.go Outdated
Copy link
Copy Markdown
Contributor

@lzmhhh123 lzmhhh123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@lzmhhh123 lzmhhh123 added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 26, 2019
Comment thread planner/core/task.go Outdated
Comment thread executor/index_lookup_join_test.go Outdated
Comment thread planner/core/cbo_test.go Outdated
Comment thread planner/core/exhaust_physical_plans.go Outdated
Comment thread planner/core/exhaust_physical_plans.go Outdated
@eurekaka eurekaka force-pushed the cost_model branch 2 times, most recently from f949384 to 0617428 Compare August 1, 2019 12:59
@eurekaka eurekaka requested review from alivxxx and zz-jason August 1, 2019 13:00
@alivxxx alivxxx removed their request for review August 2, 2019 06:25
@qw4990 qw4990 removed their request for review August 5, 2019 07:38
Comment thread statistics/table.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can calculate (colHist.TotColSize == 0 && (colHist.NullCount != coll.Count)) once outside the for loop.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to get a valid colHist to make this computation check, if we move this check outside the for loop, the code is pretty ugly.

Comment thread planner/core/exhaust_physical_plans.go Outdated
Comment thread planner/core/exhaust_physical_plans.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about replacing 1.0 with ts.stats.RowCount? That will be much clearer.

Comment thread planner/core/task.go Outdated
Comment thread planner/core/task.go Outdated
Comment thread planner/core/task.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rCount is incorrect when we can use index scan on the inner side table, in which condition the scan range is decided by the correlated outer side join key.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we cannot know the selectivity of the outer key until execution.

Comment thread planner/core/task.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we consider avg row size for each inner row?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The row in memory would have different size compared with its representation in disk and network. Currently, we are using a very small default memoryFactor in order to choose the fastest plan which makes full utilization of resources. To make cost model friendly for memory management, we need to consider row size here indeed. We can leave this to another separate PR later?

Comment thread planner/core/task.go Outdated
Comment thread planner/core/task.go Outdated
@eurekaka eurekaka requested a review from zz-jason August 7, 2019 05:59
@eurekaka
Copy link
Copy Markdown
Contributor Author

eurekaka commented Aug 7, 2019

/rebuild

Copy link
Copy Markdown
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zz-jason zz-jason removed request for foreyes and winoros August 7, 2019 08:36
@zz-jason zz-jason added status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Aug 7, 2019
@sre-bot
Copy link
Copy Markdown
Contributor

sre-bot commented Aug 7, 2019

/run-all-tests

@sre-bot
Copy link
Copy Markdown
Contributor

sre-bot commented Aug 7, 2019

@eurekaka merge failed.

@eurekaka
Copy link
Copy Markdown
Contributor Author

eurekaka commented Aug 7, 2019

/run-all-tests tidb-test=pr/840

@eurekaka eurekaka merged commit fe03864 into pingcap:master Aug 7, 2019
@eurekaka eurekaka deleted the cost_model branch August 7, 2019 09:57
lzmhhh123 pushed a commit to lzmhhh123/tidb that referenced this pull request Jan 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sig/planner SIG: Planner status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants