*: refactor cost model formulas and constants #10581
/rebuild
Codecov Report
```diff
@@               Coverage Diff                @@
##              master     #10581       +/-   ##
================================================
- Coverage   81.4101%   81.2307%   -0.1795%
================================================
  Files           426        426
  Lines         92513      92028      -485
================================================
- Hits          75315      74755      -560
- Misses        11826      11904       +78
+ Partials       5372       5369        -3
```
/bench
Force-pushed from 5c4dfa9 to b97ea8e
/rebuild
/run-all-tests
/run-common-test tidb-test=pr/840
/run-all-tests tidb-test=pr/840
Force-pushed from f949384 to 0617428
We can calculate `(colHist.TotColSize == 0 && (colHist.NullCount != coll.Count))` once, outside the for loop.
We need a valid colHist before we can do this check; moving it outside the for loop would make the code pretty ugly.
How about replacing `1.0` with `ts.stats.RowCount`? That would be much clearer.
Maybe rCount is incorrect when we can use an index scan on the inner-side table; in that case, the scan range is decided by the correlated outer-side join key.
But we cannot know the selectivity of the outer key until execution.
Should we consider the average row size of each inner row?
A row in memory has a different size from its representation on disk and over the network. Currently, we use a very small default memoryFactor so that the planner picks the fastest plan, the one that makes full use of resources. To make the cost model friendly to memory management, we do indeed need to consider row size here. Can we leave this to a separate PR?
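The trade-off in this reply can be sketched in Go under stated assumptions: the `memoryFactor` value and both helper functions are hypothetical, not the PR's code. Charging memory per row ignores width, so a plan that buffers fewer but much wider rows looks cheaper. Weighting by an estimated average row size reverses that ranking.

```go
package main

import "fmt"

// memoryFactor is a hypothetical, deliberately small cost per unit of memory.
const memoryFactor = 0.001

// memCostByRows charges memory by row count alone, ignoring row width.
func memCostByRows(rows float64) float64 {
	return rows * memoryFactor
}

// memCostBySize charges memory by estimated bytes: rows times average row size.
func memCostBySize(rows, avgRowSize float64) float64 {
	return rows * avgRowSize * memoryFactor
}

func main() {
	// Two hypothetical build sides: many narrow rows versus fewer wide rows.
	narrowRows, narrowSize := 10000.0, 16.0
	wideRows, wideSize := 5000.0, 1024.0

	fmt.Println("by rows: narrow", memCostByRows(narrowRows), "wide", memCostByRows(wideRows))
	fmt.Println("by size: narrow", memCostBySize(narrowRows, narrowSize), "wide", memCostBySize(wideRows, wideSize))
}
```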
/rebuild
/run-all-tests
@eurekaka merge failed.
/run-all-tests tidb-test=pr/840
What problem does this PR solve?
Our current cost model is too naive to pick out the physical plans we prefer in some scenarios, for example:
Besides, cost computation is not uniform across operators: some operators consider memory cost and others do not, and some consider operator parallelism and others do not.
What is changed and how does it work?
This PR tries to
Check List
Tests
Code changes
Side effects
Related changes