planner: add aggregation hints TIDB_HASHAGG and TIDB_STREAMAGG#11364

Merged
foreyes merged 33 commits into
pingcap:masterfrom
foreyes:dev/add_agg_hints
Aug 7, 2019

Conversation

@foreyes
Contributor

@foreyes foreyes commented Jul 22, 2019

What problem does this PR solve?

Add Optimizer Hints TIDB_HASHAGG and TIDB_STREAMAGG.

What is changed and how it works?

Handle the hints passed in from the parser, and force the planner to choose the requested aggregation type.
Related parser PR: pingcap/parser#394

mysql> explain select count(*) from t t1, t t2 where t1.a = t2.b;
+--------------------------+----------+------+--------------------------------------------------------------------+
| id                       | count    | task | operator info                                                      |
+--------------------------+----------+------+--------------------------------------------------------------------+
| StreamAgg_13             | 1.00     | root | funcs:count(1)                                                     |
| └─HashLeftJoin_26        | 12500.00 | root | inner join, inner:TableReader_20, equal:[eq(test.t1.a, test.t2.b)] |
|   ├─TableReader_22       | 10000.00 | root | data:TableScan_21                                                  |
|   │ └─TableScan_21       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo        |
|   └─TableReader_20       | 10000.00 | root | data:TableScan_19                                                  |
|     └─TableScan_19       | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo        |
+--------------------------+----------+------+--------------------------------------------------------------------+
6 rows in set (0.00 sec)

mysql> explain select /*+ TIDB_HASHAGG() */ count(*) from t t1, t t2 where t1.a = t2.b;
+--------------------------+----------+------+--------------------------------------------------------------------+
| id                       | count    | task | operator info                                                      |
+--------------------------+----------+------+--------------------------------------------------------------------+
| HashAgg_11               | 1.00     | root | funcs:count(1)                                                     |
| └─HashLeftJoin_15        | 12500.00 | root | inner join, inner:TableReader_18, equal:[eq(test.t1.a, test.t2.b)] |
|   ├─TableReader_20       | 10000.00 | root | data:TableScan_19                                                  |
|   │ └─TableScan_19       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo        |
|   └─TableReader_18       | 10000.00 | root | data:TableScan_17                                                  |
|     └─TableScan_17       | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo        |
+--------------------------+----------+------+--------------------------------------------------------------------+
6 rows in set (0.01 sec)

mysql> explain select count(t1.a) from t t1, t t2 where t1.a = t2.a*2 group by t1.a;
+--------------------------+----------+------+---------------------------------------------------------------------------+
| id                       | count    | task | operator info                                                             |
+--------------------------+----------+------+---------------------------------------------------------------------------+
| HashAgg_13               | 8000.00  | root | group by:test.t1.a, funcs:count(test.t1.a)                                |
| └─HashLeftJoin_16        | 12500.00 | root | inner join, inner:Projection_21, equal:[eq(test.t1.a, mul(test.t2.a, 2))] |
|   ├─TableReader_20       | 10000.00 | root | data:TableScan_19                                                         |
|   │ └─TableScan_19       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo               |
|   └─Projection_21        | 10000.00 | root | test.t2.a, mul(test.t2.a, 2)                                              |
|     └─TableReader_23     | 10000.00 | root | data:TableScan_22                                                         |
|       └─TableScan_22     | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo               |
+--------------------------+----------+------+---------------------------------------------------------------------------+
7 rows in set (0.00 sec)

mysql> explain select /*+ TIDB_STREAMAGG() */ count(t1.a) from t t1, t t2 where t1.a = t2.a*2 group by t1.a;
+----------------------------+----------+------+---------------------------------------------------------------------------+
| id                         | count    | task | operator info                                                             |
+----------------------------+----------+------+---------------------------------------------------------------------------+
| StreamAgg_15               | 8000.00  | root | group by:test.t1.a, funcs:count(test.t1.a)                                |
| └─Sort_24                  | 12500.00 | root | test.t1.a:asc                                                             |
|   └─HashLeftJoin_16        | 12500.00 | root | inner join, inner:Projection_21, equal:[eq(test.t1.a, mul(test.t2.a, 2))] |
|     ├─TableReader_20       | 10000.00 | root | data:TableScan_19                                                         |
|     │ └─TableScan_19       | 10000.00 | cop  | table:t1, range:[-inf,+inf], keep order:false, stats:pseudo               |
|     └─Projection_21        | 10000.00 | root | test.t2.a, mul(test.t2.a, 2)                                              |
|       └─TableReader_23     | 10000.00 | root | data:TableScan_22                                                         |
|         └─TableScan_22     | 10000.00 | cop  | table:t2, range:[-inf,+inf], keep order:false, stats:pseudo               |
+----------------------------+----------+------+---------------------------------------------------------------------------+
8 rows in set (0.00 sec)

Check List

Tests

  • Unit test

Code changes

  • Change the plan builder to handle aggregation hints.
  • Change physical plan exhaustion to apply aggregation hints.
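
The two code changes listed above can be pictured with a small sketch. It is a simplified model and not the code in this PR: the names `aggPreference`, `preferHashAgg`, `preferStreamAgg`, `collectAggHints`, and `candidateAggs` are hypothetical, and the rule that conflicting hints cancel each other out is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// aggPreference is a hypothetical bit set recording which aggregation
// algorithm the hints asked for. It is not TiDB's real type.
type aggPreference uint

const (
	preferHashAgg aggPreference = 1 << iota
	preferStreamAgg
)

// collectAggHints models the plan-builder step: it turns hint names into a
// preference. Assumption: if both hints appear, neither is honored and a
// warning message is returned.
func collectAggHints(hintNames []string) (aggPreference, string) {
	var pref aggPreference
	for _, name := range hintNames {
		switch strings.ToUpper(name) {
		case "TIDB_HASHAGG":
			pref |= preferHashAgg
		case "TIDB_STREAMAGG":
			pref |= preferStreamAgg
		}
	}
	if pref == preferHashAgg|preferStreamAgg {
		return 0, "aggregation hints conflict; both are ignored"
	}
	return pref, ""
}

// candidateAggs models the physical-plan step: with a preference set, only
// the matching operator is offered; otherwise both remain candidates and
// the cost model would pick between them.
func candidateAggs(pref aggPreference) []string {
	switch pref {
	case preferHashAgg:
		return []string{"HashAgg"}
	case preferStreamAgg:
		return []string{"StreamAgg"}
	default:
		return []string{"HashAgg", "StreamAgg"}
	}
}

func main() {
	pref, warn := collectAggHints([]string{"TIDB_STREAMAGG"})
	fmt.Println(candidateAggs(pref), warn)
}
```

Keeping the preference as a bit set makes the conflict case easy to detect with a single comparison, and leaves the unhinted path unchanged.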

Side effects

  • Change optimizer behaviors.

Related changes

  • Add a new rule in the parser.

@codecov

codecov Bot commented Jul 22, 2019

Codecov Report

Merging #11364 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #11364   +/-   ##
===========================================
  Coverage   81.6243%   81.6243%           
===========================================
  Files           426        426           
  Lines         93640      93640           
===========================================
  Hits          76433      76433           
  Misses        11807      11807           
  Partials       5400       5400

@foreyes foreyes force-pushed the dev/add_agg_hints branch 2 times, most recently from ecd3e00 to ff1f239 Compare July 23, 2019 07:05
@foreyes
Contributor Author

foreyes commented Jul 24, 2019

/run-all-tests

@foreyes foreyes requested review from alivxxx and zz-jason July 24, 2019 06:19
@foreyes
Contributor Author

foreyes commented Jul 24, 2019

PTAL. @zz-jason @lamxTyler

@foreyes foreyes changed the title [WIP] planner: add aggregation hints TIDB_HASHAGG and TIDB_STREAMAGG planner: add aggregation hints TIDB_HASHAGG and TIDB_STREAMAGG Jul 24, 2019
Comment thread planner/core/exhaust_physical_plans.go Outdated
Comment thread planner/core/logical_plans.go Outdated
@foreyes foreyes force-pushed the dev/add_agg_hints branch from 37eff4e to 5accbd8 Compare July 24, 2019 06:53
@foreyes
Contributor Author

foreyes commented Jul 24, 2019

/run-all-tests

Comment thread planner/core/logical_plans.go Outdated
Comment thread planner/core/logical_plan_builder.go Outdated
Comment thread planner/core/exhaust_physical_plans.go Outdated
@zz-jason zz-jason removed their request for review July 24, 2019 08:39
@foreyes foreyes force-pushed the dev/add_agg_hints branch from fa37100 to 2c46284 Compare July 24, 2019 09:42
@foreyes
Contributor Author

foreyes commented Jul 24, 2019

Code improved, PTAL. @zz-jason @XuHuaiyu

Contributor

@XuHuaiyu XuHuaiyu left a comment

LGTM

@XuHuaiyu
Contributor

We should update the parser version in go.mod before merging this commit.

@XuHuaiyu XuHuaiyu added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 25, 2019
@foreyes foreyes force-pushed the dev/add_agg_hints branch 3 times, most recently from 80af97d to 74c1bf3 Compare July 26, 2019 09:52
Comment thread executor/join_test.go Outdated
@foreyes
Contributor Author

foreyes commented Aug 6, 2019

PTAL. @XuHuaiyu @eurekaka

@foreyes foreyes requested review from XuHuaiyu and eurekaka August 6, 2019 04:32

all, desc := prop.AllSameOrder()
if len(la.possibleProperties) == 0 || !all {
if !all {
Contributor

This is not resolved?

}
}

func (s *testPlanSuite) TestAggregationHints(c *C) {
Contributor

Add a test case which contains subquery?

Contributor Author

For the first one: possibleChildProperties are only possible..., so we had better not rely on them too much. I handle this in lines 1272-1275; please take a look.

For the test case, I will add it soon.

Contributor Author

While adding the test case, I found another bug. I will fix it soon.

@foreyes
Contributor Author

foreyes commented Aug 6, 2019

Added tests and fixed a Merge Join bug. PTAL. @eurekaka @XuHuaiyu

Copy link
Copy Markdown
Contributor

@eurekaka eurekaka left a comment

LGTM

@eurekaka eurekaka added the status/LGT1 Indicates that a PR has LGTM 1. label Aug 6, 2019
Contributor

@XuHuaiyu XuHuaiyu left a comment

LGTM

@foreyes
Contributor Author

foreyes commented Aug 7, 2019

/run-all-tests

@XuHuaiyu
Contributor

XuHuaiyu commented Aug 7, 2019

I'm still curious: is this expected?
Or will it be fixed in another PR?

CREATE TABLE `t` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  KEY `a` (`a`)
);
tidb> desc select /*+ TIDB_HJ(t)  */ a, count(b) from t group by a order by a;
+--------------------------+----------+------+--------------------------------------------------------------------+
| id                       | count    | task | operator info                                                      |
+--------------------------+----------+------+--------------------------------------------------------------------+
| Projection_22            | 8000.00  | root | test.t.a, 2_col_0                                                  |
| └─StreamAgg_24           | 8000.00  | root | group by:test.t.a, funcs:count(test.t.b), firstrow(test.t.a)       |
|   └─Projection_21        | 10000.00 | root | test.t.a, test.t.b                                                 |
|     └─IndexLookUp_20     | 10000.00 | root |                                                                    |
|       ├─IndexScan_18     | 10000.00 | cop  | table:t, index:a, range:[NULL,+inf], keep order:true, stats:pseudo |
|       └─TableScan_19     | 10000.00 | cop  | table:t, keep order:false, stats:pseudo                            |
+--------------------------+----------+------+--------------------------------------------------------------------+
6 rows in set, 1 warning (0.00 sec)

tidb> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                               |
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
| Warning | 1815 | There are no matching table names for (t) in optimizer hint /*+ TIDB_HJ(t) */. Maybe you can use the table alias name |
+---------+------+-----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

@XuHuaiyu XuHuaiyu added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Aug 7, 2019
@foreyes foreyes merged commit a530f87 into pingcap:master Aug 7, 2019
@foreyes foreyes deleted the dev/add_agg_hints branch August 7, 2019 02:51
@foreyes
Contributor Author

foreyes commented Aug 7, 2019

I know about this case. It is expected, but it looks weird; I will fix it in another PR. @XuHuaiyu

Labels

sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2. type/new-feature

5 participants