Do not add exchange when table's distribution satisfies the distribution requirements #4481

@liutang123

Description

Issue Reason

Currently, Doris always creates two fragments for an aggregation.
Sometimes the Exchange between them is unnecessary.
Consider the following cases:

  1. Aggregate an unpartitioned table.
    Create table SQL:
CREATE TABLE `llj_test_1` (
  `dt` int(11) NOT NULL COMMENT "",
  `dis_key` varchar(20) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`dt`, `dis_key`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`dt`, `dis_key`) BUCKETS 3
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);

Query SQL: select dt, dis_key, count(1) from llj_test_1 group by dt, dis_key;

  2. Aggregate a partitioned table.
    Create table SQL:

CREATE TABLE `llj_test` (
  `dt` int(11) NOT NULL COMMENT "",
  `dis_key` varchar(20) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`dt`, `dis_key`)
COMMENT "OLAP"
PARTITION BY RANGE(`dt`)
(PARTITION p20180822 VALUES [("19000101"), ("20181021")),
PARTITION p20181207 VALUES [("20181021"), ("20181022")))
DISTRIBUTED BY HASH(`dt`, `dis_key`) BUCKETS 3
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);

Query SQL: select dt, dis_key, count(1) from llj_test group by dt, dis_key;
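
To illustrate why the Exchange is redundant in both cases, here is a toy model, not Doris code. Rows are placed into a tablet by a simplified version of the RANGE(`dt`) split and by a hash of (`dt`, `dis_key`), and the helper reports how many tablets a single group key is spread over. The class name, the hash function, and the tablet id format are assumptions made for this sketch.

```java
import java.util.*;

// Toy model only: not Doris code. Shows that when rows are bucketed by
// hash(dt, dis_key), every (dt, dis_key) group lives in exactly one tablet.
class BucketLocalityDemo {
    // Hypothetical tablet id: partition index plus hash bucket.
    static String tabletOf(int dt, String disKey, int buckets) {
        // Simplified stand-in for the RANGE(dt) partitioning of llj_test.
        int partition = dt < 20181021 ? 0 : 1;
        // Stand-in hash; Doris uses its own hash function.
        int bucket = Math.floorMod(Objects.hash(dt, disKey), buckets);
        return partition + "/" + bucket;
    }

    // Returns the largest number of tablets any single group key is spread over.
    static int maxTabletsPerGroup(int[] dts, String[] keys, int buckets) {
        Map<String, Set<String>> tabletsByGroup = new HashMap<>();
        for (int i = 0; i < dts.length; i++) {
            String group = dts[i] + "|" + keys[i];
            tabletsByGroup.computeIfAbsent(group, g -> new HashSet<>())
                          .add(tabletOf(dts[i], keys[i], buckets));
        }
        int max = 0;
        for (Set<String> tablets : tabletsByGroup.values()) {
            max = Math.max(max, tablets.size());
        }
        return max;
    }
}
```

Because the tablet is a function of the group key alone, each group is confined to one tablet, so a per-tablet aggregation already produces final results and no data needs to be reshuffled.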

Suggestion

In DistributedPlanner, do not add the unnecessary Exchange nodes.
For case 1, we only need to check that the table's distribution hash keys are a subset of the aggregation (GROUP BY) keys.
For case 2, we should check two conditions:

  • the partition keys are also hash keys.
  • the table's distribution hash keys are a subset of the aggregation keys.
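
The two checks above can be sketched as a single predicate. This is a minimal illustration under stated assumptions, not the actual DistributedPlanner code: the class `ExchangeCheck`, the method `canSkipExchange`, and the representation of keys as sets of column names are all hypothetical.

```java
import java.util.*;

// Sketch only: hypothetical helper, not the actual DistributedPlanner code.
class ExchangeCheck {
    // partitionKeys is empty for an unpartitioned table (case 1).
    static boolean canSkipExchange(Set<String> partitionKeys,
                                   Set<String> hashKeys,
                                   Set<String> groupByKeys) {
        // Both cases: hash keys must be a subset of the GROUP BY keys, so that
        // rows of the same group always fall into the same bucket.
        if (hashKeys.isEmpty() || !groupByKeys.containsAll(hashKeys)) {
            return false;
        }
        // Case 2: partition keys must also be hash keys, so that a group
        // cannot span partitions. Trivially true when partitionKeys is empty.
        return hashKeys.containsAll(partitionKeys);
    }
}
```

For the two tables in this issue, both `canSkipExchange(Set.of(), Set.of("dt", "dis_key"), Set.of("dt", "dis_key"))` and `canSkipExchange(Set.of("dt"), Set.of("dt", "dis_key"), Set.of("dt", "dis_key"))` would return true, while grouping by `dt` alone would return false.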
