@wForget
Member

@wForget wForget commented Jul 5, 2024

What changes were proposed in this pull request?

Eagerly execute a union of multiple commands together, in a single SQL execution.

Why are the changes needed?

A multi-insert statement is split into multiple SQL executions, so the exchange shared by the inserts is not reused.

SQL to reproduce:

create table wangzhen_t1(c1 int);
create table wangzhen_t2(c1 int);
create table wangzhen_t3(c1 int);
insert into wangzhen_t1 values (1), (2), (3);

from (select /*+ REPARTITION(3) */ c1 from wangzhen_t1)
insert overwrite table wangzhen_t2 select c1
insert overwrite table wangzhen_t3 select c1; 

In Spark 3.1, the statement runs as a single SQL execution and the exchange is reused.

However, in Spark 3.5, the statement is split into multiple executions and no exchange is reused.
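
To illustrate why the number of executions matters, here is a minimal toy model in plain Python. It is not Spark code: the `Execution` class, its `exchange` method, and `run_multi_insert` are hypothetical names made up for this sketch. The only assumption it encodes is that a shuffle result can be reused within one execution but not across separate executions.

```python
from dataclasses import dataclass, field


@dataclass
class Execution:
    """Toy stand-in for one SQL execution (hypothetical, not a Spark class)."""
    exchange_cache: dict = field(default_factory=dict)
    shuffles_run: int = 0

    def exchange(self, key, rows):
        # Assumption of this sketch: reuse is only possible inside one execution.
        if key not in self.exchange_cache:
            self.shuffles_run += 1
            self.exchange_cache[key] = sorted(rows)
        return self.exchange_cache[key]


def run_multi_insert(source_rows, targets, single_execution):
    """Write the same repartitioned source into every target table.

    Returns the resulting tables and the total number of shuffles performed.
    """
    if single_execution:
        executions = [Execution()]
    else:
        executions = [Execution() for _ in targets]
    tables = {}
    for i, name in enumerate(targets):
        execution = executions[0] if single_execution else executions[i]
        tables[name] = list(execution.exchange("repartition(c1)", source_rows))
    return tables, sum(e.shuffles_run for e in executions)


rows = [1, 2, 3]
targets = ["wangzhen_t2", "wangzhen_t3"]
_, shuffles_together = run_multi_insert(rows, targets, single_execution=True)
_, shuffles_apart = run_multi_insert(rows, targets, single_execution=False)
print(shuffles_together, shuffles_apart)  # prints "1 2"
```

Both modes write the same rows to both tables; the only difference in this model is that the split case repeats the shuffle once per insert.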

Does this PR introduce any user-facing change?

Yes, multi-insert statements will be executed in a single SQL execution.

How was this patch tested?

Added a unit test.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Jul 5, 2024
@wForget
Member Author

wForget commented Jul 5, 2024

It seems to be caused by #32513

@wForget
Member Author

wForget commented Jul 5, 2024

@cloud-fan @beliefer Could you please take a look?

Contributor

@ulysses-you ulysses-you left a comment

lgtm except some minor comments

@ulysses-you
Contributor

thanks, merged to master

@cloud-fan
Contributor

late LGTM

jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
Closes apache#47224 from wForget/SPARK-48817.

Authored-by: wforget <643348094@qq.com>
Signed-off-by: youxiduo <youxiduo@corp.netease.com>