Parallel REFRESH MATERIALIZED VIEW and CTAS for AO/AOCS #122

Merged
avamingli merged 1 commit into apache:main from avamingli:parallel_refresh_matv_ao
Oct 11, 2023

@avamingli
Contributor

@avamingli avamingli commented Aug 7, 2023

Make the SELECT part of REFRESH MATERIALIZED VIEW parallel for materialized views with AO/AOCS storage.
Make the SELECT part of CREATE TABLE AS parallel for tables with AO/AOCS storage.

Parallel processes must not perform write operations; PG enforces this with checks that raise errors such as:
'cannot update tuples during a parallel operation'.
This is not a problem for PG, because workers are launched by the Gather node, so the SELECT part of REFRESH MV/CTAS can run in parallel.
However, AO/AOCS tables require batches of row numbers generated from gp_fastsequence, which updates the catalog in place. And CBDB calls EnterParallelMode() in ExecutePlan on the QE whenever any part of the plan is parallel, so the writing slice also ends up in parallel mode.

Fix: call EnterParallelMode() only for slices that have multiple parallel workers. In theory, these are the slices that execute the SELECT part of a parallel plan.

Performance

(Single test run only, on QingYun Cloud)

| Plan | Refresh AO MatV | Refresh AOCO MatV | CTAS AO | CTAS AOCO |
|---|---|---|---|---|
| non-parallel | 6.18 | 5.91 | 6.56 | 6.06 |
| parallel (4 workers) | 2.83 | 2.81 | 2.37 | 2.48 |
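
For reference, the speedup implied by these numbers can be worked out as non-parallel time divided by parallel time. The small C sketch below does only that arithmetic; the helper `speedup_at` and the array names are made up for this example, and the PR does not state the time unit, so the values are treated as plain ratios.

```c
#include <assert.h>

/* Timings copied from the table above; column order:
 * Refresh AO MatV, Refresh AOCO MatV, CTAS AO, CTAS AOCO. */
static const double non_parallel_time[4] = {6.18, 5.91, 6.56, 6.06};
static const double parallel_time[4]     = {2.83, 2.81, 2.37, 2.48};

/* Speedup for column i: non-parallel time / parallel (4 workers) time. */
static double
speedup_at(int i)
{
    assert(i >= 0 && i < 4);
    assert(parallel_time[i] > 0.0);
    return non_parallel_time[i] / parallel_time[i];
}
```

This gives roughly 2.18x and 2.10x for the two REFRESH cases and roughly 2.77x and 2.44x for the two CTAS cases. Since this was a single run, these ratios are indicative only.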

DDL:

create table t1(c1 int, c2 int) with(parallel_workers=4) distributed by(c1);
insert into t1 select i, i+1 from generate_series(1, 10000000)i;
View definition:
 SELECT sum(a.c1) AS c1,
    avg(b.c2) AS c2
   FROM t1 a
     JOIN t1 b ON a.c1 = b.c1;
Distributed by: (c1)

A Parallel Plan of CTAS for AOCS

explain(costs off) create table ctas_aoco using ao_column as select sum(a.c2) as c2, avg(b.c1) as c1 from t_p a join t_p b on a.c1 = b.c1 distributed by(c2);
                           QUERY PLAN
----------------------------------------------------------------
 Redistribute Motion 1:3  (slice1; segments: 1)
   Hash Key: (sum(a.c2))
   ->  Finalize Aggregate
         ->  Gather Motion 12:1  (slice2; segments: 12)
               ->  Partial Aggregate
                     ->  Parallel Hash Join
                           Hash Cond: (a.c1 = b.c1)
                           ->  Parallel Seq Scan on t_p a
                           ->  Parallel Hash
                                 ->  Parallel Seq Scan on t_p b
 Optimizer: Postgres query optimizer
(11 rows)

Authored-by: Zhang Mingli avamingli@gmail.com


@avamingli avamingli self-assigned this Aug 7, 2023
Comment thread src/backend/executor/execMain.c
@avamingli avamingli force-pushed the parallel_refresh_matv_ao branch from 8a89a71 to 4ac9775 Compare August 7, 2023 03:32
Comment thread src/test/regress/sql/gp_parallel.sql
@avamingli avamingli force-pushed the parallel_refresh_matv_ao branch 2 times, most recently from 5948a0a to cc15270 Compare August 7, 2023 06:16
@avamingli
Contributor Author

close #115

@avamingli
Contributor Author

Rebased and fixed some regression cases.

@avamingli avamingli force-pushed the parallel_refresh_matv_ao branch 7 times, most recently from c3c6a11 to d6eb58d Compare August 10, 2023 02:34
@my-ship-it my-ship-it requested a review from foreyes August 31, 2023 01:18
yjhjstz previously approved these changes Aug 31, 2023
Comment thread src/test/regress/expected/gp_parallel.out
Contributor

@my-ship-it my-ship-it left a comment


LGTM

Contributor

@foreyes foreyes left a comment


LGTM

@avamingli
Contributor Author

Pushed, thanks for review~

@avamingli avamingli merged commit 3e5fad5 into apache:main Oct 11, 2023
@avamingli avamingli deleted the parallel_refresh_matv_ao branch October 11, 2023 07:31