Loosen the restriction outer path has Motion of parallel plan #134

Merged
avamingli merged 1 commit into apache:main from avamingli:relax_outer_motion_path
Aug 24, 2023
Conversation

@avamingli (Contributor) commented Aug 10, 2023

We currently drop every outer path that contains a Motion of a parallel plan, to avoid a deadlock when a parallel-aware hash join is mixed with parallel-oblivious paths.

That check is tied to enable_parallel, which is stricter than needed.
Such a path can safely be kept when enable_parallel is on and enable_parallel_hash is off,
because no parallel-aware hash join can be generated in that case, so the deadlock described above cannot occur.

Loosening the restriction so that it depends on enable_parallel_hash makes parallel-oblivious plans like the following possible:

explain(costs off) select * from t1 right join t2 on t1.b = t2.a;
                            QUERY PLAN
------------------------------------------------------------------
 Gather Motion 6:1  (slice1; segments: 6)
   ->  Hash Left Join
         Hash Cond: (t2.a = t1.b)
         ->  Redistribute Motion 6:6  (slice2; segments: 6)
               Hash Key: t2.a
               Hash Module: 3
               ->  Parallel Seq Scan on t2
         ->  Hash
               ->  Redistribute Motion 3:6  (slice3; segments: 3)
                     Hash Key: t1.b
                     Hash Module: 3
                     ->  Seq Scan on t1
 Optimizer: Postgres query optimizer
(13 rows)

Before this change, with enable_parallel=on and enable_parallel_hash=off, we could only get a non-parallel plan:

explain(costs off) select * from t1 right join t2 on t1.b = t2.a;
                            QUERY PLAN                            
------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  Hash Right Join
         Hash Cond: (t1.b = t2.a)
         ->  Redistribute Motion 3:3  (slice2; segments: 3)
               Hash Key: t1.b
               ->  Seq Scan on t1
         ->  Hash
               ->  Redistribute Motion 3:3  (slice3; segments: 3)
                     Hash Key: t2.a
                     ->  Seq Scan on t2
 Optimizer: Postgres query optimizer
(11 rows)
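
The difference between the old and the relaxed check can be illustrated with a small standalone sketch. This is not the planner code from the patch: `PlannerSettings`, `drop_outer_path_before`, `drop_outer_path_after` and `check_relaxed_case` are hypothetical names invented for illustration, and only the two settings `enable_parallel` and `enable_parallel_hash` come from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two planner settings named in this PR. */
typedef struct PlannerSettings
{
	bool		enable_parallel;
	bool		enable_parallel_hash;
} PlannerSettings;

/*
 * Rule before this PR, as described above: an outer path that has a
 * Motion of a parallel plan is dropped whenever enable_parallel is on.
 */
bool
drop_outer_path_before(const PlannerSettings *s, bool outer_has_parallel_motion)
{
	return s->enable_parallel && outer_has_parallel_motion;
}

/*
 * Relaxed rule (illustrative assumption of its shape): drop such a path
 * only when a parallel-aware hash join could be generated, that is, when
 * enable_parallel_hash is on.
 */
bool
drop_outer_path_after(const PlannerSettings *s, bool outer_has_parallel_motion)
{
	return s->enable_parallel_hash && outer_has_parallel_motion;
}

/* The case from the description: enable_parallel on, enable_parallel_hash off. */
void
check_relaxed_case(void)
{
	PlannerSettings s = {.enable_parallel = true, .enable_parallel_hash = false};

	assert(drop_outer_path_before(&s, true));	/* old rule drops the path */
	assert(!drop_outer_path_after(&s, true));	/* relaxed rule keeps it */
}
```

With enable_parallel_hash on, both rules still drop the path, so the deadlock protection for parallel-aware hash joins is unchanged in this sketch.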

Authored-by: Zhang Mingli avamingli@gmail.com


@avamingli avamingli force-pushed the relax_outer_motion_path branch from cd9a475 to aa86ba9 Compare August 10, 2023 05:54
@avamingli avamingli self-assigned this Aug 10, 2023
@avamingli

This causes SemiJoin plan diffs, need to dig.

@avamingli avamingli force-pushed the relax_outer_motion_path branch from aa86ba9 to 74aa5c7 Compare August 10, 2023 08:27
@avamingli

> This causes SemiJoin plan diffs, need to dig.

Fixed.

Comment thread src/test/regress/expected/gp_parallel.out
@avamingli avamingli merged commit 8076d0b into apache:main Aug 24, 2023
@avamingli avamingli deleted the relax_outer_motion_path branch August 24, 2023 08:36