Skip to content

Convert and Propagate Join hints#17406

Closed
sreemanamala wants to merge 4 commits intoapache:masterfrom
sreemanamala:join-hints
Closed

Convert and Propagate Join hints#17406
sreemanamala wants to merge 4 commits intoapache:masterfrom
sreemanamala:join-hints

Conversation

@sreemanamala
Copy link
Copy Markdown
Contributor

Description

This commit aims to propagate sql join hints to pick the specific join algorithm

Release note


Key changed/added classes in this PR
  • CalcitePlanner
  • DataSourcePlan
  • JoinDataSource

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions Bot added Area - Batch Ingestion Area - Querying Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Oct 24, 2024
@sreemanamala sreemanamala marked this pull request as draft October 24, 2024 13:49
@Override
protected String getString(RelNode node)
{
final List<String> hintsCollect = new ArrayList<>();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this could be created as a field in the HintCollector

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a method inside HintCollecter to get the hints converted as String

Comment on lines +305 to +307
if (!scan.getHints().isEmpty()) {
this.hintsCollect.add("TableScan:" + scan.getHints());
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: you could remove all these conditionals; add a collectHints method with kinda the same content; and use relNode.getClass().getSimpleName() to get TableScan and similar names

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would not give the abstracted class names, Initially I copied this calcite. But i guess for our case we doent need to have all those implementations. removed and modified as mentioned in the other comments

Comment thread sql/src/test/java/org/apache/druid/quidem/DruidQuidemCommandHandler.java Outdated
Comment thread sql/src/test/java/org/apache/druid/quidem/DruidQuidemCommandHandler.java Outdated
Comment thread sql/src/test/java/org/apache/druid/quidem/DruidQuidemCommandHandler.java Outdated
}

if (!plannerContext.getJoinAlgorithm().requiresSubquery()
if (!QueryUtils.getJoinAlgorithm(join, plannerContext).requiresSubquery()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I feel like both PlannerContext and QueryUtils is a bit unfortunate place for these....
have you considered placing it in JoinAlgorithm?

you should probably also extract a JoinAlgorithm to a local variable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will move this out from QueryUtils in next batch of changes where I need to update the method as well.

Comment on lines +63 to +67
if (join.getHints().isEmpty()) {
return plannerContext.getJoinAlgorithm();
}

return JoinHint.fromString(join.getHints().get(0).hintName).getJoinAlgorithm();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may work right now but it assumes that there is only one hint and that may only be a JoinHint

Comment thread sql/src/main/java/org/apache/druid/sql/calcite/planner/JoinHint.java Outdated
*
* @return HintStrategyTable instance
*/
private static HintStrategyTable createHintStrategies()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I'm not sure if the HintTools class is needed; you could just have this method alone...its already static

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need this as we would also have to add the predicates

"duration" : __DURATION__
"sort" : true
} ]
!msqPlan
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: its near impossible to read these msq plans....not sure what we could do about that

@kgyrtkirk
Copy link
Copy Markdown
Member

kgyrtkirk commented Nov 15, 2024

I was looking into this a bit and here is what I've found:

  • there is an unexpected overwrite of previous hints; if the same key is used:
HintStrategyTable.builder()
  .hintStrategy(key1, ...)
  .hintStrategy(key1, ...);

...this is just unfortunate - I think it was unexpected

  • I don't see a way to attach hints to the tablescan/join nodes w/o modifying the parser...
  • exposing the aliases to be accessible during the predicate matching also seems near impossible: as the hints are propagate to the reltree at the end - which had already erased all details about originating names

based on the above and some ideas during debugging I came with some ideas around:

  • set hints only on TableScan -s as broadcast / sort_merge with some priority
  • for a join the selected algo is the one with the maximal priority beneath its right branch
  • in general I think people want to only specify it for some table

ex:

select /*+ broadcast(wikipedia,2), sort_merge(foo,1) */
  from wikipedia w2 join (select * from wikipedia w1 join foo) a

maybe we could start with an even simpler model and only introduce brodacast(tableName) and sort_merge(tableName)?
with heuristics like:

  • broadcast is selected if all right hand side sources are set to be broadcast
  • a single sort_merge forces sort_merge regardless what other table's hints

@adarshsanjeev adarshsanjeev mentioned this pull request Dec 6, 2024
10 tasks
@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions
Copy link
Copy Markdown

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions Bot closed this Feb 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants