-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-12725] [SQL] Resolving Name Conflicts in SQL Generation and Name Ambiguity Caused by Internally Generated Expressions #11050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #50662 has finished for PR 11050 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not put isGenerated into the second list with the other metadata so that we don't have to rewrite every match statement in the code base?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uh, I see. Will do it. : )
|
We should also modify the |
|
@marmbrus Do you still remember the JIRA number or the case? If so, I can add it into the test cases and verify if the fix works well. In this PR, |
|
I'd have to search for it as it was a while ago, you should be able to trigger it by forcing the analyzer to generate a column while referencing a column with the same name. I wasn't referring to the |
|
@marmbrus : ) Now I understand it. Will do it. Thank you! |
isGenerated to Alias and AttributeReference|
I guess the PR you mentioned is #8231 I think we should disallow users to accidentally use the internally generated names. Thus, adding the following test case to verify if the fix works well. In this case, the table does not have a column named errorTest(
"unresolved attributes with a generated name",
testRelation2.groupBy('a)(max('b))
.where(sum('b) > 0)
.orderBy('havingCondition.asc),
"cannot resolve" :: "havingCondition" :: Nil) |
|
Yeah, that test looks great. |
|
Test build #50695 has finished for PR 11050 at commit
|
|
Test build #50704 has finished for PR 11050 at commit
|
| UnresolvedAlias(child = f.copy(children = newChildren)) :: Nil | ||
| case Alias(f @ UnresolvedFunction(_, args, _), name) if containsStar(args) => | ||
| case a @ Alias(f @ UnresolvedFunction(_, args, _), name) | ||
| if containsStar(args) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This change can be reverted now.
|
Test build #50875 has finished for PR 11050 at commit
|
|
retest this please |
|
Test build #50882 has finished for PR 11050 at commit
|
|
retest this please |
|
Test build #50893 has finished for PR 11050 at commit
|
|
retest this please |
|
Test build #50908 has finished for PR 11050 at commit
|
|
retest this please |
|
Test build #50926 has finished for PR 11050 at commit
|
|
LGTM, merging to master. Thanks! |
| val qualifiersString = | ||
| if (qualifiers.isEmpty) "" else qualifiers.map("`" + _ + "`").mkString("", ".", ".") | ||
| s"${child.sql} AS $qualifiersString`$name`" | ||
| val aliasName = if (isGenerated) s"$name#${exprId.id}" else s"$name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to change the sql according to isGenerate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I see. This is to workaround an issue in SQLBuilder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. You can get more background from the discussion in the JIRA: https://issues.apache.org/jira/browse/SPARK-12725
…rom Alias and AttributeReference ## What changes were proposed in this pull request? `isTableSample` and `isGenerated ` were introduced for SQL Generation respectively by #11148 and #11050 Since SQL Generation is removed, we do not need to keep `isTableSample`. ## How was this patch tested? The existing test cases Author: Xiao Li <gatorsmile@gmail.com> Closes #18379 from gatorsmile/CleanSample.
…rom Alias and AttributeReference ## What changes were proposed in this pull request? `isTableSample` and `isGenerated ` were introduced for SQL Generation respectively by apache#11148 and apache#11050 Since SQL Generation is removed, we do not need to keep `isTableSample`. ## How was this patch tested? The existing test cases Author: Xiao Li <gatorsmile@gmail.com> Closes apache#18379 from gatorsmile/CleanSample.
Some analysis rules generate aliases or auxiliary attribute references with the same name but different expression IDs. For example,
ResolveAggregateFunctionsintroduceshavingConditionandaggOrder, andDistinctAggregationRewriterintroducesgid.This is OK for normal query execution since these attribute references get expression IDs. However, it's troublesome when converting resolved query plans back to SQL query strings since expression IDs are erased.
Here's an example Spark 1.6.0 snippet for illustration:
The above code produces the following resolved plan:
Here we can see that both aggregate expressions in
ORDER BYare extracted into anAggregateoperator, and both of them are namedaggOrderwith different expression IDs.The solution is to automatically add the expression IDs into the attribute name for the Alias and AttributeReferences that are generated by Analyzer in SQL Generation.
In this PR, it also resolves another issue. Users could use the same name as the internally generated names. The duplicate names should not cause name ambiguity. When resolving the column, Catalyst should not pick the column that is internally generated.
Could you review the solution? @marmbrus @liancheng
I did not set the newly added flag for all the alias and attribute reference generated by Analyzers. Please let me know if I should do it? Thank you!