[SPARK-6376][SQL] Avoid eliminating subqueries until optimization #5160
nitpick here - can we put an explicit type?
Test build #29066 has finished for PR 5160 at commit
Test build #29071 has finished for PR 5160 at commit
Could you add a comment here to let others know that the first step in the Optimizer is to remove SubQueries (which are helper wrappers for query analysis)?
added
Test build #29100 has finished for PR 5160 at commit
LGTM
Previously it was okay to throw away subqueries after analysis, as we would never try to use that tree for resolution again. However, with eager analysis in `DataFrame`s this can cause errors for queries such as:
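```scala
// Assumes a Spark 1.3-era SQLContext named `sqlContext` (e.g. in spark-shell)
import sqlContext.implicits._ // provides toDF and the $"..." column syntax
val df = Seq(1, 2, 3).map(i => (i, i.toString)).toDF("int", "str")
df.as('x).join(df.as('y), $"x.str" === $"y.str").groupBy("x.str").count()
```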
As a result, in this PR we defer the elimination of subqueries until the optimization phase.
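To illustrate why the aliases must survive analysis, here is a deliberately simplified, hypothetical model. `Plan`, `Attr`, `Relation`, `Subquery`, `Join`, `resolve`, and `eliminateSubqueries` are illustrative names, not Catalyst's actual classes or rules. It assumes that a qualified name such as `x.str` is resolved by matching the qualifier that an alias wrapper attaches to its child's output. Once those wrappers are stripped, a later eager resolution step (like the `groupBy("x.str")` above) has nothing to match against:

```scala
// Hypothetical toy model, NOT Spark's Catalyst classes.
case class Attr(name: String, qualifier: Option[String])

sealed trait Plan { def output: Seq[Attr] }

case class Relation(cols: Seq[String]) extends Plan {
  def output: Seq[Attr] = cols.map(c => Attr(c, None))
}

// Alias wrapper: re-qualifies every output column of its child with `alias`.
case class Subquery(alias: String, child: Plan) extends Plan {
  def output: Seq[Attr] = child.output.map(_.copy(qualifier = Some(alias)))
}

case class Join(left: Plan, right: Plan) extends Plan {
  def output: Seq[Attr] = left.output ++ right.output
}

// Resolve "x.str" (qualified) or "str" (unqualified) against a plan's output.
def resolve(plan: Plan, name: String): Option[Attr] = name.split('.') match {
  case Array(q, c) => plan.output.find(a => a.qualifier.contains(q) && a.name == c)
  case Array(c)    => plan.output.find(_.name == c)
  case _           => None
}

// Optimizer-style rule: strip the alias wrappers. Only safe once no further
// name resolution will be run against this tree.
def eliminateSubqueries(plan: Plan): Plan = plan match {
  case Subquery(_, child) => eliminateSubqueries(child)
  case Join(l, r)         => Join(eliminateSubqueries(l), eliminateSubqueries(r))
  case other              => other
}

val base     = Relation(Seq("int", "str"))
val analyzed = Join(Subquery("x", base), Subquery("y", base))

// Re-resolving a qualified name, as eager DataFrame analysis does for each new operator.
val keptAliases    = resolve(analyzed, "x.str")
val droppedAliases = resolve(eliminateSubqueries(analyzed), "x.str")
println(keptAliases)    // Some(Attr(str,Some(x)))
println(droppedAliases) // None
```

In this sketch, keeping the wrappers in the analyzed plan and applying `eliminateSubqueries` only when the plan is handed to the optimizer mirrors the ordering this PR describes.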
Author: Michael Armbrust <michael@databricks.com>
Closes #5160 from marmbrus/subqueriesInDfs and squashes the following commits:
a9bb262 [Michael Armbrust] Update Optimizer.scala
27d25bf [Michael Armbrust] fix hive tests
9137e03 [Michael Armbrust] add type
81cd597 [Michael Armbrust] Avoid eliminating subqueries until optimization
(cherry picked from commit cbeaf9e)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Test build #29104 has finished for PR 5160 at commit