Conversation

@marmbrus
Contributor

Previously it was okay to throw away subqueries after analysis, as we would never try to use that tree for resolution again. However, with eager analysis in DataFrames this can cause errors for queries such as:

```scala
val df = Seq(1,2,3).map(i => (i, i.toString)).toDF("int", "str")
df.as('x).join(df.as('y), $"x.str" === $"y.str").groupBy("x.str").count()
```

As a result, in this PR we defer the elimination of subqueries until the optimization phase.
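To see why the analyzed tree still needs its subqueries, here is a self-contained toy model (illustrative only; these are not Spark's actual Catalyst classes) showing that stripping subquery alias nodes discards the qualifier that references like `$"x.str"` resolve against:

```scala
// Toy plan nodes: a Subquery node records the alias given by df.as('x).
sealed trait Plan { def children: Seq[Plan] }
case class Relation(name: String, columns: Seq[String]) extends Plan {
  val children = Seq.empty[Plan]
}
case class Subquery(alias: String, child: Plan) extends Plan {
  val children = Seq(child)
}
case class Join(left: Plan, right: Plan) extends Plan {
  val children = Seq(left, right)
}

// Resolve a qualified reference like "x.str": the qualifier must match
// a Subquery alias wrapping a relation that has the column.
def resolve(plan: Plan, qualifier: String, column: String): Boolean = plan match {
  case Subquery(a, Relation(_, cols)) => a == qualifier && cols.contains(column)
  case p => p.children.exists(resolve(_, qualifier, column))
}

// Eliminating subqueries throws the alias information away.
def eliminateSubqueries(plan: Plan): Plan = plan match {
  case Subquery(_, child)  => eliminateSubqueries(child)
  case Join(l, r)          => Join(eliminateSubqueries(l), eliminateSubqueries(r))
  case other               => other
}

val df = Relation("df", Seq("int", "str"))
val joined = Join(Subquery("x", df), Subquery("y", df))

resolve(joined, "x", "str")                        // true: alias info intact
resolve(eliminateSubqueries(joined), "x", "str")   // false: alias was discarded
```

With eager analysis, later operators (the `groupBy("x.str")` above) resolve against the already-analyzed tree, so the aliases have to survive analysis; deferring elimination to the Optimizer removes them only once resolution is finished.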

Contributor


nitpick here - can we put an explicit type?

@SparkQA

SparkQA commented Mar 24, 2015

Test build #29066 has finished for PR 5160 at commit 81cd597.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 24, 2015

Test build #29071 has finished for PR 5160 at commit 9137e03.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TaskCommitDenied(jobID: Int, partitionID: Int, attemptID: Int) extends TaskFailedReason
    • class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends Source
    • class MLPairRDDFunctions[K: ClassTag, V: ClassTag](self: RDD[(K, V)]) extends Serializable
    • class NaiveBayesModel(Saveable, Loader):
    • class SqlParser extends AbstractSparkSQLParser with DataTypeParser
    • case class CombineSum(child: Expression) extends AggregateExpression
    • case class CombineSumFunction(expr: Expression, base: AggregateExpression)
    • protected[sql] class DataTypeException(message: String) extends Exception(message)

Contributor


Add a comment here to let others know that the first step in the Optimizer is to remove SubQueries (which are helper wrappers used during query analysis)?

Contributor Author


added
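The note requested above presumably documents subquery removal as the Optimizer's first rule batch. A sketch of what that registration might look like in `Optimizer.scala` (a fragment, not runnable on its own; the batch name and `FixedPoint(100)` strategy are assumptions based on this PR's description and Catalyst's `RuleExecutor` conventions):

```scala
object DefaultOptimizer extends Optimizer {
  val batches =
    // SubQueries are only needed for analysis and can be removed before execution.
    Batch("Remove SubQueries", FixedPoint(100), EliminateSubQueries) ::
    // ... remaining optimization batches unchanged ...
    Nil
}
```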

@SparkQA

SparkQA commented Mar 24, 2015

Test build #29100 has finished for PR 5160 at commit 27d25bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Mar 24, 2015

LGTM

asfgit closed this in cbeaf9e Mar 24, 2015
asfgit pushed a commit that referenced this pull request Mar 24, 2015
Previously it was okay to throw away subqueries after analysis, as we would never try to use that tree for resolution again.  However, with eager analysis in `DataFrame`s this can cause errors for queries such as:

```scala
val df = Seq(1,2,3).map(i => (i, i.toString)).toDF("int", "str")
df.as('x).join(df.as('y), $"x.str" === $"y.str").groupBy("x.str").count()
```

As a result, in this PR we defer the elimination of subqueries until the optimization phase.

Author: Michael Armbrust <michael@databricks.com>

Closes #5160 from marmbrus/subqueriesInDfs and squashes the following commits:

a9bb262 [Michael Armbrust] Update Optimizer.scala
27d25bf [Michael Armbrust] fix hive tests
9137e03 [Michael Armbrust] add type
81cd597 [Michael Armbrust] Avoid eliminating subqueries until optimization

(cherry picked from commit cbeaf9e)
Signed-off-by: Michael Armbrust <michael@databricks.com>
@SparkQA

SparkQA commented Mar 24, 2015

Test build #29104 has finished for PR 5160 at commit a9bb262.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

marmbrus deleted the subqueriesInDfs branch August 3, 2015 22:55