
Conversation

@dongjoon-hyun (Member) commented Oct 2, 2016

What changes were proposed in this pull request?

Currently, Spark Thrift Server raises IllegalArgumentException for queries whose column types are NullType, e.g., SELECT null or SELECT if(true,null,null). This PR fixes that by returning void like Hive 1.2.
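The idea behind the fix can be sketched as follows. This is a hypothetical, simplified illustration — `DataType`, `NullType`, and `toHiveTypeName` below are stand-ins, not the actual Spark or Hive classes:

```scala
// Simplified stand-ins for Spark SQL data types (illustrative only).
sealed trait DataType { def catalogString: String }
case object NullType    extends DataType { val catalogString = "null" }
case object IntegerType extends DataType { val catalogString = "int" }

object TypeNames {
  // Before the fix, a "null" column type hit the unrecognized-name path and
  // raised IllegalArgumentException; the fix reports Hive's "void" instead,
  // matching Hive 1.2's behavior.
  def toHiveTypeName(dt: DataType): String = dt match {
    case NullType => "void"
    case other    => other.catalogString
  }
}
```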

**Before**
```
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0)
Closing: 0: jdbc:hive2://localhost:10000

$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0)
Closing: 0: jdbc:hive2://localhost:10000
```

**After**
```
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-------+--+
| NULL  |
+-------+--+
| NULL  |
+-------+--+
1 row selected (3.242 seconds)
Beeline version 1.2.1.spark2 by Apache Hive
Closing: 0: jdbc:hive2://localhost:10000

$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-------------------------+--+
| (IF(true, NULL, NULL))  |
+-------------------------+--+
| NULL                    |
+-------------------------+--+
1 row selected (0.201 seconds)
Beeline version 1.2.1.spark2 by Apache Hive
Closing: 0: jdbc:hive2://localhost:10000
```

How was this patch tested?

  • Pass the Jenkins tests with a new test suite.
  • Also tested manually: after starting the Spark Thrift Server, run the following commands.
```
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
```

**Hive 1.2**
```
hive> create table null_table as select null;
hive> desc null_table;
OK
_c0                     void
```

@SparkQA commented Oct 2, 2016

Test build #66238 has finished for PR 15325 at commit 419c618.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin (Contributor) commented Oct 2, 2016

We need to add a test case. We don't necessarily need an end-to-end test, but we can refactor this schema output function slightly to make it testable.

@dongjoon-hyun (Member, Author)

Thank you for the review, @rxin. I see. I'll find a place for a test case covering the schema output function.

@dongjoon-hyun (Member, Author)

Hi, @rxin.
I refactored this into a function and added a test case for it. Could you review it again?

```scala
import org.apache.spark.sql.SparkSession

class SparkExecuteStatementOperationSuite
  extends SparkFunSuite with BeforeAndAfterAll with Logging {
```
@rxin (Contributor) commented:

You don't need Logging here, do you?

@dongjoon-hyun (Member, Author) replied:

Yep. Right. I'll remove that.

```scala
}

object SparkExecuteStatementOperation extends Logging {
  def getTableSchema(result: DataFrame): TableSchema = {
```
@rxin (Contributor) commented Oct 3, 2016:

Can this just take a StructType and return a TableSchema? Then you don't need any of the SparkSession setup in test suites.

@dongjoon-hyun (Member, Author) replied:

Do you mean using result.queryExecution.analyzed.schema instead of result.queryExecution.analyzed.output? I see. I'll update the scope of the refactored function and the test case. Although it changes the previous logic, it will make the test suite much simpler. Thank you.

@dongjoon-hyun (Member, Author) added:

May I use result.queryExecution.analyzed.output instead? I mean Seq[Attribute] instead of StructType.

@SparkQA commented Oct 3, 2016

Test build #66253 has finished for PR 15325 at commit 5df2e06.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member, Author)

Thank you, @rxin. The test suite is much simpler now.

```scala
}

object SparkExecuteStatementOperation extends Logging {
  def getTableSchema(output: Seq[Attribute]): TableSchema = {
```
@rxin (Contributor) commented Oct 3, 2016:

I actually meant using StructType, since you can get that from result.schema. Then this is built entirely on public APIs.

@dongjoon-hyun (Member, Author) replied:

Oh, I see. Sorry, I thought that was just an example of removing the SparkSession setup. I'll fix it again. Thank you for the fast reply.
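The shape being suggested — a pure function from a schema to a TableSchema — can be sketched roughly like this. The types below are simplified stand-ins (not the actual org.apache.hive.service.cli.TableSchema or Spark's StructType), purely to illustrate why no SparkSession is needed in the test:

```scala
// Hypothetical, simplified stand-ins for the real classes (illustrative only).
case class StructField(name: String, dataType: String)
case class StructType(fields: Seq[StructField])
case class TableSchema(columns: Seq[(String, String)])

object SchemaSketch {
  // A pure function of the schema: unit tests can call it directly with a
  // hand-built StructType, with no Spark session or Thrift server running.
  def getTableSchema(schema: StructType): TableSchema =
    if (schema.fields.isEmpty)
      TableSchema(Seq(("Result", "string")))  // dummy column for empty output
    else
      TableSchema(schema.fields.map(f => (f.name, f.dataType)))
}
```

A test can then build the input directly, e.g. `StructType(Seq(StructField("NULL", "void")))`, without starting any Spark infrastructure.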

@SparkQA commented Oct 3, 2016

Test build #66255 has finished for PR 15325 at commit b3deec2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member, Author)

I had overlooked the dependency between modules and was using the catalyst package incorrectly here. Now it seems correct. Thank you for the guidance.

@SparkQA commented Oct 3, 2016

Test build #66256 has finished for PR 15325 at commit 32b2bb2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


```diff
 private lazy val resultSchema: TableSchema = {
-  if (result == null || result.queryExecution.analyzed.output.size == 0) {
+  if (result == null || result.queryExecution.analyzed.output.isEmpty) {
```
@rxin (Contributor) commented:

While you are at this, can we change this to result.schema rather than result.queryExecution.analyzed.output?

@dongjoon-hyun (Member, Author) replied:

Done. That's much better. I changed those to use result.schema as you mentioned. In particular, the logInfo(..) call now prints the real result schema instead of the attributes.

  • Before: Result Schema: List(1#5, a#6)
  • After: Result Schema: StructType(StructField(1,IntegerType,false), StructField(a,StringType,false))

@SparkQA commented Oct 3, 2016

Test build #66266 has finished for PR 15325 at commit 424a601.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin (Contributor) commented Oct 4, 2016

Thanks - merging in master/2.0.

@asfgit asfgit closed this in c571cfb Oct 4, 2016
asfgit pushed a commit that referenced this pull request Oct 4, 2016
…eption in Thriftserver

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #15325 from dongjoon-hyun/SPARK-17112.

(cherry picked from commit c571cfb)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@dongjoon-hyun (Member, Author)

Thank you so much for the review and the merge, @rxin.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-17112 branch November 7, 2016 00:51