
Conversation

@dongjoon-hyun (Member) commented Oct 2, 2016

What changes were proposed in this pull request?

Currently, Spark Thrift Server raises IllegalArgumentException for queries whose column types are NullType, e.g., SELECT null or SELECT if(true,null,null). This PR fixes that by returning void like Hive 1.2.
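The idea behind the fix can be sketched as follows. This is a hypothetical, simplified illustration — `DataType`, `NullType`, and `toHiveTypeName` below are stand-ins, not the actual Spark or Hive classes:

```scala
// Simplified stand-ins for Spark SQL data types (illustrative only).
sealed trait DataType { def catalogString: String }
case object NullType    extends DataType { val catalogString = "null" }
case object IntegerType extends DataType { val catalogString = "int" }

object TypeNames {
  // Before the fix, a "null" column type hit the unrecognized-name path and
  // raised IllegalArgumentException; the fix reports Hive's "void" instead,
  // matching Hive 1.2's behavior.
  def toHiveTypeName(dt: DataType): String = dt match {
    case NullType => "void"
    case other    => other.catalogString
  }
}
```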

**Before**
```
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0)
Closing: 0: jdbc:hive2://localhost:10000

$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0)
Closing: 0: jdbc:hive2://localhost:10000
```

**After**
```
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-------+--+
| NULL  |
+-------+--+
| NULL  |
+-------+--+
1 row selected (3.242 seconds)
Beeline version 1.2.1.spark2 by Apache Hive
Closing: 0: jdbc:hive2://localhost:10000

$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-------------------------+--+
| (IF(true, NULL, NULL))  |
+-------------------------+--+
| NULL                    |
+-------------------------+--+
1 row selected (0.201 seconds)
Beeline version 1.2.1.spark2 by Apache Hive
Closing: 0: jdbc:hive2://localhost:10000
```

How was this patch tested?

  • Pass the Jenkins tests with a new test suite.
  • Also tested manually: after starting the Spark Thrift Server, run the following commands.
```
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
```

**Hive 1.2**
```
hive> create table null_table as select null;
hive> desc null_table;
OK
_c0                     void
```

@SparkQA commented Oct 2, 2016

Test build #66238 has finished for PR 15325 at commit 419c618.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin (Contributor) commented Oct 2, 2016

We need to add a test case. We don't necessarily need an end-to-end test, but we can refactor this schema output function slightly to make it testable.

@dongjoon-hyun (Member, Author)

Thank you for the review, @rxin. I see. I'll find a place for a test case covering the schema output function.

@dongjoon-hyun (Member, Author)

Hi, @rxin.
I refactored this into a function and added a test case for it. Could you review it again?

```scala
import org.apache.spark.sql.SparkSession

class SparkExecuteStatementOperationSuite
  extends SparkFunSuite with BeforeAndAfterAll with Logging {
```
@rxin (Contributor) commented:

You don't need Logging here, do you?

@dongjoon-hyun (Member, Author) replied:

Yep. Right. I'll remove that.

```scala
}

object SparkExecuteStatementOperation extends Logging {
  def getTableSchema(result: DataFrame): TableSchema = {
```
@rxin (Contributor) commented Oct 3, 2016:

Can this just take a StructType and return a TableSchema? Then you don't need any of the SparkSession setup in test suites.

@dongjoon-hyun (Member, Author) replied:

Do you mean using result.queryExecution.analyzed.schema instead of result.queryExecution.analyzed.output? I see. I'll update the scope of the refactored function and the test case. Although it changes the previous logic, it will make the test suite much simpler. Thank you.

@dongjoon-hyun (Member, Author) added:

May I use result.queryExecution.analyzed.output instead? I mean Seq[Attribute] instead of StructType.

@SparkQA commented Oct 3, 2016

Test build #66253 has finished for PR 15325 at commit 5df2e06.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member, Author)

Thank you, @rxin. The test suite is much simpler now.

```scala
}

object SparkExecuteStatementOperation extends Logging {
  def getTableSchema(output: Seq[Attribute]): TableSchema = {
```
@rxin (Contributor) commented Oct 3, 2016:

I actually meant using StructType, since you can get that from result.schema. Then this is built entirely on public APIs.

@dongjoon-hyun (Member, Author) replied:

Oh, I see. Sorry, I thought that was just an example of removing the SparkSession setup. I'll fix it again. Thank you for the fast reply.
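The shape being suggested — a pure function from a schema to a TableSchema — can be sketched roughly like this. The types below are simplified stand-ins (not the actual org.apache.hive.service.cli.TableSchema or Spark's StructType), purely to illustrate why no SparkSession is needed in the test:

```scala
// Hypothetical, simplified stand-ins for the real classes (illustrative only).
case class StructField(name: String, dataType: String)
case class StructType(fields: Seq[StructField])
case class TableSchema(columns: Seq[(String, String)])

object SchemaSketch {
  // A pure function of the schema: unit tests can call it directly with a
  // hand-built StructType, with no Spark session or Thrift server running.
  def getTableSchema(schema: StructType): TableSchema =
    if (schema.fields.isEmpty)
      TableSchema(Seq(("Result", "string")))  // dummy column for empty output
    else
      TableSchema(schema.fields.map(f => (f.name, f.dataType)))
}
```

A test can then build the input directly, e.g. `StructType(Seq(StructField("NULL", "void")))`, without starting any Spark infrastructure.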

@SparkQA commented Oct 3, 2016

Test build #66255 has finished for PR 15325 at commit b3deec2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member, Author)

I had overlooked the dependency between modules and was using the catalyst package incorrectly here. Now it seems correct. Thank you for the guidance.

@SparkQA commented Oct 3, 2016

Test build #66256 has finished for PR 15325 at commit 32b2bb2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


```diff
 private lazy val resultSchema: TableSchema = {
-  if (result == null || result.queryExecution.analyzed.output.size == 0) {
+  if (result == null || result.queryExecution.analyzed.output.isEmpty) {
```
@rxin (Contributor) commented:

While you are at this, can we change this to result.schema rather than result.queryExecution.analyzed.output?

@dongjoon-hyun (Member, Author) replied:

Done. That's much better. I changed those to use result.schema as you mentioned. In particular, the logInfo(..) call now prints the real result schema instead of the attributes.

  • Before: Result Schema: List(1#5, a#6)
  • After: Result Schema: StructType(StructField(1,IntegerType,false), StructField(a,StringType,false))

@SparkQA commented Oct 3, 2016

Test build #66266 has finished for PR 15325 at commit 424a601.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin (Contributor) commented Oct 4, 2016

Thanks - merging in master/2.0.

@asfgit asfgit closed this in c571cfb Oct 4, 2016
asfgit pushed a commit that referenced this pull request Oct 4, 2016
…eption in Thriftserver

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #15325 from dongjoon-hyun/SPARK-17112.

(cherry picked from commit c571cfb)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@dongjoon-hyun (Member, Author)

Thank you so much for the review and the merge, @rxin.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-17112 branch November 7, 2016 00:51