[SPARK-17112][SQL] "select null" via JDBC triggers IllegalArgumentException in Thriftserver #15325
Conversation
Test build #66238 has finished for PR 15325 at commit
We need to add a test case. We don't necessarily need an end-to-end test, but we can refactor this schema output function slightly to make it testable.
Thank you for the review, @rxin. I see. I'll find a place for a test case covering the schema output function.

Hi, @rxin.
```scala
import org.apache.spark.sql.SparkSession

class SparkExecuteStatementOperationSuite
  extends SparkFunSuite with BeforeAndAfterAll with Logging {
```
You don't need `Logging` here, do you?
Yep. Right. I'll remove that.
```scala
}

object SparkExecuteStatementOperation extends Logging {
  def getTableSchema(result: DataFrame): TableSchema = {
```
Can this just take a `StructType` and return a `TableSchema`? Then you don't need any of the SparkSession setup in the test suites.
Do you mean using `result.queryExecution.analyzed.schema` instead of `result.queryExecution.analyzed.output`? I see. I'll update the scope of the refactored function and the test case. Although it changes the previous logic, it will keep the test suite as simple as possible. Thank you.
May I use `result.queryExecution.analyzed.output` instead? I mean `Seq[Attribute]` instead of `StructType`.
Test build #66253 has finished for PR 15325 at commit
Thank you, @rxin. The test suite is much simpler now.
```scala
}

object SparkExecuteStatementOperation extends Logging {
  def getTableSchema(output: Seq[Attribute]): TableSchema = {
```
I actually meant using `StructType`, since you can get that from `result.schema`. Then this is built entirely on public APIs.
Oh, I see. Sorry, I thought that was just an example of removing the SparkSession setup. I'll fix it again. Thank you for the fast reply.
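To make the agreed direction concrete, here is a minimal sketch of a `StructType`-based `getTableSchema`, assuming the Hive 1.2 `TableSchema`/`FieldSchema` classes used by the Thrift Server module; it mirrors the discussion in this thread and is not necessarily the exact committed code:

```scala
import scala.collection.JavaConverters._

import org.apache.hadoop.hive.metastore.api.FieldSchema
import org.apache.hive.service.cli.TableSchema

import org.apache.spark.sql.types.{NullType, StructType}

object SparkExecuteStatementOperation {
  def getTableSchema(structType: StructType): TableSchema = {
    val schema = structType.map { field =>
      // Report NullType columns as Hive's `void` type name so the JDBC
      // driver recognizes them; other types use their catalog string.
      val attrTypeString =
        if (field.dataType == NullType) "void" else field.dataType.catalogString
      new FieldSchema(field.name, attrTypeString, "")
    }
    new TableSchema(schema.asJava)
  }
}
```

Taking a `StructType` keeps the helper on public API only, so a test suite can build schemas directly without starting a `SparkSession`.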
Test build #66255 has finished for PR 15325 at commit
I've overlooked the dependency between the modules and used …
Test build #66256 has finished for PR 15325 at commit
```diff
  private lazy val resultSchema: TableSchema = {
-   if (result == null || result.queryExecution.analyzed.output.size == 0) {
+   if (result == null || result.queryExecution.analyzed.output.isEmpty) {
```
While you are at this, can we change this to `result.schema`, rather than `result.queryExecution.analyzed.output`?
Done. That's much better. I fixed those to use `result.schema` as you suggested. In particular, `logInfo(..)` now prints the real result schema instead of the attributes.
- Before: `Result Schema: List(1#5, a#6)`
- After: `Result Schema: StructType(StructField(1,IntegerType,false), StructField(a,StringType,false))`
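For illustration, the surrounding `resultSchema` member might then look roughly like this: a sketch inside the operation class, assuming the `getTableSchema(StructType)` helper above, where `result` is the operation's `DataFrame` and the "Result"/"string" fallback column is an assumption based on the pre-existing empty-output branch:

```scala
// Sketch of a member of the SparkExecuteStatementOperation class,
// not the verbatim committed diff.
private lazy val resultSchema: TableSchema = {
  if (result == null || result.schema.isEmpty) {
    // Statements with no output (e.g. DDL) still need a non-empty schema.
    new TableSchema(java.util.Arrays.asList(new FieldSchema("Result", "string", "")))
  } else {
    // After this change the log shows the real schema, e.g.
    // StructType(StructField(1,IntegerType,false), StructField(a,StringType,false)).
    logInfo(s"Result Schema: ${result.schema}")
    SparkExecuteStatementOperation.getTableSchema(result.schema)
  }
}
```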
Test build #66266 has finished for PR 15325 at commit
Thanks - merging in master/2.0.
…eption in Thriftserver

## What changes were proposed in this pull request?

Currently, Spark Thrift Server raises `IllegalArgumentException` for queries whose column types are `NullType`, e.g., `SELECT null` or `SELECT if(true,null,null)`. This PR fixes that by returning `void` like Hive 1.2.

**Before**

```sql
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0)
Closing: 0: jdbc:hive2://localhost:10000

$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: java.lang.IllegalArgumentException: Unrecognized type name: null (state=,code=0)
Closing: 0: jdbc:hive2://localhost:10000
```

**After**

```sql
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-------+--+
| NULL  |
+-------+--+
| NULL  |
+-------+--+
1 row selected (3.242 seconds)
Beeline version 1.2.1.spark2 by Apache Hive
Closing: 0: jdbc:hive2://localhost:10000

$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
Connecting to jdbc:hive2://localhost:10000
Connected to: Spark SQL (version 2.1.0-SNAPSHOT)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
+-------------------------+--+
| (IF(true, NULL, NULL))  |
+-------------------------+--+
| NULL                    |
+-------------------------+--+
1 row selected (0.201 seconds)
Beeline version 1.2.1.spark2 by Apache Hive
Closing: 0: jdbc:hive2://localhost:10000
```

## How was this patch tested?

* Pass the Jenkins tests with a new test suite.
* Also, manually, after starting Spark Thrift Server, run the following commands.

```sql
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select null"
$ bin/beeline -u jdbc:hive2://localhost:10000 -e "select if(true,null,null)"
```

**Hive 1.2**

```sql
hive> create table null_table as select null;
hive> desc null_table;
OK
_c0                     void
```

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #15325 from dongjoon-hyun/SPARK-17112.

(cherry picked from commit c571cfb)
Signed-off-by: Reynold Xin <rxin@databricks.com>
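As a rough sketch of the new test suite mentioned above, assuming the `getTableSchema(StructType)` sketch from earlier in the thread; the test name and assertions are plausible reconstructions, not guaranteed to match the merged suite:

```scala
import org.apache.spark.SparkFunSuite
import org.apache.spark.sql.types.{NullType, StructField, StructType}

class SparkExecuteStatementOperationSuite extends SparkFunSuite {
  test("SPARK-17112 `select null` via JDBC") {
    val field1 = StructField("NULL", NullType)
    val field2 = StructField("(IF(true, NULL, NULL))", NullType)
    val tableSchema = SparkExecuteStatementOperation
      .getTableSchema(StructType(Seq(field1, field2)))
    // Both NullType columns should surface as Hive's `void` type name.
    assert(tableSchema.getSize == 2)
    assert(tableSchema.getColumnDescriptorAt(0).getTypeName == "void")
    assert(tableSchema.getColumnDescriptorAt(1).getTypeName == "void")
  }
}
```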
Thank you so much for the review and merging, @rxin.