[SPARK-17620][SQL] Determine Serde by hive.default.fileformat when Creating Hive Serde Tables #15495
Conversation
```scala
// This test case verifies that setting `hive.default.fileformat` has no impact on
// the target table's fileformat in case of CTAS.
assert(sessionState.conf.defaultDataSourceName === "parquet")
checkRelation(tableName = table, isDataSourceTable = true, format = "parquet")
```
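The behavior this test asserts can be sketched as a small decision function (a hypothetical standalone sketch; the names `effectiveFormat`, `defaultDataSourceName`, and `hiveDefaultFileFormat` are illustrative, not Spark's actual API):

```scala
// Hypothetical sketch: which default format applies when creating a table.
// CTAS without explicit Hive storage clauses creates a data source table and
// uses spark.sql.sources.default; a plain Hive serde CREATE TABLE falls back
// to hive.default.fileformat.
def effectiveFormat(isCtas: Boolean,
                    defaultDataSourceName: String,  // spark.sql.sources.default
                    hiveDefaultFileFormat: String   // hive.default.fileformat
                   ): String =
  if (isCtas) defaultDataSourceName else hiveDefaultFileFormat

// CTAS keeps the default data source format even when
// hive.default.fileformat is set to orc.
val ctasFormat =
  effectiveFormat(isCtas = true,
                  defaultDataSourceName = "parquet",
                  hiveDefaultFileFormat = "orc")

// A plain Hive serde CREATE TABLE picks up hive.default.fileformat.
val plainFormat =
  effectiveFormat(isCtas = false,
                  defaultDataSourceName = "parquet",
                  hiveDefaultFileFormat = "orc")
```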
In Scala 2.10, we need to name all the fields once we use named parameters. This is the reason for the build failure when compiling with Scala 2.10.
Hmm, I think the error occurs because when you mix named and positional arguments, the positional arguments must form a prefix of the argument list.
I.e., the compilation error in:
```scala
checkRelation(table, isDataSourceTable = true, "parquet")
```
should be fixed by:
```scala
checkRelation(table, isDataSourceTable = true, format = "parquet")
```
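To illustrate the rule being discussed, here is a minimal standalone sketch (the `checkRelation` stub below is hypothetical; it only mimics the signature from the test suite). Note that older Scala versions, including 2.10, reject a positional argument appearing after a named one; Scala 2.13 relaxed this:

```scala
// Stub with the same parameter shape as the test helper under discussion.
def checkRelation(tableName: String,
                  isDataSourceTable: Boolean,
                  format: String): String =
  s"$tableName/$isDataSourceTable/$format"

// OK in all versions: positional prefix, then named arguments.
val a = checkRelation("t1", isDataSourceTable = true, format = "parquet")

// Also OK: all arguments named (what the PR ended up using, for readability).
val b = checkRelation(tableName = "t1", isDataSourceTable = true, format = "parquet")

// Would NOT compile in Scala 2.10: positional argument after a named one.
// checkRelation("t1", isDataSourceTable = true, "parquet")
```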
@viirya So what I have should work, right? I have also named the first field for code readability, even though it's not strictly necessary to fix the compilation issue. I also had a question: do you know how to trigger a test for Scala 2.10? I would like to run against 2.10 if possible. I have run it in my local environment, though.
As far as I know, we can't trigger it. Maybe @yhuai will know? You can compile it with Scala 2.10 locally to make sure it passes.
Yeah, I think it is no problem to have all named arguments for them.
I think we cannot trigger a Scala 2.10 build for a PR.
Test build #66995 has finished for PR 15495 at commit

@yhuai Do you think it is good enough to merge? Thank you!

@gatorsmile If this PR fixes the problem related to the build, I am fine to merge it.

Thank you! Will do it soon.

retest this please

Test build #67097 has finished for PR 15495 at commit

Merging to master! Thanks!

@gatorsmile @yhuai Many thanks!!
What changes were proposed in this pull request?

Reopens the closed PR #15190 (please refer to that link for review comments on the PR).

Make sure `hive.default.fileformat` is used when creating the storage format metadata.

Output
```SQL
scala> spark.sql("SET hive.default.fileformat=orc")
res1: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.sql("CREATE TABLE tmp_default(id INT)")
res2: org.apache.spark.sql.DataFrame = []
```
Before
```SQL
scala> spark.sql("DESC FORMATTED tmp_default").collect.foreach(println)
..
[# Storage Information,,]
[SerDe Library:,org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,]
[InputFormat:,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat,]
[OutputFormat:,org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat,]
[Compressed:,No,]
[Storage Desc Parameters:,,]
[  serialization.format,1,]
```
After
```SQL
scala> spark.sql("DESC FORMATTED tmp_default").collect.foreach(println)
..
[# Storage Information,,]
[SerDe Library:,org.apache.hadoop.hive.ql.io.orc.OrcSerde,]
[InputFormat:,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat,]
[OutputFormat:,org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat,]
[Compressed:,No,]
[Storage Desc Parameters:,,]
[  serialization.format,1,]
```
How was this patch tested?

Added new tests to HiveDDLCommandSuite and SQLQuerySuite.
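The fix's core idea — resolving `hive.default.fileformat` to a serde/input/output class triple — can be sketched as a simple lookup (a hypothetical standalone sketch; the class name strings match the DESC FORMATTED output above, but the `HiveStorage` case class and `resolveStorage` helper are illustrative, not Spark's actual implementation):

```scala
// Hypothetical sketch of resolving hive.default.fileformat to storage classes.
case class HiveStorage(serde: String, inputFormat: String, outputFormat: String)

val formats: Map[String, HiveStorage] = Map(
  "textfile" -> HiveStorage(
    "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
    "org.apache.hadoop.mapred.TextInputFormat",
    "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"),
  "orc" -> HiveStorage(
    "org.apache.hadoop.hive.ql.io.orc.OrcSerde",
    "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
    "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"))

// Fall back to the text format when the configured value is unrecognized.
def resolveStorage(defaultFileFormat: String): HiveStorage =
  formats.getOrElse(defaultFileFormat.toLowerCase, formats("textfile"))

// With hive.default.fileformat=orc, a plain CREATE TABLE now gets the ORC
// serde rather than LazySimpleSerDe (the "Before" bug shown above).
val storage = resolveStorage("orc")
```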