[SPARK-22300][BUILD] Update ORC to 1.4.1 #19521
Conversation
|
Test build #82853 has finished for PR 19521 at commit
|
|
Hi, @gatorsmile and @cloud-fan. This will resolve the regression affecting the ongoing ORC PRs. Could you review this? |
|
looks good, no new dependencies introduced, just upgrading. cc @srowen to double check. Thanks! |
|
Thank you for review, @cloud-fan ! |
|
Also LGTM. Regarding the test case you posted, does Parquet return |
|
We can save an empty DataFrame as an ORC table, but we are unable to fetch it from the table.
val rddNoCols = sparkContext.parallelize(1 to 10).map(_ => Row.empty)
val dfNoCols = spark.createDataFrame(rddNoCols, StructType(Seq.empty))
dfNoCols.write.format("orc").saveAsTable("t")
spark.sql("select 1 from t").show()
This is not related to this upgrade, but you might be interested in this. |
|
Thank you for review, @gatorsmile .
BTW, I've linked all related ORC issues into SPARK-20901 and am working on it. You can monitor ORC progress there. |
|
|
|
Oh, I confused this with what I've been looking at these days. For your example, Parquet doesn't support it either. We may want to create an issue covering empty schemas for both Parquet and ORC.
scala> val rddNoCols = sparkContext.parallelize(1 to 10).map(_ => Row.empty)
scala> val dfNoCols = spark.createDataFrame(rddNoCols, StructType(Seq.empty))
scala> dfNoCols.write.format("parquet").saveAsTable("px")
17/10/18 05:46:17 ERROR Utils: Aborting task
org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: message spark_schema {
} |
|
LGTM too BTW. |
|
The empty-schema path is probably related to this, IIRC (not double-checked): spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala Lines 52 to 58 in cca945b
|
|
Thank you for review, @HyukjinKwon . |
|
LGTM |
|
Thank you, @rxin ! |
|
Thanks, merging to master! |
|
Thank you all for review and merge! |
What changes were proposed in this pull request?
Apache ORC 1.4.1 was released yesterday.
It includes several important fixes, such as ORC-233 (Allow orc.include.columns to be empty).
This PR updates the Apache ORC dependency to the latest version, 1.4.1.
How was this patch tested?
Pass the Jenkins.