[SPARK-30509][SQL] Fix deprecation log warning in Avro schema inferring #27200
@gengliangwang @HyukjinKwon Please review the PR.
Test build #116701 has started for PR 27200 at commit
    .read
    .format("avro")
    .option(AvroOptions.ignoreExtensionKey, false)
    .option("header", true)
Does Avro have a header option?
I copy-pasted this piece of code from another test in AvroSuite:
    .option("header", true)
Yeah, it is not needed. It seems the test I copied from was itself copied from another place. I guess from
spark/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
Lines 746 to 787 in b389b8c
test("File source v2: support partition pruning") {
  withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> "") {
    allFileBasedDataSources.foreach { format =>
      withTempPath { dir =>
        Seq(("a", 1, 2), ("b", 1, 2), ("c", 2, 1))
          .toDF("value", "p1", "p2")
          .write
          .format(format)
          .partitionBy("p1", "p2")
          .option("header", true)
          .save(dir.getCanonicalPath)
        val df = spark
          .read
          .format(format)
          .option("header", true)
          .load(dir.getCanonicalPath)
          .where("p1 = 1 and p2 = 2 and value != \"a\"")
        val filterCondition = df.queryExecution.optimizedPlan.collectFirst {
          case f: Filter => f.condition
        }
        assert(filterCondition.isDefined)
        // The partitions filters should be pushed down and no need to be reevaluated.
        assert(filterCondition.get.collectFirst {
          case a: AttributeReference if a.name == "p1" || a.name == "p2" => a
        }.isEmpty)
        val fileScan = df.queryExecution.executedPlan collectFirst {
          case BatchScanExec(_, f: FileScan) => f
        }
        assert(fileScan.nonEmpty)
        assert(fileScan.get.partitionFilters.nonEmpty)
        assert(fileScan.get.planInputPartitions().forall { partition =>
          partition.asInstanceOf[FilePartition].files.forall { file =>
            file.filePath.contains("p1=1") && file.filePath.contains("p2=2")
          }
        })
        checkAnswer(df, Row("b", 1, 2))
      }
    }
  }
}
Let me remove the option from the other Avro test as well.
    val deprecatedEvents = logAppender.loggingEvents
      .filter(_.getRenderedMessage.contains(
        s"Option ${AvroOptions.ignoreExtensionKey} is deprecated"))
    assert(deprecatedEvents.size === 1)
Can we test that the size is just bigger than 0, in case we have other deprecation logs later?
I compared the size to 1 to rule out the warning being printed multiple times, once per partition, as in the PR (#27174 (comment)).
If we expect it to be printed only once, maybe we should assert that?
So, if somebody later modifies the code in a way that makes the warning print many times, the test will catch it.
"just in case we have other deprecation logs later?"
As we discussed in another PR, we are not going to print any log warnings about ignoreExtension. Am I right, or did I misunderstand something?
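The trade-off discussed in this thread (asserting `size === 1` versus `size > 0`) can be sketched without Spark. This is only an illustration: the `ignoreExtensionKey` value and the sample log lines below are assumptions, and `countDeprecationWarnings` is a hypothetical helper, not part of AvroSuite.

```scala
object DeprecationLogCheck {
  // Assumed value of AvroOptions.ignoreExtensionKey (not shown in this thread).
  val ignoreExtensionKey = "ignoreExtension"

  // Count the captured messages that mention the deprecated option.
  // Messages about other deprecated options do not match the substring.
  def countDeprecationWarnings(messages: Seq[String]): Int =
    messages.count(_.contains(s"Option $ignoreExtensionKey is deprecated"))

  def main(args: Array[String]): Unit = {
    // Hypothetical captured log lines; a real test would read them from a log appender.
    val once = Seq(
      "Starting job",
      s"Option $ignoreExtensionKey is deprecated",
      "Option foo is deprecated")
    // Simulates the warning being logged a second time, e.g. once per partition.
    val repeated = once :+ s"Option $ignoreExtensionKey is deprecated"

    // An exact-count assertion holds for the first case only.
    assert(countDeprecationWarnings(once) == 1)
    // A "bigger than 0" check would still pass here, hiding the repetition.
    assert(countDeprecationWarnings(repeated) == 2)
  }
}
```

Because the filter matches only the one option name, an unrelated deprecation warning added later would not change the count, while a repeated warning would.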
Test build #116702 has finished for PR 27200 at commit
The following will match only AvroOptions.ignoreExtensionKey, so warnings from other deprecations will be skipped. The assertion then verifies that the target warning occurs exactly once.
    val deprecatedEvents = logAppender.loggingEvents
      .filter(_.getRenderedMessage.contains(...))

    .write
    .format("avro")
    .partitionBy("p1", "p2")
    .option("header", true)
Er, BTW, why did you piggy-back this removal onto this PR?
The change is small; since I am here, I removed the unneeded option. Do you want a separate PR for this small change?
dongjoon-hyun
left a comment
The misleading piggy-back removal is a blocker for this PR.
This reverts commit 6c53b50.
@dongjoon-hyun I removed the unrelated changes from this PR.
Test build #116723 has finished for PR 27200 at commit
dongjoon-hyun
left a comment
+1, LGTM back. Merged to master.
What changes were proposed in this pull request?
In the PR, I propose to check the ignoreExtensionKey option in the case-insensitive map of AvroOptions.
Why are the changes needed?
The options map passed to AvroUtils.inferSchema contains all keys in lower case, because the map is converted from a CaseInsensitiveStringMap. Consequently, the check at
spark/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala
Line 45 in 3663dbe
always evaluates to false, and the deprecation log warning is never printed.
Does this PR introduce any user-facing change?
Yes, after the changes the log warning is printed once.
How was this patch tested?
Added a new test to AvroSuite which checks for the existence of the log warning.
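The key-case mismatch described under "Why are the changes needed?" can be reproduced with plain Scala maps. This is a minimal sketch, not Spark's code: the `ignoreExtensionKey` value is assumed, and `lowerCaseKeys` and `containsIgnoreCase` are hypothetical helpers that stand in for the conversion from a case-insensitive map and for a case-insensitive lookup.

```scala
import java.util.Locale

object CaseInsensitiveLookup {
  // Assumed value of AvroOptions.ignoreExtensionKey (not shown in this thread).
  val ignoreExtensionKey = "ignoreExtension"

  // Hypothetical stand-in for a map converted from a case-insensitive map:
  // every key ends up lower-cased.
  def lowerCaseKeys(options: Map[String, String]): Map[String, String] =
    options.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  // Case-insensitive membership test: normalize the key before the lookup.
  def containsIgnoreCase(options: Map[String, String], key: String): Boolean =
    options.contains(key.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    val userOptions = Map(ignoreExtensionKey -> "false")
    val passedDown = lowerCaseKeys(userOptions)

    // The camelCase key is not found among the lower-cased keys,
    // so a warning guarded by this check would never be logged.
    assert(!passedDown.contains(ignoreExtensionKey))
    // Normalizing the key first finds the option.
    assert(containsIgnoreCase(passedDown, ignoreExtensionKey))
  }
}
```

The sketch only shows why a direct `contains` on the lower-cased map misses a camelCase key; the PR itself resolves this by reading the option through the case-insensitive map of the Avro options.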