Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Jan 14, 2020

What changes were proposed in this pull request?

In the PR, I propose to check the ignoreExtensionKey option in the case insensitive map of AvroOption.

Why are the changes needed?

The map options passed to AvroUtils.inferSchema contains all keys in the lower cases in fact. Actually, the map is converted from a CaseInsensitiveStringMap. Consequently, the check

if (options.contains("ignoreExtension")) {
always return false, and the deprecation log warning is never printed.

Does this PR introduce any user-facing change?

Yes, after the changes the log warning is printed once.

How was this patch tested?

Added new test to AvroSuite which checks existence of log warning.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 14, 2020

@gengliangwang @HyukjinKwon Please, review the PR.

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116701 has started for PR 27200 at commit 0bb33aa.

.read
.format("avro")
.option(AvroOptions.ignoreExtensionKey, false)
.option("header", true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Avro has header option?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copy pasted the piece of code from another test in AvroSuite:

. Let me check and remove it if is not needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh, it is not needed. It seems the test which I copy-pasted was copy-pasted from another places. I guess from

test("File source v2: support partition pruning") {
withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> "") {
allFileBasedDataSources.foreach { format =>
withTempPath { dir =>
Seq(("a", 1, 2), ("b", 1, 2), ("c", 2, 1))
.toDF("value", "p1", "p2")
.write
.format(format)
.partitionBy("p1", "p2")
.option("header", true)
.save(dir.getCanonicalPath)
val df = spark
.read
.format(format)
.option("header", true)
.load(dir.getCanonicalPath)
.where("p1 = 1 and p2 = 2 and value != \"a\"")
val filterCondition = df.queryExecution.optimizedPlan.collectFirst {
case f: Filter => f.condition
}
assert(filterCondition.isDefined)
// The partitions filters should be pushed down and no need to be reevaluated.
assert(filterCondition.get.collectFirst {
case a: AttributeReference if a.name == "p1" || a.name == "p2" => a
}.isEmpty)
val fileScan = df.queryExecution.executedPlan collectFirst {
case BatchScanExec(_, f: FileScan) => f
}
assert(fileScan.nonEmpty)
assert(fileScan.get.partitionFilters.nonEmpty)
assert(fileScan.get.planInputPartitions().forall { partition =>
partition.asInstanceOf[FilePartition].files.forall { file =>
file.filePath.contains("p1=1") && file.filePath.contains("p2=2")
}
})
checkAnswer(df, Row("b", 1, 2))
}
}
}
}

Let me remove the option from other avro test as well.

val deprecatedEvents = logAppender.loggingEvents
.filter(_.getRenderedMessage.contains(
s"Option ${AvroOptions.ignoreExtensionKey} is deprecated"))
assert(deprecatedEvents.size === 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test if the size is just bigger then 0 just in case we have other deprecation logs later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compared the size to 1 to avoid any concerns that it is printed multiple times like in the PR (#27174 (comment)), per each partition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we expect it is printed only once, maybe we should assert that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if somebody modifies the code in the future in the way the warning is printed many times, we will catch the situation by the test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just in case we have other deprecation logs later?

As we discussed in another PR, we are not going to print any log warnings about ignoreExtension. Am I right or misunderstood something?

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116702 has finished for PR 27200 at commit 6c53b50.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following will match only AvroOptions.ignoreExtensionKey. So, the warnings from another deprecations will be skipped. And, the assertion seems to verify that target warning occurs once.

val deprecatedEvents = logAppender.loggingEvents
         .filter(_.getRenderedMessage.contains(...)

.write
.format("avro")
.partitionBy("p1", "p2")
.option("header", true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, BTW, why do you piggy-back this removal into this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is small, since I am here I remove unneeded option. Do you want to see a separate PR for the little change?

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The misleading piggy-back removal is a blocker for this PR.

@MaxGekk
Copy link
Member Author

MaxGekk commented Jan 14, 2020

@dongjoon-hyun I removed unrelated changes from this PR

@SparkQA
Copy link

SparkQA commented Jan 14, 2020

Test build #116723 has finished for PR 27200 at commit 164cf5b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM back. Merged to master.

@MaxGekk MaxGekk deleted the avro-fix-ignoreExtension-contains branch June 5, 2020 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants