@sandeep-katta
Contributor

@sandeep-katta sandeep-katta commented Sep 17, 2019

What changes were proposed in this pull request?

#DataSet
fruit,color,price,quantity
apple,red,1,3
banana,yellow,2,4
orange,orange,3,5
xxx

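This PR fixes the incorrect count shown below: the malformed row xxx should be dropped in DROPMALFORMED mode, so the expected result is 3, not 4.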

scala> spark.conf.set("spark.sql.csv.parser.columnPruning.enabled", false)
scala> spark.read.option("header", "true").option("mode", "DROPMALFORMED").csv("fruit.csv").count
res1: Long = 4

This regression was introduced by the change made for SPARK-24645.
The issue reported in SPARK-24645 is also resolved by SPARK-25387.
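
For reference, the count expected from the repro above can be modelled in plain Scala. This is only a simplified sketch, not Spark's CSV parser: it assumes a row is malformed when its field count differs from the header's, and the names DropMalformedSketch and expectedCount are hypothetical.

```scala
// Simplified model of the expected DROPMALFORMED result; NOT Spark's CSV parser.
// Assumption: a row is malformed when its field count differs from the header's.
object DropMalformedSketch {
  // Same content as the fruit.csv data set in the description.
  val lines: Seq[String] = Seq(
    "fruit,color,price,quantity",
    "apple,red,1,3",
    "banana,yellow,2,4",
    "orange,orange,3,5",
    "xxx"
  )

  // Count the data rows whose number of fields matches the header.
  def expectedCount(csv: Seq[String]): Long = {
    val width = csv.head.split(",", -1).length
    csv.tail.count(_.split(",", -1).length == width).toLong
  }

  def main(args: Array[String]): Unit =
    println(expectedCount(lines)) // prints 3
}
```

Under that assumption only the three fruit rows are counted, which is the value count should return once the row xxx is dropped.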

Why are the changes needed?

The change made for SPARK-24645 caused this regression, so this PR reverts that code. The original SPARK-24645 issue stays fixed because SPARK-25387 also resolves it.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a unit test, and also re-tested the SPARK-24645 bug scenario.

SPARK-24645 regression (screenshot not included)

@sandeep-katta sandeep-katta changed the title [SPARK-29101][Core] Fix count API for csv file when DROPMALFORMED is selected [SPARK-29101][Core] Fix count API for csv file when DROPMALFORMED mode is selected Sep 17, 2019
@dongjoon-hyun
Member

ok to test

@dongjoon-hyun
Member

Thank you for making a PR, @sandeep-katta.

@dongjoon-hyun
Member

cc @HyukjinKwon

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-29101][Core] Fix count API for csv file when DROPMALFORMED mode is selected [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected Sep 18, 2019
@dongjoon-hyun
Member

BTW, @sandeep-katta, sql/core belongs to the [SQL] component.

@SparkQA

SparkQA commented Sep 18, 2019

Test build #110831 has finished for PR 25820 at commit d546011.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 18, 2019

Test build #110871 has finished for PR 25820 at commit 1078e8a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 18, 2019

Test build #110870 has finished for PR 25820 at commit 05786cb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 18, 2019

Test build #110867 has finished for PR 25820 at commit ab0c5a9.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@HyukjinKwon HyukjinKwon left a comment

Looks good otherwise

@SparkQA

SparkQA commented Sep 18, 2019

Test build #110892 has finished for PR 25820 at commit f6a01ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 18, 2019

Test build #110898 has finished for PR 25820 at commit f2c25f0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

@dongjoon-hyun
Member

dongjoon-hyun commented Sep 18, 2019

Thank you, @sandeep-katta and @HyukjinKwon.
Since branch-2.4 is our LTS branch for 2.x, can we have this in branch-2.4, too?

@HyukjinKwon
Member

I didn't check branch-2.4. @sandeep-katta, can you check and open a backport PR? I will review it there.
