@viirya
Member

@viirya viirya commented Oct 21, 2018

What changes were proposed in this pull request?

This takes over the original PR, #22019. The original proposal was to return null for empty strings in float and double fields. A later, more reasonable proposal was to disallow empty strings altogether. This patch adds logic to the JSON parser that throws an exception when it finds an empty string for any type other than string and binary.

How was this patch tested?

Added a test.
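
For readers following along, the rule this patch introduces can be modeled in a few lines of plain Python. This is only an illustrative sketch, not Spark's parser code: the `convert_field` helper, the lowercase type-name strings, and the use of `ValueError` are all assumptions made for the example.

```python
# Illustrative model of the rule described above; NOT Spark's implementation.
# `convert_field` and the type-name strings are hypothetical names.

# Types for which an empty string remains a legitimate value.
EMPTY_STRING_ALLOWED = {"string", "binary"}


def convert_field(raw, type_name):
    """Convert a raw JSON string token to a Python value for `type_name`."""
    if raw == "" and type_name not in EMPTY_STRING_ALLOWED:
        # Per this PR, an empty string is rejected for every type
        # other than string and binary.
        raise ValueError(f"empty string is not allowed for type {type_name!r}")
    if type_name == "string":
        return raw
    if type_name == "binary":
        return raw.encode("utf-8")
    if type_name == "integer":
        return int(raw)
    if type_name in ("float", "double"):
        return float(raw)
    raise ValueError(f"unsupported type {type_name!r}")
```

Under this model, `convert_field("", "string")` still returns an empty string, while `convert_field("", "integer")` and `convert_field("", "double")` both raise instead of silently producing null.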

@viirya
Member Author

viirya commented Oct 21, 2018

cc @HyukjinKwon

@HyukjinKwon
Member

Yeah, looks good, as we discussed. Should we also update the migration guide while we are here?

@viirya
Member Author

viirya commented Oct 21, 2018

Yea, I do think so. I will update it.

@SparkQA

SparkQA commented Oct 21, 2018

Test build #97684 has finished for PR 22787 at commit 45aacaf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 21, 2018

Test build #97685 has finished for PR 22787 at commit 589caf7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon approved these changes Oct 22, 2018

- In PySpark, when creating a `SparkSession` with `SparkSession.builder.getOrCreate()`, if there is an existing `SparkContext`, the builder was trying to update the `SparkConf` of the existing `SparkContext` with configurations specified to the builder, but the `SparkContext` is shared by all `SparkSession`s, so we should not update them. Since 3.0, the builder comes to not update the configurations. This is the same behavior as Java/Scala API in 2.3 and above. If you want to update them, you need to update them prior to creating a `SparkSession`.

- In Spark version 2.4 and earlier, the parser of JSON data source treats empty strings as null for some data types like `IntegerType`. For `FloatType` and `DoubleType`, it fails on empty strings and throws exceptions. Since Spark 3.0, we disallow empty strings and will throw exceptions for data types except for `StringType` and `BinaryType`.
Member

Personal preference: "some data types such as `IntegerType`".

Member Author

Ok.

HyukjinKwon approved these changes Oct 22, 2018
@HyukjinKwon
Member

GitHub looks buggy right now. Let me clean up my comments if they got messed up.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97690 has finished for PR 22787 at commit 589caf7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97692 has finished for PR 22787 at commit bb117f2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97695 has finished for PR 22787 at commit 589caf7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97688 has finished for PR 22787 at commit bb117f2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97762 has finished for PR 22787 at commit 589caf7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97757 has finished for PR 22787 at commit bb117f2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97832 has started for PR 22787 at commit c04ea64.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97769 has finished for PR 22787 at commit bb117f2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97843 has started for PR 22787 at commit c04ea64.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97837 has finished for PR 22787 at commit bb117f2.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97775 has finished for PR 22787 at commit 589caf7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97852 has started for PR 22787 at commit bb117f2.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97858 has started for PR 22787 at commit c04ea64.

@HyukjinKwon
Member

It's chaotic ...

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97864 has started for PR 22787 at commit bb117f2.

@SparkQA

SparkQA commented Oct 22, 2018

Test build #97869 has finished for PR 22787 at commit c04ea64.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Oct 22, 2018 via email

@viirya
Member Author

viirya commented Oct 23, 2018

retest this please.

@viirya
Member Author

viirya commented Oct 23, 2018

Seems GitHub is restored...

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM

@viirya
Member Author

viirya commented Oct 23, 2018

cc @cloud-fan

@SparkQA

SparkQA commented Oct 23, 2018

Test build #97884 has finished for PR 22787 at commit c04ea64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Merged to master.

@asfgit asfgit closed this in 03e82e3 Oct 23, 2018
@viirya
Member Author

viirya commented Oct 23, 2018

Thanks @HyukjinKwon @dongjoon-hyun

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…owed

## What changes were proposed in this pull request?

This takes over the original PR, apache#22019. The original proposal was to return null for empty strings in float and double fields. A later, more reasonable proposal was to disallow empty strings altogether. This patch adds logic to the JSON parser that throws an exception when it finds an empty string for any type other than string and binary.

## How was this patch tested?

Added test.

Closes apache#22787 from viirya/SPARK-25040.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
@cloud-fan
Contributor

can we add a legacy config for it?

@viirya
Member Author

viirya commented Feb 4, 2020

@cloud-fan Ok. I can add a legacy config for this.

@cloud-fan
Contributor

@viirya thanks!

dongjoon-hyun pushed a commit that referenced this pull request Feb 5, 2020
…ings for certain types in json parser

### What changes were proposed in this pull request?

This is a follow-up to #22787, which disallowed empty strings in the JSON parser for all types except string and binary. This follow-up adds a legacy config that restores the previous behavior of allowing empty strings.

### Why are the changes needed?

Adding a legacy config to make migration easy for Spark users.

### Does this PR introduce any user-facing change?

Yes. Setting this legacy config to true restores the behavior from before Spark 3.0.0.
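
As a rough illustration of what the flag toggles, here is a plain-Python model of the two modes. It is a sketch under stated assumptions, not Spark's parser: `parse_empty_aware` and its `legacy_allow_empty` parameter are hypothetical stand-ins for the legacy config, whose name is not given in this thread (to my knowledge the Spark 3.0 migration guide documents it as `spark.sql.legacy.json.allowEmptyString.enabled`). The pre-3.0 behavior modeled here is the one described in the migration note from #22787: empty strings become null for types such as integer, and fail for float and double.

```python
# Illustrative model only; NOT Spark's parser. `parse_empty_aware` and
# `legacy_allow_empty` are hypothetical names standing in for the legacy config.

# Converters for non-empty tokens, keyed by an assumed type name.
CONVERTERS = {
    "string": str,
    "binary": lambda s: s.encode("utf-8"),
    "integer": int,
    "float": float,
    "double": float,
}


def parse_empty_aware(raw, type_name, legacy_allow_empty=False):
    """Convert a raw JSON string token, honoring a legacy empty-string flag."""
    if raw == "" and type_name not in ("string", "binary"):
        if legacy_allow_empty and type_name not in ("float", "double"):
            # Pre-3.0 behavior: an empty string becomes null for
            # types such as integer.
            return None
        # Default 3.0 behavior for every such type, and the pre-3.0
        # behavior for float and double: fail.
        raise ValueError(f"empty string is not allowed for type {type_name!r}")
    return CONVERTERS[type_name](raw)
```

With the flag off, `parse_empty_aware("", "integer")` raises; with `legacy_allow_empty=True` it returns `None`, while float and double fail in both modes.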

### How was this patch tested?

Unit test.

Closes #27456 from viirya/SPARK-25040-followup.

Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com>
Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun pushed a commit that referenced this pull request Feb 5, 2020
…ings for certain types in json parser

(cherry picked from commit 7631275)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@viirya viirya deleted the SPARK-25040 branch December 27, 2023 18:22