-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-25040][SQL] Empty string for non string types should be disallowed #22787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @HyukjinKwon |
|
Yea looks good as we discussed. Should we maybe better update the migration guide too while we are here? |
|
Yea, I do think so. I will update it. |
|
Test build #97684 has finished for PR 22787 at commit
|
|
Test build #97685 has finished for PR 22787 at commit
|
docs/sql-migration-guide-upgrade.md
Outdated
|
|
||
| - In PySpark, when creating a `SparkSession` with `SparkSession.builder.getOrCreate()`, if there is an existing `SparkContext`, the builder was trying to update the `SparkConf` of the existing `SparkContext` with configurations specified to the builder, but the `SparkContext` is shared by all `SparkSession`s, so we should not update them. Since 3.0, the builder comes to not update the configurations. This is the same behavior as Java/Scala API in 2.3 and above. If you want to update them, you need to update them prior to creating a `SparkSession`. | ||
|
|
||
| - In Spark version 2.4 and earlier, the parser of JSON data source treats empty strings as null for some data types like `IntegerType`. For `FloatType` and `DoubleType`, it fails on empty strings and throws exceptions. Since Spark 3.0, we disallow empty strings and will throw exceptions for data types except for `StringType` and `BinaryType`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personal preference .. : some data types such as IntegerType.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
|
github looks buggy for now. Let me clean up my comments if they got messed. |
|
Test build #97690 has finished for PR 22787 at commit
|
|
Test build #97692 has finished for PR 22787 at commit
|
|
Test build #97695 has finished for PR 22787 at commit
|
|
Test build #97688 has finished for PR 22787 at commit
|
|
Test build #97762 has finished for PR 22787 at commit
|
|
Test build #97757 has finished for PR 22787 at commit
|
|
Test build #97832 has started for PR 22787 at commit |
|
Test build #97769 has finished for PR 22787 at commit
|
|
Test build #97843 has started for PR 22787 at commit |
|
Test build #97837 has finished for PR 22787 at commit
|
|
Test build #97775 has finished for PR 22787 at commit
|
|
Test build #97852 has started for PR 22787 at commit |
|
Test build #97858 has started for PR 22787 at commit |
|
It's chaotic ... |
|
Test build #97864 has started for PR 22787 at commit |
|
Test build #97869 has finished for PR 22787 at commit
|
|
Thanks @HyukjinKwon.
I've fixed that and committed. But looks like Github is out of order now.
The commit is now disappeared. Let's wait for Github backing to normal. :)
…On Mon, Oct 22, 2018 at 12:50 PM Hyukjin Kwon ***@***.***> wrote:
***@***.**** approved this pull request.
Yea, looks we should fix the comments.
LGTM otherwise
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#22787 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEM9_r0kwKh8oV3uNoOLB59LU_Kwtgnks5unU6KgaJpZM4Xycnv>
.
|
|
retest this please. |
|
Seems github is restored... |
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
cc @cloud-fan |
|
Test build #97884 has finished for PR 22787 at commit
|
|
Merged to master. |
|
Thanks @HyukjinKwon @dongjoon-hyun |
…owed ## What changes were proposed in this pull request? This takes over original PR at apache#22019. The original proposal is to have null for float and double types. Later a more reasonable proposal is to disallow empty strings. This patch adds logic to throw exception when finding empty strings for non string types. ## How was this patch tested? Added test. Closes apache#22787 from viirya/SPARK-25040. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: hyukjinkwon <gurwls223@apache.org>
|
can we add a legacy config for it? |
|
@cloud-fan Ok. I can add a legacy config for this. |
|
@viirya thanks! |
…ings for certain types in json parser ### What changes were proposed in this pull request? This is a follow-up for #22787. In #22787 we disallowed empty strings for json parser except for string and binary types. This follow-up adds a legacy config for restoring previous behavior of allowing empty string. ### Why are the changes needed? Adding a legacy config to make migration easy for Spark users. ### Does this PR introduce any user-facing change? Yes. If set this legacy config to true, the users can restore previous behavior prior to Spark 3.0.0. ### How was this patch tested? Unit test. Closes #27456 from viirya/SPARK-25040-followup. Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com> Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
…ings for certain types in json parser ### What changes were proposed in this pull request? This is a follow-up for #22787. In #22787 we disallowed empty strings for json parser except for string and binary types. This follow-up adds a legacy config for restoring previous behavior of allowing empty string. ### Why are the changes needed? Adding a legacy config to make migration easy for Spark users. ### Does this PR introduce any user-facing change? Yes. If set this legacy config to true, the users can restore previous behavior prior to Spark 3.0.0. ### How was this patch tested? Unit test. Closes #27456 from viirya/SPARK-25040-followup. Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com> Co-authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com> (cherry picked from commit 7631275) Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
This takes over original PR at #22019. The original proposal is to have null for float and double types. Later a more reasonable proposal is to disallow empty strings. This patch adds logic to throw exception when finding empty strings for non string types.
How was this patch tested?
Added test.