-
Notifications
You must be signed in to change notification settings - Fork 29k
[WIP][SPARK-25040][SQL] Empty string for double and float types should be nulls in JSON #22019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc @cloud-fan, @viirya and @fuqiliang |
|
Looks few other types could potentially have this issue too. Let me fix them all here while I am here. |
|
Hm.. wait let me take a closer look. |
|
Empty string should be treated as null for all non string types? spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala Lines 282 to 285 in ef57fdd
|
I would exclude complex types. |
|
I found this: in branch-1.6 and was thinking if we actually should just disallow. Let me take a closer look and make some fixes here with some details soon. |
|
Test build #94342 has finished for PR 22019 at commit
|
|
What do you guys think about completely disallow empty strings in other types and target it 3.0.0? In theory, empty string is a string and I think strictly it's more correct to disallow them. Seems that's the original intention in the code. |
|
I agree with this proposal @HyukjinKwon. I think it is wrong to consider as a null an empty string. An empty string is not a valid value for an int/double/... So in case we have, we should fail I think. |
|
SGTM |
|
SGTM too. |
|
Let me reopen a PR and proceed this after 2.4.0 or the code freeze |
|
@viirya and @MaxGekk, are you busy? Do you mind if I ask to take over this? we will completely disallow empty strings in other types and target it 3.0.0. The changes wouldn't be too much and it requires to update the migration guide. I will be busy for a couple of weeks so I would appreciate it if you find some time to take over this. Otherwise, I will start to work on this after a couple of weeks. |
|
Probably I will not be able to look at this for the next a few weeks. |
|
@HyukjinKwon thanks for pinging me. Let me look at this and see if I can make a PR soon. |
…owed ## What changes were proposed in this pull request? This takes over original PR at #22019. The original proposal is to have null for float and double types. Later a more reasonable proposal is to disallow empty strings. This patch adds logic to throw exception when finding empty strings for non string types. ## How was this patch tested? Added test. Closes #22787 from viirya/SPARK-25040. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: hyukjinkwon <gurwls223@apache.org>
…owed ## What changes were proposed in this pull request? This takes over original PR at apache#22019. The original proposal is to have null for float and double types. Later a more reasonable proposal is to disallow empty strings. This patch adds logic to throw exception when finding empty strings for non string types. ## How was this patch tested? Added test. Closes apache#22787 from viirya/SPARK-25040. Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: hyukjinkwon <gurwls223@apache.org>
What changes were proposed in this pull request?
This PR proposes to treat empty strings for double and float types as
nullconsistently. Looks we mistakenly missed this corner case, which I guess is not that serious since this looks happened betwen 1.x and 2.x, and pretty corner case.For an easy reproducer, in case of double, the code below raises an error:
Unlike other types:
How was this patch tested?
Unit tests were added and manually tested.