[SPARK-32559][SQL] Fix the trim logic that didn't handle ASCII control characters correctly #41535
@cloud-fan @yaooqinn @gengliangwang @WangGuangxin Could you please help review this?
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java
I'm not sure this is correct. SQL TRIM in other databases seems to default to removing whitespace or the space character, not control codes.
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
yaooqinn left a comment
As Spark is a general engine supporting many kinds of data formats and sources, IMO it's necessary to trim the control characters that are very likely to show up during data exchange.
So, LGTM.
…/DateTimeUtilsSuite.scala Co-authored-by: Kent Yao <yao@apache.org>
…acters correctly ### What changes were proposed in this pull request? The trim logic in the Cast expression, introduced in #29375, unexpectedly trims ASCII control characters. ### Why are the changes needed? The behavior described above is inconsistent with Hive's behavior. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? Added unit tests. Closes #41535 from Kwafoor/trim_bugfix. Lead-authored-by: wangjunbo <wangjunbo@qiyi.com> Co-authored-by: Junbo wang <1042815068@qq.com> Signed-off-by: Kent Yao <yao@apache.org> (cherry picked from commit 80588e4) Signed-off-by: Kent Yao <yao@apache.org>
Thanks, merged to master and 3.4/3.3
dongjoon-hyun left a comment
Hi, @Kwafoor, @yaooqinn, and all.
SPARK-32559 was delivered as an Apache Spark 3.0.1 patch.
We need a new JIRA issue because this change will newly land in Apache Spark 3.5.0/3.4.1/3.3.3.
@dongjoon-hyun Oops, my bad. Given the current status, is it OK to link this pull request to a new JIRA?
My mistake. Should I raise a new issue and link this PR to it?
@Kwafoor Yes
I created an issue, https://issues.apache.org/jira/browse/SPARK-44383, but how do I link this pull request to the new JIRA? I saw that Spark 3.4.1 was released with this PR under the wrong issue name, SPARK-32559. I'm very sorry about this.
I have manually resolved and re-targeted the links. @Kwafoor, we probably need to push a new PR to the spark-website repo to fix the release note.
I created PR-466 in the spark-website repo to fix the release note.
Thank you, @Kwafoor


What changes were proposed in this pull request?
The trim logic in the Cast expression, introduced in #29375, unexpectedly trims ASCII control characters.
Before this patch: [screenshot not preserved]

And Hive: [screenshot not preserved]
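The difference under discussion can be sketched in Java, the language of UTF8String.java. This is a minimal, hypothetical illustration, not Spark's UTF8String API: `TrimPolicies`, `trimBoth`, and the three predicates are names invented for this example. It assumes, as the description suggests, that the pre-patch Cast trim treated ISO control characters like whitespace, while the patched trim removes whitespace only. It does not reproduce Spark's byte-level implementation.

```java
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch (not Spark's UTF8String API): compares three policies
 * for which leading/trailing characters a trim may remove.
 */
class TrimPolicies {

    // Only the space character U+0020 (the default trim set of SQL TRIM).
    static final IntPredicate SPACE_ONLY = c -> c == ' ';

    // Characters Java classifies as whitespace: space, tab, line feed, U+000B,
    // form feed, carriage return, U+001C..U+001F and other Unicode separators.
    static final IntPredicate WHITESPACE = Character::isWhitespace;

    // Assumed pre-patch behavior: whitespace plus ISO control characters
    // (U+0000..U+001F and U+007F..U+009F).
    static final IntPredicate WHITESPACE_OR_CONTROL =
        c -> Character.isWhitespace(c) || Character.isISOControl(c);

    /** Removes characters matching {@code trimmable} from both ends of {@code s}. */
    static String trimBoth(String s, IntPredicate trimmable) {
        int start = 0;
        int end = s.length();
        while (start < end && trimmable.test(s.charAt(start))) start++;
        while (end > start && trimmable.test(s.charAt(end - 1))) end--;
        return s.substring(start, end);
    }

    /** Renders ISO control characters as <U+XXXX> so they are visible when printed. */
    static String visible(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(c -> {
            if (Character.isISOControl(c)) sb.append(String.format("<U+%04X>", c));
            else sb.appendCodePoint(c);
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // SOH (U+0001) and STX (U+0002) are ASCII control characters but not whitespace.
        String[] inputs = {"\u0001 123\t\u0002", " 123\t"};
        for (String value : inputs) {
            System.out.println("input:                 " + visible(value));
            System.out.println("space only:            " + visible(trimBoth(value, SPACE_ONLY)));
            System.out.println("whitespace:            " + visible(trimBoth(value, WHITESPACE)));
            System.out.println("whitespace or control: " + visible(trimBoth(value, WHITESPACE_OR_CONTROL)));
        }
    }
}
```

For the input wrapped in SOH/STX, only the whitespace-or-control policy reduces the value to `123`; the whitespace-only policy leaves it untouched because its outermost characters are controls. For `" 123\t"`, both whitespace policies yield `123`, while space-only trimming keeps the trailing tab.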
Why are the changes needed?
The behavior described above is inconsistent with Hive's behavior.
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Added unit tests.
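A test for this change would check both sides of the rule: ordinary whitespace around a value is still trimmed, while surrounding control characters leave the value malformed. The following is a hypothetical, stand-alone sketch, not the test added in this PR (whose test changes appear in DateTimeUtilsSuite.scala). `CastTrimCheck` and `castToIntOrNull` are invented names, and the helper only mimics a non-ANSI string-to-int cast that returns null for malformed input, under the assumed whitespace-only trimming rule.

```java
/**
 * Hypothetical unit-test-style sketch (not Spark's Cast code or test suite).
 * Assumes a lenient string-to-int cast that trims only whitespace and
 * returns null when the remaining text is not a valid int.
 */
class CastTrimCheck {

    /** Hypothetical helper: trims Character.isWhitespace chars, then parses. */
    static Integer castToIntOrNull(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && Character.isWhitespace(s.charAt(start))) start++;
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) end--;
        try {
            return Integer.valueOf(s.substring(start, end));
        } catch (NumberFormatException e) {
            return null; // mimics a non-ANSI cast yielding NULL for malformed input
        }
    }

    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        // Ordinary whitespace around the number is still trimmed.
        check(Integer.valueOf(123).equals(castToIntOrNull(" \t123\n ")),
              "whitespace should be trimmed");
        // SOH (U+0001) is an ASCII control character, so it is not trimmed
        // and the value is treated as malformed.
        check(castToIntOrNull("\u0001123\u0001") == null,
              "control characters should not be trimmed");
        System.out.println("all checks passed");
    }
}
```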