Commit e2722b8
[SPARK-54625][SQL] UTF8String#reverse should check offset and length on copying
### What changes were proposed in this pull request?
This PR aims to check offset and length on copying in `UTF8String#reverse`.
For details, see https://lists.apache.org/thread/d9pvkh3jbsq8lc33v75kmwq5wg57422h (Only PMC members can read with login).
To avoid a performance regression, this PR chooses to check the offset and length rather than validate the input UTF-8 string.
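The trade-off can be illustrated with a minimal sketch. This is not the actual patch: it works on a plain `byte[]` instead of Spark's unsafe memory, and the class `ReverseSketch` and its methods `claimedLength` and `reverse` are hypothetical names used only for illustration. It assumes the risk comes from a lead byte that claims more bytes than remain in the string, and it clamps the copy length instead of validating the whole input.

```java
// Hypothetical sketch, not Spark code: reverse a UTF-8 byte sequence code point
// by code point, clamping each copy so it never leaves the buffer.
final class ReverseSketch {

  // Number of bytes a UTF-8 sequence claims, judged by its first byte only.
  static int claimedLength(byte first) {
    int b = first & 0xFF;
    if (b < 0x80) return 1;               // ASCII
    if (b >= 0xC0 && b < 0xE0) return 2;  // 2-byte lead
    if (b >= 0xE0 && b < 0xF0) return 3;  // 3-byte lead
    if (b >= 0xF0 && b < 0xF8) return 4;  // 4-byte lead
    return 1;                             // continuation or invalid lead: one byte
  }

  static byte[] reverse(byte[] src) {
    byte[] dst = new byte[src.length];
    int i = 0;
    while (i < src.length) {
      // Clamp the claimed length to the bytes that actually remain, so a
      // truncated sequence cannot produce a negative destination offset.
      int len = Math.min(claimedLength(src[i]), src.length - i);
      System.arraycopy(src, i, dst, src.length - i - len, len);
      i += len;
    }
    return dst;
  }
}
```

With valid input the clamp never triggers, so the cost is one comparison per code point. With a truncated sequence such as the bytes `0x61 0xE3`, the result is not meaningful text, but the copy stays inside the buffer.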
### Why are the changes needed?
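For safety: `UTF8String#reverse` should not copy with an unchecked offset and length when the input is not a valid UTF-8 string.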
### Does this PR introduce _any_ user-facing change?
Yes, but it doesn't break compatibility.
### How was this patch tested?
The example queries mentioned in [this thread](https://lists.apache.org/thread/d9pvkh3jbsq8lc33v75kmwq5wg57422h) work, even though the results are broken.
All the operations defined in `UTF8String` are expected to work correctly with valid UTF-8 strings, so broken results for invalid UTF-8 strings are reasonable.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #53366 from sarutak/fix-utf8-reverse.
Authored-by: Kousuke Saruta <sarutak@amazon.co.jp>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
1 parent bbbad56 commit e2722b8
1 file changed, 3 additions and 2 deletions, under common/unsafe/src/main/java/org/apache/spark/unsafe/types (hunk around original lines 1160-1168).