Commit e2722b8

sarutak authored and dongjoon-hyun committed
[SPARK-54625][SQL] UTF8String#reverse should check offset and length on copying
### What changes were proposed in this pull request?

This PR adds offset and length checks to the copy performed in `UTF8String#reverse`. For details, see https://lists.apache.org/thread/d9pvkh3jbsq8lc33v75kmwq5wg57422h (readable only by PMC members after login). To avoid a performance regression, this PR checks the offset and length rather than validating the input UTF-8 string.

### Why are the changes needed?

For safety.

### Does this PR introduce _any_ user-facing change?

Yes, but it doesn't break compatibility.

### How was this patch tested?

The example queries mentioned in [this thread](https://lists.apache.org/thread/d9pvkh3jbsq8lc33v75kmwq5wg57422h) work, even though their results are broken. All the operations defined in `UTF8String` are expected to work correctly with valid UTF-8 strings, so broken results for invalid UTF-8 strings should be reasonable.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53366 from sarutak/fix-utf8-reverse.

Authored-by: Kousuke Saruta <sarutak@amazon.co.jp>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
1 parent bbbad56 commit e2722b8

1 file changed, +3 -2 lines changed

common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java

Lines changed: 3 additions & 2 deletions
@@ -1160,9 +1160,10 @@ public UTF8String reverse() {

     int i = 0; // position in byte
     while (i < numBytes) {
-      int len = numBytesForFirstByte(getByte(i));
+      int len = Math.min(numBytesForFirstByte(getByte(i)), numBytes);
+      int targetOffset = Math.max(result.length - i - len, 0);
       copyMemory(this.base, this.offset + i, result,
-        BYTE_ARRAY_OFFSET + result.length - i - len, len);
+        BYTE_ARRAY_OFFSET + targetOffset, len);

       i += len;
     }
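
The effect of the two clamps in the diff above can be illustrated with a small standalone sketch. It is not Spark code: `ReverseOffsetSketch`, `oldTargetOffset`, `newTargetOffsetAndLen`, and `writeFits` are hypothetical names, `numBytesForFirstByte` is a simplified stand-in for the real helper, and the sketch assumes that `result` has the same length as the input (`numBytes`). It only computes the destination range of each copy and does not model the source read.

```java
// Illustrative sketch only; not Spark code. All names here are hypothetical.
class ReverseOffsetSketch {
  // Simplified stand-in for UTF8String.numBytesForFirstByte (assumption):
  // derives the expected sequence length from the leading byte alone.
  static int numBytesForFirstByte(byte b) {
    int v = b & 0xFF;
    if (v >= 0xF0 && v < 0xF8) return 4;
    if (v >= 0xE0 && v < 0xF0) return 3;
    if (v >= 0xC0 && v < 0xE0) return 2;
    return 1; // ASCII, continuation bytes, and anything else
  }

  // Destination offset as computed before the patch (can go negative).
  static int oldTargetOffset(byte[] s, int i) {
    int len = numBytesForFirstByte(s[i]);
    return s.length - i - len;
  }

  // Destination offset and length as computed after the patch.
  // Assumes result.length == numBytes == s.length.
  static int[] newTargetOffsetAndLen(byte[] s, int i) {
    int numBytes = s.length;
    int len = Math.min(numBytesForFirstByte(s[i]), numBytes);
    int targetOffset = Math.max(numBytes - i - len, 0);
    return new int[] {targetOffset, len};
  }

  // True if the write range [targetOffset, targetOffset + len) lies inside the result array.
  static boolean writeFits(int targetOffset, int len, int resultLength) {
    return targetOffset >= 0 && targetOffset + len <= resultLength;
  }

  public static void main(String[] args) {
    // Invalid input: 'a' followed by a 4-byte leading byte with no continuation bytes.
    byte[] truncated = {(byte) 'a', (byte) 0xF0};
    int old = oldTargetOffset(truncated, 1);
    int[] fixed = newTargetOffsetAndLen(truncated, 1);
    System.out.println("old offset: " + old);
    System.out.println("new offset: " + fixed[0] + ", len: " + fixed[1]);
    System.out.println("new write fits: " + writeFits(fixed[0], fixed[1], truncated.length));
  }
}
```

For valid UTF-8 input both computations agree, so the clamps only change behavior for malformed strings, which matches the commit message's note that results for invalid input may still be broken.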
