The specification for truncate says
Substring of length L
but does not define what it is counting. I assume the intention is for it to be Unicode code points, since the specification says that
Character strings must be stored as UTF-8 encoded byte arrays
However, the Java reference implementation uses java.lang.CharSequence#subSequence, so the length is measured in 16-bit UTF-16 code units rather than code points. The two counts differ for characters outside the Basic Multilingual Plane (BMP): each such code point is represented by two Java char values, a high surrogate followed by a low surrogate. In addition, the truncation may fall in the middle of a surrogate pair, leaving an unpaired surrogate, which is a form of corruption.
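To make the difference concrete, here is a minimal Java sketch. The class TruncateSketch and both method names are hypothetical and are not taken from the reference implementation: truncateCodeUnits only mirrors the subSequence-based behaviour described above, and truncateCodePoints shows one possible code-point-based alternative, assuming that code points are what the specification intends and that the length is non-negative.

```java
final class TruncateSketch {
    private TruncateSketch() {}

    // Mirrors the behaviour described above: the length is counted in
    // UTF-16 code units, so a surrogate pair can be split in half.
    static String truncateCodeUnits(String s, int length) {
        return s.subSequence(0, Math.min(length, s.length())).toString();
    }

    // Hypothetical alternative: the length is counted in Unicode code points,
    // so a surrogate pair is either kept whole or dropped whole.
    static String truncateCodePoints(String s, int length) {
        if (length >= s.codePointCount(0, s.length())) {
            return s; // nothing to cut
        }
        // Translate "length code points from the start" into a char index.
        int end = s.offsetByCodePoints(0, length);
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (outside the BMP, stored as a surrogate pair), 'b'
        String s = "a\uD83D\uDE00b";

        String byUnits = truncateCodeUnits(s, 2);
        String byPoints = truncateCodePoints(s, 2);

        // The code-unit result ends with an unpaired high surrogate.
        System.out.println(Character.isHighSurrogate(byUnits.charAt(byUnits.length() - 1)));
        // The code-point result keeps the whole pair: 3 chars, 2 code points.
        System.out.println(byPoints.length() + " chars, "
                + byPoints.codePointCount(0, byPoints.length()) + " code points");
    }
}
```

With the input above, truncating to length 2 by code units yields 'a' followed by a lone high surrogate, while truncating by code points yields 'a' followed by the complete U+1F600.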