Closed
How do you use Sentry?
Sentry Saas (sentry.io)
Version
1.39.2
Issue
The strip_string function isn't working properly.
Here we calculate the size of the string in bytes as length. But then when we actually determine that the string needs trimming, we trim length characters from the string instead of length bytes. We also then potentially report the wrong number in the metadata.
from sentry_sdk.utils import strip_string
strip_string("éê", 2)  # == AnnotatedValue(value="éê", ...)

Both é and ê take two bytes each in UTF-8, making the string "éê" 4 bytes long. Yet strip_string will not strip it to two bytes.
- It'll get encoded into bytes here.
- The size of the encoded version is 4, so length will be set to 4.
- This check will be True, because 4 > 2.
- But when we actually try to trim here, we're trimming the string "éê" to two characters (code points), as opposed to trimming the encoded bytes representation.
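The mismatch in the steps above can be reproduced without the SDK. The snippet below is a minimal sketch in plain Python, not the library's code; the variable names are illustrative. It measures the string in bytes, then compares slicing by code points with slicing the encoded bytes.

```python
# Standalone illustration of the byte/character mismatch (not sentry_sdk code).
text = "\u00e9\u00ea"  # "éê", written with escapes so the code points are unambiguous
max_bytes = 2

encoded = text.encode("utf-8")
char_count = len(text)     # number of code points
byte_count = len(encoded)  # number of UTF-8 bytes

# Slicing the str counts code points, so nothing is removed here.
char_sliced = text[:max_bytes]

# Slicing the encoded bytes actually enforces the byte limit.
byte_sliced = encoded[:max_bytes].decode("utf-8", errors="ignore")

print(char_count, byte_count)                  # 2 4
print(len(char_sliced.encode("utf-8")))        # 4, still over the limit
print(len(byte_sliced.encode("utf-8")))        # 2
```

The size check is done in bytes while the slice is done in code points, so any string containing multi-byte characters can pass through longer than the limit.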
Solution
Probably something to the effect of:

string.encode("utf-8")[: max_bytes - 3].decode("utf-8", errors="ignore")

The [: max_bytes - 3] part might end up cutting a code point in two; .decode with errors="ignore" will ignore any malformed code points.
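As a rough sketch of how that one-liner could be wrapped into a helper: the function name `truncate_utf8`, its `suffix` parameter, and the clamping of the budget to zero are assumptions made for illustration, not part of sentry_sdk. The clamp is there because `max_bytes - 3` goes negative for limits below 3, and a negative slice bound would cut from the end of the bytes instead.

```python
def truncate_utf8(text: str, max_bytes: int, suffix: str = "...") -> str:
    """Hypothetical helper: trim `text` by UTF-8 byte length and append `suffix`."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text  # already within the byte limit

    # Reserve room for the suffix; clamp so the slice bound never goes negative.
    budget = max(max_bytes - len(suffix.encode("utf-8")), 0)

    # The slice may split a multi-byte code point; errors="ignore" drops
    # the incomplete trailing bytes instead of raising UnicodeDecodeError.
    head = encoded[:budget].decode("utf-8", errors="ignore")
    return head + suffix


print(truncate_utf8("hello world", 8))                 # hello...
print(truncate_utf8("\u00e9\u00e9\u00e9\u00e9\u00e9", 8))  # éé...
```

In this sketch, a limit smaller than the suffix (such as the `max_bytes=2` case from the report) returns just `"..."`, which is still longer than the limit; a real fix would need to decide how to handle that case, and would also need to report the byte-based numbers in the metadata.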