@xalt7x xalt7x commented May 21, 2024

When substr() or another byte-based function is used to shorten a string to, or by, a single byte, multibyte UTF-8 characters are cut apart and displayed as �. Switching to mb_substr() fixes this.

xalt7x commented May 21, 2024

The problem is easy to reproduce with Cyrillic/Ukrainian characters (e.g., "Джон Дое" as the User/Owner name, or the string "Навички обслуговування клієнтів" for "Key Skills").

[Screenshot: fix_cyrillic_abbreviation]

Additional information:

If you’re working with strings encoded as UTF-8, you may lose characters when you extract part of them with the PHP substr function. This happens because substr counts bytes, not characters, and UTF-8 characters are not restricted to one byte: each Unicode character is encoded in a variable length of 1 to 4 bytes.
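
A minimal sketch of the difference, using the name from the report above. The variable names are illustrative and not taken from the OpenCATS code, and the example assumes the mbstring extension is enabled:

```php
<?php
// Illustrative only: variable names are hypothetical, not from the OpenCATS source.
// Assumes the mbstring extension is available.
$name = "Джон Дое";

// substr() works on bytes. "Д" occupies 2 bytes in UTF-8, so taking
// 1 byte returns half a character, which is not valid UTF-8.
$byteInitial = substr($name, 0, 1);

// mb_substr() works on characters, so the first letter survives intact.
$charInitial = mb_substr($name, 0, 1, 'UTF-8');

var_dump(mb_check_encoding($byteInitial, 'UTF-8')); // bool(false)
var_dump($charInitial === "Д");                     // bool(true)

// The same string is 15 bytes long but only 8 characters long.
var_dump(strlen($name));             // int(15)
var_dump(mb_strlen($name, 'UTF-8')); // int(8)
```

The invalid single byte in `$byteInitial` is what a browser renders as �.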

RussH commented Sep 16, 2024

Thanks @xalt7x !

@RussH RussH merged commit e7c1ab1 into opencats:master Sep 16, 2024