I rebooted my Ubuntu server today and after that, suddenly all Chinese characters on all my pads show up as ???.
It's really weird, since I didn't change any code or update anything Node.js-related. The only things I think could have changed are some Ubuntu packages being updated (possibly including Apache, which only acts as a reverse proxy via ProxyPass), as well as the content of some pads.
Here is an example of what it looks like (I know the first word should be 东方行):
http://ether.markv.nl/p/pleaserepairmeplixplox
== ??? ==
?????????
?????"???"?
??????????
????????????
(ASCII characters are unaffected)
I ran this through four different character encoding detection programs and websites. All of them say it's UTF-8 (with 99% confidence where given), but none of them can turn it into anything remotely readable.
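To illustrate why the detectors might all answer "UTF-8" while nothing is readable, here is a minimal Python sketch. It assumes (this is a guess, not a confirmed cause) that the text was at some point converted to a character set that cannot represent Chinese, with each unencodable character replaced by a literal `?`. The function names and the choice of `latin-1` are hypothetical and only for illustration.

```python
def simulate_lossy_store(text: str, charset: str = "latin-1") -> bytes:
    """Encode text into a charset that may not cover it.

    Unencodable characters are replaced by a literal '?' (byte 0x3F).
    The choice of latin-1 is an assumption for illustration only.
    """
    return text.encode(charset, errors="replace")


def looks_like_literal_question_marks(raw: bytes) -> bool:
    """True if the data is pure ASCII and contains '?' bytes.

    Pure ASCII is also valid UTF-8, so encoding detectors will report
    UTF-8 even though the original characters are gone.
    """
    return all(b < 0x80 for b in raw) and b"?" in raw


stored = simulate_lossy_store("== 东方行 ==")
print(stored)                                     # bytes as they would be stored
print(stored.decode("utf-8"))                     # decodes cleanly as UTF-8
print(looks_like_literal_question_marks(stored))  # pure ASCII with '?'
print(looks_like_literal_question_marks("东方行".encode("utf-8")))  # intact UTF-8
```

If the stored bytes really are 0x3F, every replaced character maps to the same byte, so no decoding step can bring the original text back; a hex dump of one affected row would show whether that is the case here.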
I've found quite a number of issues about character encoding and I followed suggestions like changing MySQL collation to utf8mb4_general_ci, but I think it was too late. I also updated to the latest version of etherpad-lite after the problem started.
I checked in the database, but only after changing the collation. It also appears as ??? there (now). If I type Chinese characters into a pad now they seem to stay intact (but who knows for how long).
I'm kind of giving up, having already spent a good number of hours trying to solve this. But I thought I'd post this issue in case it helps improve the software, or in case anyone can tell me how to recover my data. I imagine the issue is hard to reproduce.