Skip to content

Use wcwidth implementation from fish-shell#1289

Open
zhaofengli wants to merge 1 commit intomobile-shell:masterfrom
zhaofengli:fish-wcwidth
Open

Use wcwidth implementation from fish-shell#1289
zhaofengli wants to merge 1 commit intomobile-shell:masterfrom
zhaofengli:fish-wcwidth

Conversation

@zhaofengli
Copy link
Copy Markdown

This PR makes mosh use up-to-date wcwidth tables from fish-shell, so certain Unicode characters can be displayed correctly on platforms like macOS.

It revives #1143 with a simple wrapper. We now vendor an unmodified copy of widechar_width.h from https://github.com/ridiculousfish/widecharwidth.

Fixes #234.

cc @faho (#1143 (comment))

Co-authored-by: Stanislaw Pusep <creaktive@gmail.com>
@achernya
Copy link
Copy Markdown
Collaborator

I don't think we want to take this PR, as we have not thought about the issues around mosh-client and mosh-server having out of sync wcwidth implementations. That's already possible today, but it's only going to get worse if we vendor wcwidth as this PR suggests.

I'm in general super not into vendoring source.

@zhaofengli
Copy link
Copy Markdown
Author

Understood, thanks for the review. I'll apply the patch locally in the meantime.

[...] issues around mosh-client and mosh-server having out of sync wcwidth implementations

I agree that this is problematic. Currently it seems that the state diff is generated by the server as an terminal sequence (Display::new_frame). The client parses and applies the sequence, applies prediction overlays, then generates a new diff to be streamed to the actual terminal. There are two points where character width issues can occur:

  1. mosh-client applying the over-the-wire diff from mosh-server (uses libc wcwidth)
  2. The actual terminal rendering the diff generated from the client (we can't control what it does)

If my understanding is correct, would it make sense to change the over-the-wire diff representation to be in terms of Row/Cell changes? This way the server controls how characters are laid out in Cells, avoiding (1). Without the escape sequences, the diff can also be more compact.

I'm in general super not into vendoring source.

That's reasonable. We can require users and downstream packaging to provide widecharwidth.

@faho
Copy link
Copy Markdown

faho commented Aug 18, 2023

That's reasonable. We can require users and downstream packaging to provide widecharwidth.

With my widecharwidth hat on, since I was CC'd:

widecharwidth is meant to be used as a single header that you generate and include in your source. It is not set up to be built as a shared library, so it's not amenable to downstream packaging.

I don't believe we want to support a shared library either, because that means caring about downstream packages.

And the generated header is a few long data tables together with a couple functions to read them. Overall not a lot of code to vendor.

the issues around mosh-client and mosh-server having out of sync wcwidth implementations

I can tell you that we haven't had many reports of problems with being out-of-sync, at least not once you get over the big 8->9 hump. Tho that is shell (remote or local) and terminal being out of sync, y'all would have an additional step.


Anyway, it's of course your decision. I'll unsubscribe from this issue now so if anyone wants to summon me feel free to @ me.

jdrouhard added a commit to jdrouhard/mosh that referenced this pull request Aug 20, 2023
@SichangHe
Copy link
Copy Markdown

I don't think we want to take this PR, as we have not thought about the issues around mosh-client and mosh-server having out of sync wcwidth implementations. That's already possible today, but it's only going to get worse if we vendor wcwidth as this PR suggests.

I'm in general super not into vendoring source.

Vendoring is how you avoid the server and the client being out of sync.

Currently, you simply leave users with broken software. I, for example, had the display glitch in #1281, went through the trouble of researching the problem and compiling https://github.com/jdrouhard/mosh on both the server and the client, and then it works fine. Some other users would simply try mosh, find it broken, and never use it again.

This is improvement left on the table.

@brendandebeasi
Copy link
Copy Markdown

Coming to this from downstream — I maintain tabby, a tmux sidebar tool, and Mosh users hit this constantly: Unicode characters render with wrong widths causing off-by-one corruption.

A couple of questions:

  • Given that @faho noted widecharwidth is designed to be vendored as a header (not a shared lib), does that change the vendoring objection? It would also ensure client/server sync.
  • Is the Row/Cell wire protocol idea @zhaofengli raised being considered for 1.5.0, or is it out of scope?

Happy to help test on macOS/iPadOS if that's useful.

@achernya
Copy link
Copy Markdown
Collaborator

achernya commented Mar 2, 2026

The wire protocol idea would be a breaking change, as such it is not in scope for 1.5.0. maybe a hypothetical 2.0, if it even made sense. I haven't thought about the idea since it was proposed, and have since forgotten what my objection was, but I recall thinking there were some downsides with it that would make it challenging. (At the very least, it means client side prediction would get funky)

@brendandebeasi
Copy link
Copy Markdown

@achernya Do you have any thoughts on the up-to-date width tables making it into this release? Seems like there was some discourse on this earlier and I'm wondering how we can move forward

@achernya
Copy link
Copy Markdown
Collaborator

achernya commented Mar 2, 2026

I actively do not want vendor any width tables. If there is a library we can depend on that is packaged in every major distribution we care about, we can consider the dependency.

@brendandebeasi
Copy link
Copy Markdown

@achernya I get the no-vendored-width-tables stance. One option that seems to fit that constraint: use a system library for wcwidth instead of libc — libutf8proc (utf8proc_charwidth), which is already packaged in major distros and Homebrew.

For desync, we can keep mixed installs safe by warning if client/server disagree on backend or Unicode version, and if only one side has utf8proc, fall back to libc on both ends for that session so widths stay consistent.

This won't fix grapheme-cluster edge cases, but it should address the common macOS/old-libc emoji-width corruption without changing the wire protocol.

@achernya
Copy link
Copy Markdown
Collaborator

achernya commented Mar 3, 2026

I spot-checked the distros we directly push to (fedora, debian) and those we rely on for .pkg builds (macports). All of them have utf8proc. utf8proc includes pkg-config, so it should be straightforward to incorporate into configure.ac.

On the desync front, as I mentioned in this comment above, it's already possible today with libc. The primary thing I'd like us to figure out is if introducing utf8proc makes things noticeably worse. If the answer is "no", we can just integrate it directly and move on, no version checks or fallbacks necessary. If the answer is "yes", then we need to figure out how to do the signaling. There's no good way to do it; the underlying transport is per-packet, so it would be wasted overhead. Likewise, the hostinput/userinput sides aren't appropriate either, since it's just a byte stream.

That leaves us with the mosh startup sequence. I'm not excited about modifying mosh.pl but it looks like we could probably add another line there. But, again, I'd much rather we keep forwards and backwards compatibility as much as possible rather than explore this alternative.

If you or anyone else can think of any issues that don't exist with today's wcwidth implementation, that would be helpful in deciding the steps forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Display errors with certain characters

5 participants