-
Notifications
You must be signed in to change notification settings - Fork 4
poll for TIOCINQ and TIOCOUTQ instead of POLLIN, bump repeats #51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
blast-hardcheese
approved these changes
Oct 28, 2024
blast-hardcheese
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v. nice
lhchavez
approved these changes
Oct 28, 2024
Contributor
jackyzha0
added a commit
that referenced
this pull request
Sep 29, 2025
turns out #51 reduced the race rate but did not eliminate it completely. we still see the data race bug very rarely and it is exacerbated on high-numbers of repeats in the tests and/or high nodejs event loop utilization my old theory was: - the user fd is O_NONBLOCK clear so any writes there should block until its made its way into the 4kb intermediary buffer - as a result, when the .wait returns, we know that the program output has at least made it fully into the the user fd input buffer - because we poll until `controller_inq == 0 && controller_outq == 0 && user_inq == 0 && user_outq == 0`, in theory we should never call the js side exit code until theres truly nothing left in the pipe (i.e. the readstream on the nodejs side has it in its own buffer) HOWEVER, with some logging i saw a few cases where in fact `controller_inq == 0 && controller_outq == 0 && user_inq == 0 && user_outq == 0`, the program had exited, yet the nodejs side still missed some data what i think is _actually_ happening is that 1. one of libuv's io threads reads the data from the controller side and queues a data event on the stream 2. polling exits as now theres no more data in the queues we immediately `drop(user_fd);` which [sets TTY_OTHER_CLOSED synchronously](https://github.com/torvalds/linux/blob/4ff71af020ae59ae2d83b174646fc2ad9fcd4dc4/drivers/tty/pty.c#L66) and we get an error event queued on the stream 3. if the nodejs event loop happens to read error before data, we emit 'end' and mark the data as read even though technically nodejs hasnt processed the data event yet so we drop data :( how we fix it: 1. axe poll_pty_fds_until_read 2. make Pty struct own user_fd and expose a method for the js side to drop this fd when its done 3. when child.wait finishes, write a synthetic 'EOF' that is actually a custom OSC terminal control sequence (`\x1B]7878\x1B\\`, 7878 is RUST on the phonepad :)) to the user fd (a cursory search shows no results, it seems _very_ unlikely for this sequence to appear randomly) 4. on the nodejs wrapper side, create a transform stream that parses out the synthetic EOF and emits it as a custom event when it happens 5. when the nodejs side hits this EOF, we know we are actually at the end of the data stream and can safely drop user_fd node-pty had the [same problem](microsoft/node-pty#72) and did the :grug: brain thing and [added a wait 250ms](microsoft/vscode@9464b54 ) so im calling it slightly more ok
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.

we noticed a few cases where if a program exited really quickly but dumped a lot of output (e.g.
envin the replit shell) we could sometimes miss outputwe used to poll the controller side fd for POLLIN but that actually isn't fully correct.
the tty pipes look something like:
we can sometimes enter cases where the controller side thinks it has no more data to give to the controlling program (nodejs when using @replit/ruspty) but the kernel has decided to block on passing some user fd data to the controller fd (on linux for example, the pipe in the user fd -> controller fd direction has about 4kb of capacity)
for example, if node isnt processing data events quickly enough, the controller-side queue can fill up and the user fd write will block until it has space again, we could rarely enter a race if nodejs decides to read an entire 4kb block completely emptying the controller fd queue and then the POLLIN logic could return with no more data left (even though the user side still has a pending write!)
this would drop data :(
a few changes: