fix: coro-channel attempts to resume running thread #329

Bilal2453 · 2024-08-04T02:09:10Z

Consider the following minimal-reproduction code:

local ssocket = {
  -- will finish the write instantly
  write = function(self, chunk, callback)
    callback()
    return true
  end,
  
  -- irrelevant, just to make the example work
  is_closing = function()
    return true
  end 
}

local write = require("coro-channel").wrapWrite(ssocket)
write("") -- it is important to provide any value here, because a nil value will close the socket instead of attempting a write

This will produce the following error

ncaught exception:
cannot resume running coroutine
stack traceback:
	/home/bilal/deps/coro-channel.lua:16: in function 'assertResume'
	/home/bilal/deps/coro-channel.lua:126: in function 'callback'
	/home/bilal/test.lua:5: in function 'write'
	/home/bilal/deps/coro-channel.lua:148: in function 'write'
	/home/bilal/test.lua:17: in function 'fn'
	[string "bundle:deps/require.lua"]:309: in function 'require'
	[string "bundle:/main.lua"]:128: in function <[string "bundle:/main.lua"]:20>
stack traceback:
	[C]: in function 'error'
	/home/bilal/deps/coro-channel.lua:16: in function 'assertResume'
	/home/bilal/deps/coro-channel.lua:126: in function 'callback'
	/home/bilal/test.lua:5: in function 'write'
	/home/bilal/deps/coro-channel.lua:148: in function 'write'
	/home/bilal/test.lua:17: in function 'fn'
	[string "bundle:deps/require.lua"]:309: in function 'require'
	[string "bundle:/main.lua"]:128: in function <[string "bundle:/main.lua"]:20>
stack traceback:
	[C]: in function 'error'
	[string "bundle:/deps/utils.lua"]:42: in function 'assertResume'
	[string "bundle:/init.lua"]:53: in function <[string "bundle:/init.lua"]:48>
	[C]: in function 'xpcall'
	[string "bundle:/init.lua"]:48: in function 'fn'
	[string "bundle:deps/require.lua"]:309: in function <[string "bundle:deps/require.lua"]:266>

This happens because the write call at coro-channel.lua#L148 returns immediately, the callback produced by wait() is also called immediately, this callback will attempt to resume the current thread, but the coroutine.yield line has not yet been executed (since write finished instantly).

This minimal example is basically what happens when a socket is wrapped with secure-socket , which will attempt to optimize empty writes by returning immediately (in the flush function over here it will execute the callback and return early).

Which leads to https/wss requests made by coro-net crash when code such as require'coro-http'.request("POST", "https://google.com", {}, "") is ran (secure-socket sees that we are trying to write an empty chunk for the payload, so it decides to just return early and immediately since such a write will be ignored by libuv anyways).

The solution suggested in this PR is to basically keep track of when the thread has yielded, and if the wait callback is called before we reach yield, then we don't bother with resuming current thread and just return, similar to what we did with coro-split.

fixes #303.

Bilal2453 · 2024-08-04T02:52:00Z

Also added a test to catch those cases.

Worth noting some... annoyances, while writing that test I noticed if the writer returns with return true, "this value should be ignored" coro-channel will return that the write has failed (the return of write() will be false indicating a failure) but at the same time it won't attempt to close the handle (since success is true). One part of the code thinks that it succeeded and doesn't attempt to close the socket, and the other part of the code simply checks if the second return is presented to treat it as an error.

Also, a write() could return failure indicated by either nil or false, it may returns either of the values depending on if the failure was picked up with the callback callback("this is an error") or with the write return return false, "this is an error".

deps/coro-channel.lua

Bilal2453 · 2024-08-11T00:15:16Z

As I expanded the tests I noticed another state-related bug, consider the following code:

local ssocket = {
  write = function(_, _, callback)
    callback('callback failed')
    return true
  end,
  
  is_closing = function()
    return false
  end 
}

local write = require("coro-channel").wrapWrite(ssocket)
assert(write(""))

The control flow of this goes something like this:

wait is called which resets and prepares the states and upvalues.
The wait() callback is instantly called.
err is set to "callback failed" since the callback failed.
write call now finishes, success is set to true (since the write call succeeded), err is set to nil.
the writer returns success, even though this actually failed.

Usually, the err value will be set by the write call itself (over in success, err = write(...), then it is reused for the callback returned error, but since in this case the callback is executed first this is no more the case.

This is now fixed by separating the call error from the callback error and tracked by two separate states.

truemedian

In general I dislike the state that this change introduces, which makes it significantly harder to reason about what is happening in the control flow through these functions.

However if this fixes the bug I'm fine with introducing it and writing it off for "clean up later".

Bilal2453 · 2024-09-30T14:14:42Z

This likely won't be the last fix we introduce for a problem like this. I also hate the complexity of this flow and it proved annoying as I got it wrong the first 3 attempts. What should the flow look like ideally if you happen to have an idea?

…

On Mon, 30 Sept 2024, 16:39 truemedian, ***@***.***> wrote: ***@***.**** approved this pull request. In general I dislike the state that this change introduces, which makes it significantly harder to reason about what is happening in the control flow through these functions. However if this fixes the bug I'm fine with introducing it and writing it off for "clean up later". — Reply to this email directly, view it on GitHub <#329 (review)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AJDIIYGIZBFGFYZ3SKW2ETDZZFIARAVCNFSM6AAAAABL6MZ5LOVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDGMZXGU4TEMRWGE> . You are receiving this because you authored the thread.Message ID: ***@***.***>

Bilal2453 · 2024-10-09T14:56:09Z

@squeek502 should be ready for merge.

fix: coro-channel attempts to resume running thread

7cf161c

truemedian requested changes Aug 6, 2024

View reviewed changes

deps/coro-channel.lua Outdated Show resolved Hide resolved

fix corrupted states and expand tests

119627a

truemedian approved these changes Sep 30, 2024

View reviewed changes

squeek502 merged commit 72e5615 into luvit:master Oct 9, 2024

Bilal2453 mentioned this pull request Nov 20, 2024

HTTPS support #310

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: coro-channel attempts to resume running thread #329

fix: coro-channel attempts to resume running thread #329

Uh oh!

Bilal2453 commented Aug 4, 2024

Uh oh!

Bilal2453 commented Aug 4, 2024

Uh oh!

Uh oh!

Bilal2453 commented Aug 11, 2024 •

edited

Loading

Uh oh!

truemedian left a comment

Uh oh!

Bilal2453 commented Sep 30, 2024 via email

Uh oh!

Bilal2453 commented Oct 9, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix: coro-channel attempts to resume running thread #329

fix: coro-channel attempts to resume running thread #329

Uh oh!

Conversation

Bilal2453 commented Aug 4, 2024

Uh oh!

Bilal2453 commented Aug 4, 2024

Uh oh!

Uh oh!

Bilal2453 commented Aug 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

truemedian left a comment

Choose a reason for hiding this comment

Uh oh!

Bilal2453 commented Sep 30, 2024 via email

Uh oh!

Bilal2453 commented Oct 9, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Bilal2453 commented Aug 11, 2024 •

edited

Loading