Skip to content

Conversation

@Bilal2453
Copy link
Contributor

Consider the following minimal-reproduction code:

local ssocket = {
  -- will finish the write instantly
  write = function(self, chunk, callback)
    callback()
    return true
  end,
  
  -- irrelevant, just to make the example work
  is_closing = function()
    return true
  end 
}

local write = require("coro-channel").wrapWrite(ssocket)
write("") -- it is important to provide any value here, because a nil value will close the socket instead of attempting a write

This will produce the following error

ncaught exception:
cannot resume running coroutine
stack traceback:
	/home/bilal/deps/coro-channel.lua:16: in function 'assertResume'
	/home/bilal/deps/coro-channel.lua:126: in function 'callback'
	/home/bilal/test.lua:5: in function 'write'
	/home/bilal/deps/coro-channel.lua:148: in function 'write'
	/home/bilal/test.lua:17: in function 'fn'
	[string "bundle:deps/require.lua"]:309: in function 'require'
	[string "bundle:/main.lua"]:128: in function <[string "bundle:/main.lua"]:20>
stack traceback:
	[C]: in function 'error'
	/home/bilal/deps/coro-channel.lua:16: in function 'assertResume'
	/home/bilal/deps/coro-channel.lua:126: in function 'callback'
	/home/bilal/test.lua:5: in function 'write'
	/home/bilal/deps/coro-channel.lua:148: in function 'write'
	/home/bilal/test.lua:17: in function 'fn'
	[string "bundle:deps/require.lua"]:309: in function 'require'
	[string "bundle:/main.lua"]:128: in function <[string "bundle:/main.lua"]:20>
stack traceback:
	[C]: in function 'error'
	[string "bundle:/deps/utils.lua"]:42: in function 'assertResume'
	[string "bundle:/init.lua"]:53: in function <[string "bundle:/init.lua"]:48>
	[C]: in function 'xpcall'
	[string "bundle:/init.lua"]:48: in function 'fn'
	[string "bundle:deps/require.lua"]:309: in function <[string "bundle:deps/require.lua"]:266>

This happens because the write call at coro-channel.lua#L148 returns immediately, the callback produced by wait() is also called immediately, this callback will attempt to resume the current thread, but the coroutine.yield line has not yet been executed (since write finished instantly).

This minimal example is basically what happens when a socket is wrapped with secure-socket , which will attempt to optimize empty writes by returning immediately (in the flush function over here it will execute the callback and return early).

Which leads to https/wss requests made by coro-net crash when code such as require'coro-http'.request("POST", "https://google.com", {}, "") is ran (secure-socket sees that we are trying to write an empty chunk for the payload, so it decides to just return early and immediately since such a write will be ignored by libuv anyways).

The solution suggested in this PR is to basically keep track of when the thread has yielded, and if the wait callback is called before we reach yield, then we don't bother with resuming current thread and just return, similar to what we did with coro-split.

fixes #303.

@Bilal2453
Copy link
Contributor Author

Also added a test to catch those cases.

Worth noting some... annoyances, while writing that test I noticed if the writer returns with return true, "this value should be ignored" coro-channel will return that the write has failed (the return of write() will be false indicating a failure) but at the same time it won't attempt to close the handle (since success is true). One part of the code thinks that it succeeded and doesn't attempt to close the socket, and the other part of the code simply checks if the second return is presented to treat it as an error.

Also, a write() could return failure indicated by either nil or false, it may returns either of the values depending on if the failure was picked up with the callback callback("this is an error") or with the write return return false, "this is an error".

@Bilal2453
Copy link
Contributor Author

Bilal2453 commented Aug 11, 2024

As I expanded the tests I noticed another state-related bug, consider the following code:

local ssocket = {
  write = function(_, _, callback)
    callback('callback failed')
    return true
  end,
  
  is_closing = function()
    return false
  end 
}

local write = require("coro-channel").wrapWrite(ssocket)
assert(write(""))

The control flow of this goes something like this:

  1. wait is called which resets and prepares the states and upvalues.
  2. The wait() callback is instantly called.
  3. err is set to "callback failed" since the callback failed.
  4. write call now finishes, success is set to true (since the write call succeeded), err is set to nil.
  5. the writer returns success, even though this actually failed.

Usually, the err value will be set by the write call itself (over in success, err = write(...), then it is reused for the callback returned error, but since in this case the callback is executed first this is no more the case.

This is now fixed by separating the call error from the callback error and tracked by two separate states.

Copy link
Member

@truemedian truemedian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I dislike the state that this change introduces, which makes it significantly harder to reason about what is happening in the control flow through these functions.

However if this fixes the bug I'm fine with introducing it and writing it off for "clean up later".

@Bilal2453
Copy link
Contributor Author

Bilal2453 commented Sep 30, 2024 via email

@Bilal2453
Copy link
Contributor Author

@squeek502 should be ready for merge.

@squeek502 squeek502 merged commit 72e5615 into luvit:master Oct 9, 2024
@Bilal2453 Bilal2453 mentioned this pull request Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

coro-channel: resuming before having a chance to yield

3 participants