Hi, first of all, thank you for this excellent project.
I recently encountered a rare multithreading issue in my project using libhv as its event loop, and I would like to share the scenario to confirm if it is a bug or a misunderstanding on my side.
Context:
In my project, I removed the mutex for hio_t (though I believe this is unrelated to the issue).
The scenario is:
- We have two threads:
  - Main thread: creates a TCP server on port 443.
  - Worker thread: sockets accepted on the main thread are distributed to either the main thread or the worker thread using hio_detach/hio_attach.
- When a thread gets an accepted socket, it also initiates a connection to an external TCP address (e.g., google.com:443).
Example state:
At a given moment we have these sockets:
- s1: accepted socket on thread 0 (main).
- s2: connection to google on thread 0 (main).
- s3: accepted on thread 0 but moved to thread 1 (worker).
- s4: connection to google on thread 1 (worker).

Note that this list omits the listening socket that accepts connections on the main thread.
The problematic event order:
Assume epoll on the main thread returns three events in this order:
1. Data ready to read on s2.
2. Data ready to read on the server accept FD (a new connection is ready to be accepted).
3. Data ready to read on s1.
During processing of event 1, reading the data causes the logic to close its paired socket (s1).
Then during event 2, accept may return a new socket FD that reuses the same FD number as s1 (since it was just closed).
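The FD reuse in event 2 can be reproduced in isolation. The following is a minimal sketch, not libhv code: it assumes a POSIX system where new descriptors are allocated lowest-number-first, and the function name fd_number_is_reused is made up for illustration. It uses pipe and dup instead of accept so that it needs no network setup.

```c
#include <unistd.h>

/* Hypothetical helper: returns 1 if a descriptor created right after
 * close() receives the number that was just released, 0 if not,
 * and -1 if the pipe could not be created. */
int fd_number_is_reused(void) {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    int old_fd = fds[0];          /* plays the role of s1 */
    close(old_fd);                /* s1 is closed during event 1 */

    int new_fd = dup(fds[1]);     /* stands in for accept() during event 2 */
    int reused = (new_fd == old_fd);

    if (new_fd >= 0) close(new_fd);
    close(fds[1]);
    return reused;
}
```

In a single-threaded process this is expected to report reuse, which is why any state still keyed by the old FD number (or by an object tied to it) can be confused with the new connection.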
If the main thread assigns this newly accepted socket to thread 1 and then proceeds to event 3, it attempts to read from the hio_t associated with the old s1, which is no longer valid for the main thread’s event loop. At that point loop->ios.ptr[fd] is NULL on the main loop, but the hio_t pointer captured for pending event processing is still valid. Because the hio_t is reused for the new connection, that pointer now refers to an object that has already been transferred to thread 1, so the main thread ends up operating on an hio_t it no longer owns.
My current workaround:
Before actually closing the socket FD in hio_close, I check if there are pending events for the hio_t. If so, I queue a message to call close on the next iteration rather than closing immediately, ensuring no pending events are associated with the hio_t during closure.
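To make the idea concrete, here is a rough, self-contained model of the deferred close. It is only a sketch under stated assumptions: toy_io, toy_loop and every function below are hypothetical stand-ins, not libhv types or APIs, and the real fix in hio_close would have to use libhv's own pending-event bookkeeping and message queue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for hio_t; not libhv's real layout. */
typedef struct toy_io {
    int  fd;
    int  pending_events;   /* events already fetched from epoll, not yet handled */
    bool close_deferred;
    bool closed;
} toy_io;

#define TOY_MAX_DEFERRED 16

/* Hypothetical stand-in for the loop's "run on next iteration" queue. */
typedef struct toy_loop {
    toy_io *deferred[TOY_MAX_DEFERRED];
    size_t  ndeferred;
} toy_loop;

static void toy_close_now(toy_io *io) {
    io->closed = true;
    io->fd = -1;           /* a real implementation would close(io->fd) here */
}

/* Close immediately only when no pending event still points at this io. */
void toy_close(toy_loop *loop, toy_io *io) {
    if (io->closed || io->close_deferred) return;
    if (io->pending_events > 0 && loop->ndeferred < TOY_MAX_DEFERRED) {
        io->close_deferred = true;
        loop->deferred[loop->ndeferred++] = io;
        return;
    }
    /* No pending events (or the toy queue is full): close right away. */
    toy_close_now(io);
}

/* Consume one pending event; callbacks are skipped once a close is requested. */
void toy_handle_event(toy_io *io) {
    if (io->pending_events > 0) io->pending_events--;
    if (io->closed || io->close_deferred) return;
    /* a real loop would invoke the read/write callback here */
}

/* Called at the start of the next iteration, after all pending events ran. */
void toy_run_deferred(toy_loop *loop) {
    for (size_t i = 0; i < loop->ndeferred; i++) {
        toy_close_now(loop->deferred[i]);
    }
    loop->ndeferred = 0;
}
```

The point of the design is that the FD number stays occupied until the current batch of events has been drained, so accept cannot hand the same number back while a stale event for s1 is still queued.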
If this analysis is correct and the issue is indeed present, I would be happy to prepare a pull request with my current fix, or adjust it if you believe there is a cleaner solution.
Thank you for your time and for maintaining libhv!