(Maybe) Rare multithreading + socket reuse race condition causing crash #754

@radkesvat


Hi, first of all, thank you for this excellent project.

I recently encountered a rare multithreading issue in my project using libhv as its event loop, and I would like to share the scenario to confirm if it is a bug or a misunderstanding on my side.


Context:

In my project, I removed the mutex for hio_t (though I believe this is unrelated to the issue).

The scenario is:

  • We have two threads:

    • Main thread: creates a TCP server on port 443.
    • Worker thread: sockets accepted on the main thread are distributed to either the main thread or the worker thread using hio_detach / hio_attach.
  • When a thread gets an accepted socket, it also initiates a connection to an external TCP address (e.g., google.com:443).


Example state:

At a given moment we have these sockets:

  • s1: accepted socket on thread 0 (main).
  • s2: connection to google on thread 0 (main).
  • s3: accepted from thread 0 but moved to thread 1 (worker).
  • s4: connection to google on thread 1 (worker).

Note that I have not listed the listening socket that accepts connections on the main thread.


The problematic event order:

Assume epoll on the main thread returns three events in this order:

  1. Data ready to read on s2.
  2. Data ready to read on the server accept FD (a new connection is ready to be accepted).
  3. Data ready to read on s1.

During processing of event 1, reading the data causes the logic to close its paired socket (s1).

Then during event 2, accept may return a new socket FD that reuses the same FD number as s1 (since it was just closed).
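The FD-number reuse in this step can be reproduced in isolation. The sketch below is only an illustration and does not use libhv: a pipe end stands in for s1 and `dup()` stands in for `accept()`, on the assumption that the platform hands out the lowest free descriptor number (POSIX behaviour) and that no other thread opens descriptors in between. The function name `fd_number_is_reused` is hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/* Returns 1 if the descriptor opened right after close() received the
 * number that was just closed, 0 if not. */
int fd_number_is_reused(void) {
    int fds[2];
    int rc = pipe(fds);
    assert(rc == 0);
    (void)rc;

    /* Use the lower of the two numbers as the stand-in for s1. */
    int old_fd = fds[0] < fds[1] ? fds[0] : fds[1];
    int other_fd = fds[0] < fds[1] ? fds[1] : fds[0];

    close(old_fd);              /* s1 is closed while handling event 1 */
    int new_fd = dup(other_fd); /* stand-in for accept() in event 2 */

    int reused = (new_fd == old_fd);
    if (new_fd >= 0) close(new_fd);
    close(other_fd);
    return reused;
}
```

In a single-threaded process the new descriptor gets the number that was just freed, which is why an event queued for the old s1 can no longer be told apart by FD number alone.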

If the main thread assigns this newly accepted socket to thread 1 and then proceeds to event 3, it tries to read through the hio_t that was associated with the old s1, which is no longer valid for the main thread's event loop. At that point loop->ios.ptr[fd] is NULL on the main loop, but the hio_t pointer captured for pending-event processing is still non-NULL. Because the hio_t is reused for the new connection, that pointer now refers to an object that has been transferred to thread 1, so the main thread ends up operating on an io owned by the worker thread.
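To make the sequence easier to follow, here is a toy model of the three events. It is not libhv code: `toy_io`, `toy_loop`, `toy_get`, `toy_close` and `toy_move` are hypothetical stand-ins for hio_t, hloop_t, the io lookup, hio_close and hio_detach / hio_attach. It assumes, as described above, that the io object stays in the per-loop table after close and is reused for the next socket with the same FD number.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FD 16

/* Toy stand-ins for hio_t and hloop_t; NOT libhv's real structures. */
typedef struct { int fd; int owner; int closed; } toy_io;
typedef struct { int id; toy_io *ios[TOY_MAX_FD]; } toy_loop;

/* Return the io object for fd, reusing the one already in the table. */
static toy_io *toy_get(toy_loop *loop, int fd, toy_io *fresh) {
    if (loop->ios[fd] == NULL) loop->ios[fd] = fresh;
    toy_io *io = loop->ios[fd];
    io->fd = fd;
    io->owner = loop->id;
    io->closed = 0;
    return io;
}

/* Closing marks the object closed but leaves it in the table for reuse. */
static void toy_close(toy_io *io) { io->closed = 1; }

/* Model of detach from one loop followed by attach to another. */
static void toy_move(toy_loop *from, toy_loop *to, toy_io *io) {
    from->ios[io->fd] = NULL;
    to->ios[io->fd] = io;
    io->owner = to->id;
}

/* Replays events 1..3 and returns the owner of the io that event 3 touches. */
int stale_event_owner(void) {
    toy_loop main_loop = { .id = 0 }, worker_loop = { .id = 1 };
    toy_io slot_a = {0}, slot_b = {0}, unused = {0};
    toy_io *s1 = toy_get(&main_loop, 5, &slot_a);
    toy_io *s2 = toy_get(&main_loop, 6, &slot_b);

    /* Pending list captured when the poll returned: s2, listener, s1. */
    toy_io *pending[3] = { s2, NULL, s1 };

    /* Event 1: handling data on s2 closes its pair s1. */
    toy_close(s1);

    /* Event 2: accept returns FD 5 again; the same io object is reused
     * and handed over to the worker loop. */
    toy_io *accepted = toy_get(&main_loop, 5, &unused);
    assert(accepted == s1);
    toy_move(&main_loop, &worker_loop, accepted);

    /* Event 3: the stale pointer is still non-NULL, although the main
     * loop's table entry for this FD is now empty. */
    toy_io *io = pending[2];
    assert(main_loop.ios[io->fd] == NULL);
    return io->owner;
}
```

In this model `stale_event_owner()` returns 1: the object the main thread is about to read from already belongs to the worker loop.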


My current workaround:

Before actually closing the socket FD in hio_close, I check if there are pending events for the hio_t. If so, I queue a message to call close on the next iteration rather than closing immediately, ensuring no pending events are associated with the hio_t during closure.
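The idea of the workaround can be sketched with a small model. This is not the actual patch and not libhv's API: `toy_io`, `toy_close` and `toy_finish_iteration` are hypothetical names, and the end-of-iteration flush stands in for the message that is queued to perform the close on the next iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for hio_t; NOT libhv's real structure. */
typedef struct {
    int fd;
    int pending;        /* events still queued for this io in this iteration */
    int closed;
    int close_deferred; /* close was requested while events were pending */
} toy_io;

/* Close immediately if nothing is pending, otherwise postpone.
 * Returns 1 if the io was closed right away, 0 if the close was deferred. */
static int toy_close(toy_io *io) {
    if (io->pending > 0) {
        io->close_deferred = 1;
        return 0;
    }
    io->closed = 1;
    return 1;
}

/* Called after all events of the current iteration have been handled. */
static void toy_finish_iteration(toy_io *ios, size_t count) {
    for (size_t i = 0; i < count; i++) {
        ios[i].pending = 0;
        if (ios[i].close_deferred) {
            ios[i].close_deferred = 0;
            ios[i].closed = 1;
        }
    }
}

/* Returns 1 if the close was postponed during the iteration and
 * completed once the iteration ended. */
int deferred_close_demo(void) {
    toy_io s1 = { .fd = 5, .pending = 1 }; /* event 3 is still queued */
    int closed_immediately = toy_close(&s1); /* requested during event 1 */
    int still_open_mid_iteration = !s1.closed;
    toy_finish_iteration(&s1, 1);
    assert(s1.pending == 0);
    return !closed_immediately && still_open_mid_iteration && s1.closed;
}
```

Because the FD stays open until the iteration ends, accept cannot return the same FD number while an event for the old socket is still queued.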


If this analysis is correct and the issue is indeed present, I would be happy to prepare a pull request with my current fix, or adjust it if you believe there is a cleaner solution.

Thank you for your time and for maintaining libhv!
