Skip to content

Conversation

@kpumuk
Copy link
Contributor

@kpumuk kpumuk commented Dec 3, 2025

In a race condition in Thrift::NonblockingServer, the following sequence could occur:

  1. server transport checked for closure (@server_transport.closed? returns false)
  2. another thread calls server.shutdown
  3. yet another thread, scheduled from #shutdown, calls #close on the server transport, which also sets @handle to nil
  4. back in the first thread, #to_io is called on the server transport, which now returns @handle as nil, causing an error:
  1) NonblockingServer Thrift::NonblockingServer should shut down when asked
     Failure/Error: rd, = select([@server_transport], nil, nil, 0.1)

     TypeError:
       can't convert SpecServerSocket to IO (SpecServerSocket#to_io gives NilClass)
     # ./lib/thrift/server/nonblocking_server.rb:48:in `select'
     # ./lib/thrift/server/nonblocking_server.rb:48:in `block in serve'
     # ./lib/thrift/server/nonblocking_server.rb:45:in `loop'
     # ./lib/thrift/server/nonblocking_server.rb:45:in `serve'
     # ./spec/nonblocking_server_spec.rb:116:in `block (4 levels) in <top (required)>'

This patch changes Thrift::ServerSocket#to_io to raise IOError if @handle is nil, which is the expected behavior for closed streams in Ruby:

  require 'socket'
  server = TCPServer.new('localhost', 55554)
  server.close
  select([server], nil, nil, 0.1)
  # => IOError: closed stream  

Before the change

sequenceDiagram
    participant T1 as Thread 1<br/>(NonblockingServer serve)
    participant T2 as Thread 2<br/>(server.shutdown)
    participant T3 as Thread 3<br/>(shutdown helper)
    participant S as ServerTransport<br/>(@server_transport)

    T1->>S: listen
    activate S

    loop accept socket
    Note over T1: 1. Check if server transport is closed
    T1->>S: closed?
    S-->>T1: returns false

    Note over T2: 2. Server is asked to shut down, spawns helper thread
    activate T2
    T2->>T3: &shutdown_proc
    deactivate T2
    activate T3

    Note over T3: 3. Closes server transport 
    T3->>S: close()
    S->>S: @handle = nil
    deactivate T3
    deactivate S

    Note over T1: 4. Back in serve loop<br/>select([@server_transport], nil, nil, 0.1)
    T1->>S: @server_transport.to_io<br/>(@handle is now nil)
    S-->>T1: returns nil

    Note over T1: 5. IO.select throws an error
    T1-->>T1: TypeError:<br/>"can't convert<br/>SpecServerSocket to IO<br/>(to_io gives NilClass)"
    end
Loading

After the change

sequenceDiagram
    participant T1 as Thread 1<br/>(NonblockingServer serve)
    participant T2 as Thread 2<br/>(server.shutdown)
    participant T3 as Thread 3<br/>(shutdown helper)
    participant S as ServerTransport<br/>(@server_transport)

    T1->>S: listen
    activate S

    loop accept socket
    Note over T1: 1. Check if server transport is closed
    T1->>S: closed?
    S-->>T1: returns false

    Note over T2: 2. Server is asked to shut down, spawns helper thread
    activate T2
    T2->>T3: &shutdown_proc
    deactivate T2
    activate T3

    Note over T3: 3. Closes server transport 
    T3->>S: close()
    S->>S: @handle = nil
    deactivate T3
    deactivate S

    Note over T1: 4. Back in serve loop<br/>select([@server_transport], nil, nil, 0.1)
    T1->>S: @server_transport.to_io<br/>(@handle is now nil)
    S-->>T1: throws IOError nil

    end

    Note over T1: 5. rescues IOError
    T1->>T1: shutting down
Loading
  • Did you create an Apache Jira ticket? THRIFT-5909
  • If a ticket exists: Does your pull request title follow the pattern "THRIFT-NNNN: describe my issue"?
  • Did you squash your changes to a single commit? (not required, but preferred)
  • Did you do your best to avoid breaking changes? If one was needed, did you label the Jira ticket with "Breaking-Change"?
  • If your change does not involve any code, include [skip ci] anywhere in the commit message to free up build resources.

@kpumuk
Copy link
Contributor Author

kpumuk commented Dec 3, 2025

Build with all the remaining test fixes incorporated: https://github.com/kpumuk/thrift/actions/runs/19878193283

CleanShot 2025-12-02 at 20 12 38@2x

@Jens-G Jens-G added the ruby label Dec 10, 2025
In a race condition in Thrift::NonblockingServer, the following sequence could occur:

1. server transport checked for closure (@server_transport.closed?
   returns false)
2. another thread calls server.shutdown
3. yet another thread, scheduled from #shutdown, calls #close on the
   server transport, which also sets @handle to nil
4. back in the first thread, #to_io is called on the server transport,
   which now returns @handle as nil, causing an error:

  1) NonblockingServer Thrift::NonblockingServer should shut down when asked
     Failure/Error: rd, = select([@server_transport], nil, nil, 0.1)

     TypeError:
       can't convert SpecServerSocket to IO (SpecServerSocket#to_io gives NilClass)
     # ./lib/thrift/server/nonblocking_server.rb:48:in `select'
     # ./lib/thrift/server/nonblocking_server.rb:48:in `block in serve'
     # ./lib/thrift/server/nonblocking_server.rb:45:in `loop'
     # ./lib/thrift/server/nonblocking_server.rb:45:in `serve'
     # ./spec/nonblocking_server_spec.rb:116:in `block (4 levels) in <top (required)>'

This patch changes Thrift::ServerSocket#to_io to raise IOError if
@handle is nil, which is the expected behavior for closed streams in Ruby:

  require 'socket'
  server = TCPServer.new('localhost', 55554)
  server.close
  select([server], nil, nil, 0.1)
  # => IOError: closed stream
@kpumuk kpumuk force-pushed the nonblocking-io-error branch from 9e336fa to b32345c Compare December 14, 2025 21:20
@Jens-G Jens-G merged commit b8f7e5b into apache:master Dec 16, 2025
31 checks passed
@kpumuk kpumuk deleted the nonblocking-io-error branch December 16, 2025 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants