@shibd shibd commented Nov 8, 2022


@shibd shibd force-pushed the enhance_producer_acc branch from 6e6f243 to 3e94653 Compare November 8, 2022 15:24
@shibd shibd closed this Nov 9, 2022
shibd pushed a commit that referenced this pull request Nov 29, 2022
…#121)

### Motivation

When I ran the Python wrapper tests in my local environment, I observed
a segmentation fault. Here are the key frames of the stack trace:

```
#3  0x00007ffff6d742c5 in std::unique_lock<std::mutex>::lock() () from /usr/local/lib/python3.8/dist-packages/_pulsar.cpython-38-x86_64-linux-gnu.so
#4  0x00007ffff6d72523 in std::unique_lock<std::mutex>::unique_lock(std::mutex&) ()
   from /usr/local/lib/python3.8/dist-packages/_pulsar.cpython-38-x86_64-linux-gnu.so
#5  0x00007ffff67de193 in pulsar::ClientImpl::newRequestId (this=0x0) at /home/xyz/github.com/apache/pulsar-client-cpp/lib/ClientImpl.cc:644
#6  0x00007ffff685d2c2 in pulsar::ConsumerImpl::~ConsumerImpl (this=0x7fff9800f9e0, __in_chrg=<optimized out>)
    at /home/xyz/github.com/apache/pulsar-client-cpp/lib/ConsumerImpl.cc:116
```

The destructor of `ConsumerImpl` might call `client->newRequestId`.
However, `client` is returned by `std::weak_ptr::lock()`, so it is a null
pointer if the client has already been destroyed (note `this=0x0` in
frame #5).

### Modifications

Add a null check on `client` before calling `newRequestId` to avoid the
segfault.
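
As an illustration of the pattern described above, here is a minimal sketch of a destructor that guards the result of `std::weak_ptr::lock()`. It is not the actual patch: `Client`, `Consumer`, `closeQuietly`, and `lastRequestId` are hypothetical stand-ins for `ClientImpl`, `ConsumerImpl`, and their internals.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>

// Hypothetical stand-in for ClientImpl: hands out request ids under a mutex.
class Client {
   public:
    std::int64_t newRequestId() {
        std::lock_guard<std::mutex> lock(mutex_);
        return nextRequestId_++;
    }

   private:
    std::mutex mutex_;
    std::int64_t nextRequestId_ = 0;
};

// Hypothetical stand-in for ConsumerImpl: holds only a weak reference to the client.
class Consumer {
   public:
    explicit Consumer(const std::shared_ptr<Client>& client) : client_(client) {
        assert(client && "a consumer is created from a live client");
    }

    // The destructor may run after the client has been destroyed.
    ~Consumer() { closeQuietly(); }

    // Returns true if a request id could be obtained, false if the client is gone.
    bool closeQuietly() {
        std::shared_ptr<Client> client = client_.lock();
        if (!client) {
            // Without this check, client->newRequestId() would run with this == nullptr.
            return false;
        }
        lastRequestId_ = client->newRequestId();
        return true;
    }

    std::int64_t lastRequestId() const { return lastRequestId_; }

   private:
    std::weak_ptr<Client> client_;
    std::int64_t lastRequestId_ = -1;
};
```

In this sketch, destroying the last `std::shared_ptr<Client>` before the `Consumer` goes out of scope makes `closeQuietly()` return `false` instead of dereferencing a null pointer.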
shibd pushed a commit that referenced this pull request Jan 13, 2023
Fixes apache#167

### Motivation

Here is some debugging information from when the segfault happened in
`testCloseClient`. The output has been trimmed for clarity.

An example crash at `async_write`:

```
#12 0x00007ffff7496dad in basic_stream_socket<...>::boost::asio::async_write /usr/include/boost/asio/impl/write.hpp:512
#13 0x00007ffff748e003 in ClientConnection::asyncWrite lib/ClientConnection.h:245
#14 0x00007ffff746e0b6 in ClientConnection::handleHandshake (this=0x555555e689d0) lib/ClientConnection.cc:502
```

Another example crash at `async_receive`:

```
#6  0x00007ffff7497247 in basic_stream_socket<...>::async_receive /usr/include/boost/asio/basic_stream_socket.hpp:677
#7  0x00007ffff748e647 in ClientConnection::asyncReceive lib/ClientConnection.h:258
#8  0x00007ffff746fa5d in ClientConnection::readNextCommand lib/ClientConnection.cc:606
```

The frame where it crashed:

```
245       if (descriptor_data->shutdown_)
(gdb) p descriptor_data
$2 = (boost::asio::detail::epoll_reactor::per_descriptor_data &) @0x555555e4a780: 0x0
```

We can see that the socket's `descriptor_data` is `nullptr`. The root
cause is that the `io_service` object might already be closed when
`async_receive` or `async_write` is called. This happens when
`createProducerAsync` is called: the actual producer creation continues
in another thread while `client.close()` runs in the current thread.

### Modifications

Check whether the `ClientConnection` is closed before calling
`async_receive` or `async_write`. To avoid taking a lock, change the
`state_` field to an atomic.

### Verifications

```bash
./tests/pulsar-tests --gtest_filter='ClientTest.testCloseClient' --gtest_repeat=20
```

It never crashed after applying this patch.