Skip to content

Conversation

@massakam
Copy link
Contributor

@massakam massakam commented Oct 4, 2019

Master Issue: #5234

Motivation

The other day, I fixed a memory leak caused by not being executed the destructor of C++ producer in #5246. However, when running a producer application written in Go in an environment with the modified C++ client library installed, the program occasionally crashes due to a "bad_weak_ptr" error.

2019/10/01 16:34:30.210 c_client.go:68: [info] INFO  | ProducerImpl:481 | [persistent://massakam/global/test/t1, dc1-904-1012912] Closing producer for topic persistent://massakam/global/test/t1
2019/10/01 16:34:30.211 c_client.go:68: [info] INFO  | ProducerImpl:463 | Producer - [persistent://massakam/global/test/t1, dc1-904-1012912] , [batchMessageContainer = { BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = persistent://massakam/global/test/t1] [producerName_ = dc1-904-1012912] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 1] [averageBatchSize = 1]}]
2019/10/01 16:34:30.211 c_client.go:68: [info] INFO  | BatchMessageContainer:171 | [numberOfBatchesSent = 1] [averageBatchSize = 1]
terminate called after throwing an instance of 'std::bad_weak_ptr'
  what():  2019/10/01 16:34:30.211 c_client.go:68: [info] INFO  | ProducerImpl:463 | Producer - [persistent://massakam/global/test/t1, dc1-904-1012911] , [batchMessageContainer = { BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = persistent://massakam/global/test/t1] [producerName_ = dc1-904-1012911] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 1] [averageBatchSize = 1]}]
bad_weak_ptr
2019/10/01 16:34:30.211 c_client.go:68: [info] INFO  | BatchMessageContainer:171 | [numberOfBatchesSent = 1] [averageBatchSize = 1]
2019/10/01 16:34:30.211 c_client.go:68: [info] INFO  | ProducerImpl:463 | Producer - [persistent://massakam/global/test/t1, dc1-904-1012910] , [batchMessageContainer = { BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = persistent://massakam/global/test/t1] [producerName_ = dc1-904-1012910] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 1] [averageBatchSize = 1]}]
SIGABRT: abort
PC=0x7fc78d39d2c7 m=0 sigcode=18446744073709551610

As a result of the investigation, I found that the destructor was called before the process of closing ProducerImpl was completed, and the object was destroyed.

Modifications

To keep the ProducerImpl object alive, get its own shared pointer at the beginning of ProducerImpl::closeAsync(). And the pointer must be passed to ProducerImpl::handleClose(). Otherwise, the object will be destroyed before handleClose() is called.

So far, this issue has not been reproduced in ConsumerImpl, but I fixed it in the same way as ProducerImpl.

@massakam massakam added type/bug The PR fixed a bug or issue reported a bug component/c++ labels Oct 4, 2019
@massakam massakam added this to the 2.4.2 milestone Oct 4, 2019
@massakam massakam self-assigned this Oct 4, 2019
@massakam massakam force-pushed the fix-bad-weak-ptr-error branch from 207a4a3 to 475a0ad Compare October 4, 2019 07:22
@massakam
Copy link
Contributor Author

massakam commented Oct 4, 2019

rerun cpp tests

@aahmed-se aahmed-se requested a review from merlimat October 4, 2019 19:46
@sijie sijie merged commit dbd48ab into apache:master Oct 10, 2019
@massakam massakam deleted the fix-bad-weak-ptr-error branch October 11, 2019 02:43
wolfstudy pushed a commit that referenced this pull request Nov 20, 2019
Master Issue: #5234

### Motivation

The other day, I fixed a memory leak caused by not being executed the destructor of C++ producer in #5246. However, when running a producer application written in Go in an environment with the modified C++ client library installed, the program occasionally crashes due to a "bad_weak_ptr" error.

```
2019/10/01 16:34:30.210 c_client.go:68: [info] INFO  | ProducerImpl:481 | [persistent://massakam/global/test/t1, dc1-904-1012912] Closing producer for topic persistent://massakam/global/test/t1
2019/10/01 16:34:30.211 c_client.go:68: [info] INFO  | ProducerImpl:463 | Producer - [persistent://massakam/global/test/t1, dc1-904-1012912] , [batchMessageContainer = { BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = persistent://massakam/global/test/t1] [producerName_ = dc1-904-1012912] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 1] [averageBatchSize = 1]}]
2019/10/01 16:34:30.211 c_client.go:68: [info] INFO  | BatchMessageContainer:171 | [numberOfBatchesSent = 1] [averageBatchSize = 1]
terminate called after throwing an instance of 'std::bad_weak_ptr'
  what():  2019/10/01 16:34:30.211 c_client.go:68: [info] INFO  | ProducerImpl:463 | Producer - [persistent://massakam/global/test/t1, dc1-904-1012911] , [batchMessageContainer = { BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = persistent://massakam/global/test/t1] [producerName_ = dc1-904-1012911] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 1] [averageBatchSize = 1]}]
bad_weak_ptr
2019/10/01 16:34:30.211 c_client.go:68: [info] INFO  | BatchMessageContainer:171 | [numberOfBatchesSent = 1] [averageBatchSize = 1]
2019/10/01 16:34:30.211 c_client.go:68: [info] INFO  | ProducerImpl:463 | Producer - [persistent://massakam/global/test/t1, dc1-904-1012910] , [batchMessageContainer = { BatchContainer [size = 0] [batchSizeInBytes_ = 0] [maxAllowedMessageBatchSizeInBytes_ = 131072] [maxAllowedNumMessagesInBatch_ = 1000] [topicName = persistent://massakam/global/test/t1] [producerName_ = dc1-904-1012910] [batchSizeInBytes_ = 0] [numberOfBatchesSent = 1] [averageBatchSize = 1]}]
SIGABRT: abort
PC=0x7fc78d39d2c7 m=0 sigcode=18446744073709551610
```

As a result of the investigation, I found that the destructor was called before the process of closing `ProducerImpl` was completed, and the object was destroyed.

### Modifications

To keep the `ProducerImpl` object alive, get its own shared pointer at the beginning of `ProducerImpl::closeAsync()`. And the pointer must be passed to `ProducerImpl::handleClose()`. Otherwise, the object will be destroyed before `handleClose()` is called.

So far, this issue has not been reproduced in `ConsumerImpl`, but I fixed it in the same way as `ProducerImpl`.

(cherry picked from commit dbd48ab)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/bug The PR fixed a bug or issue reported a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants