-
Notifications
You must be signed in to change notification settings - Fork 963
[fix] Fix ByteBuf release/retain in PerChannelBookClient #4289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| final Channel c = channel; | ||
| if (c == null) { | ||
| // usually checked in writeAndFlush, but we have extra check | ||
| // because we need to release toSend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update also the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
Please make the title more specific. for example "Fix ByteBuf release/retain in PerChannelBookClient" etc. |
| // because we need to release toSend. | ||
| errorOut(completionKey); | ||
| ReferenceCountUtil.release(toSend); | ||
| ReferenceCountUtil.release(request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this also result in the correct behavior when useV2WireProtocol == false?
It doesn't look consistent how the V3 protocol is currently handled since there's no retained duplicate and the backing Netty ByteBuf isn't released.
In the case of V3 protocol, the body might consist of com.google.protobuf.NioByteString instances that wrap the Nio ByteBuffer instances backed by Netty ByteBuf instances which also control the lifecycle of the Nio ByteBuffer instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the concern about the NioBuffer lifecycle was raised by @sijie already in the original PR review: #791 (comment) .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added release at these locations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this also result in the correct behavior when useV2WireProtocol == false?
No memory leak when useV2WireProtocol is false, but the error io.netty.util.IllegalReferenceCountException: refCnt: 0, increment: 1 described in the Motivation will also occur.
horizonzy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. LGTM
Done |
| } catch (Throwable e) { | ||
| LOG.warn("Operation {} failed", StringUtils.requestToString(request), e); | ||
| errorOut(key); | ||
| if (!calledWrite) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a little weird. I think this catch wants to catch the channel is null for some reason. That means we may forget to release if the channel is null at line 1218. As I understand, if the request goes into the writeAndFlush, it should be handled well by Netty. So all the exceptions we get here, we can release the request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improved
hangc0276
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job!
| if (channel == null) { | ||
| LOG.warn("Operation {} failed: channel == null", StringUtils.requestToString(request)); | ||
| errorOut(key); | ||
| ReferenceCountUtil.release(request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's the wrong thing to do generally in this method. I'll share a new PR where I explain an alternative approach that handles both the V3 protocol and the UnsafeByteOperations issue and the issue that was found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To trace the context easier, I move the second comment here.
I think it's the wrong thing to do generally in this method. I'll share a new PR where I explain an alternative approach that handles both the V3 protocol and the UnsafeByteOperations issue and the issue that was found.
I don't think that the current changes in this PR are ok. I created #4293 to show what I think that should be done.
I added a section named Explanation in the Motivation, it explains that the current change is correct. Thanks for mentioning me and great suggestions ❤️
lhotari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that the current changes in this PR are ok. I created #4293 to show what I think that should be done.
Thanks @poorbarcode . Let's merge this PR first. I'll then rebase #4293 to fix the invalid handling of the wrapped nio ByteBuffer lifecycle. That's a remaining problem in this area. It seems that it's not only a problem with V3 protocol. |
* [fix] ByteBuf release/retain incorrect * improve the code comment * fix other cases * modify the code comment * improve the code * improve the test * add description
* [fix] ByteBuf release/retain incorrect * improve the code comment * fix other cases * modify the code comment * improve the code * improve the test * add description
…4293) ### Motivation This addresses the remaining gaps of #4289 in handling ByteBuf retain/release. This PR will also address the concern about NioBuffer lifecycle brought up in the review of the original PR review: #791 (comment) . This PR fixes several problems: * ByteString buffer lifecycle in client, follows ByteBufList lifecycle * ByteBufList lifecycle, moved to write promise * Calling of write promises in AuthHandler which buffers messages while authentication is in progress. It was ignoring the promises. ### Changes - add 2 callback parameters to writeAndFlush: cleanupActionFailedBeforeWrite and cleanupActionAfterWrite - use these callback actions for proper cleanup - extract a utility class ByteStringUtil for wrapping ByteBufList or ByteBuf as concatenated zero copy ByteString - properly handle releasing of ByteBufList in the write promise - properly handle calling promises that are buffered while authentication is in progress
…4293) ### Motivation This addresses the remaining gaps of #4289 in handling ByteBuf retain/release. This PR will also address the concern about NioBuffer lifecycle brought up in the review of the original PR review: #791 (comment) . This PR fixes several problems: * ByteString buffer lifecycle in client, follows ByteBufList lifecycle * ByteBufList lifecycle, moved to write promise * Calling of write promises in AuthHandler which buffers messages while authentication is in progress. It was ignoring the promises. ### Changes - add 2 callback parameters to writeAndFlush: cleanupActionFailedBeforeWrite and cleanupActionAfterWrite - use these callback actions for proper cleanup - extract a utility class ByteStringUtil for wrapping ByteBufList or ByteBuf as concatenated zero copy ByteString - properly handle releasing of ByteBufList in the write promise - properly handle calling promises that are buffered while authentication is in progress (cherry picked from commit 0ef2f99)
…pache#4293) This addresses the remaining gaps of apache#4289 in handling ByteBuf retain/release. This PR will also address the concern about NioBuffer lifecycle brought up in the review of the original PR review: apache#791 (comment) . This PR fixes several problems: * ByteString buffer lifecycle in client, follows ByteBufList lifecycle * ByteBufList lifecycle, moved to write promise * Calling of write promises in AuthHandler which buffers messages while authentication is in progress. It was ignoring the promises. - add 2 callback parameters to writeAndFlush: cleanupActionFailedBeforeWrite and cleanupActionAfterWrite - use these callback actions for proper cleanup - extract a utility class ByteStringUtil for wrapping ByteBufList or ByteBuf as concatenated zero copy ByteString - properly handle releasing of ByteBufList in the write promise - properly handle calling promises that are buffered while authentication is in progress
…pache#4293) ### Motivation This addresses the remaining gaps of apache#4289 in handling ByteBuf retain/release. This PR will also address the concern about NioBuffer lifecycle brought up in the review of the original PR review: apache#791 (comment) . This PR fixes several problems: * ByteString buffer lifecycle in client, follows ByteBufList lifecycle * ByteBufList lifecycle, moved to write promise * Calling of write promises in AuthHandler which buffers messages while authentication is in progress. It was ignoring the promises. ### Changes - add 2 callback parameters to writeAndFlush: cleanupActionFailedBeforeWrite and cleanupActionAfterWrite - use these callback actions for proper cleanup - extract a utility class ByteStringUtil for wrapping ByteBufList or ByteBuf as concatenated zero copy ByteString - properly handle releasing of ByteBufList in the write promise - properly handle calling promises that are buffered while authentication is in progress (cherry picked from commit 0ef2f99) # Conflicts: # bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
…4293) (#4396) ### Motivation This addresses the remaining gaps of #4289 in handling ByteBuf retain/release. This PR will also address the concern about NioBuffer lifecycle brought up in the review of the original PR review: #791 (comment) . This PR fixes several problems: * ByteString buffer lifecycle in client, follows ByteBufList lifecycle * ByteBufList lifecycle, moved to write promise * Calling of write promises in AuthHandler which buffers messages while authentication is in progress. It was ignoring the promises. ### Changes - add 2 callback parameters to writeAndFlush: cleanupActionFailedBeforeWrite and cleanupActionAfterWrite - use these callback actions for proper cleanup - extract a utility class ByteStringUtil for wrapping ByteBufList or ByteBuf as concatenated zero copy ByteString - properly handle releasing of ByteBufList in the write promise - properly handle calling promises that are buffered while authentication is in progress (cherry picked from commit 0ef2f99) # Conflicts: # bookkeeper-server/src/main/java/org/apache/bookkeeper/proto/PerChannelBookieClient.java
* [fix] ByteBuf release/retain incorrect * improve the code comment * fix other cases * modify the code comment * improve the code * improve the test * add description
…pache#4293) ### Motivation This addresses the remaining gaps of apache#4289 in handling ByteBuf retain/release. This PR will also address the concern about NioBuffer lifecycle brought up in the review of the original PR review: apache#791 (comment) . This PR fixes several problems: * ByteString buffer lifecycle in client, follows ByteBufList lifecycle * ByteBufList lifecycle, moved to write promise * Calling of write promises in AuthHandler which buffers messages while authentication is in progress. It was ignoring the promises. ### Changes - add 2 callback parameters to writeAndFlush: cleanupActionFailedBeforeWrite and cleanupActionAfterWrite - use these callback actions for proper cleanup - extract a utility class ByteStringUtil for wrapping ByteBufList or ByteBuf as concatenated zero copy ByteString - properly handle releasing of ByteBufList in the write promise - properly handle calling promises that are buffered while authentication is in progress
Thanks
This root cause was found by @codelipenghui and I just tried to reproduce and write this PR.
Motivation
PerChannelBookieClient.addEntryduplicates the binary data (@param toSend) and sends the binary data new to the IO out. Manually release the binary data new if can not be sent out due to the channel switching. See the explanation below.Issue: it released the original data (
@param toSend), it is wrong. Leading a memory leak and the bellow error logExplanation
Background: stacks of
add entryBookieClientImpl.addEntry.a. Retain the
ByteBufbefore getPerChannelBookieClient. We call thisByteBufastoSendin the following sections.toSend.recCnfis2now.Get PerChannelBookieClientChannelReadyForAddEntryCallback.operationCompletea.
PerChannelBookieClient.addEntrya-1. Build a new ByteBuf for request command. We call this
ByteBufnew asrequestin the following sections.a-2.
channle.writeAndFlush(request)or release the ByteBuf whenchannelis switching.b. Release the
ByteBufsince it has been retained atstep 1.toSend.recCnfshould be1now.(Highlight) the difference of
V2andV3is only in the step "3-a-1. Build a new ByteBuf for request command"1. The Step
3-a-1:V2 & Small payload3-a-1: Build a new ByteBuf for request commandrequest = toSend.retainedDuplicate().3-a-2: Release the ByteBuftoSend.release()Explanation: regarding this case,
requestandtoSendshare the samerefCnf, it is fine to release a random one.2. The Step
3-a-1:V2 & Large payload3-a-1: Build a new ByteBuf for request commandrequest = ByteBufList.clone(toSend).3-a-2: Release the ByteBuftoSend.release()Explanation: regarding this case,
requestandtoSendare using differentrefCnf. Since the request can not be send out due to thechannelswitching, BK client should releaserequest. It would cause arefCnfincorrect error if release an incorrect one. Highlight: since the internal ByteBufs ofrequestandtoSendare the same, it will not leading a memory leak even if it releases an incorrect one.3. The Step
3-a-1:V33-a-1: Build a new ByteBuf for request commandrequest: build a absolutely one new ByteBuf.3-a-2: Release the ByteBuftoSend.release()Explanation: regarding this case,
requestandtoSendare using differentrefCnf. Since the request can not be send out due to thechannelswitching, BK client should releaserequest. It would cause anrefCnfincorrect error if releasing a incorrect one. Highlight: since the exactly ByteBuf ofrequestandtoSendare different, it will leading a memory leak if it releases an incorrect one.Changes
Correct the code