Skip to content

Conversation

@szetszwo
Copy link
Contributor

@szetszwo szetszwo commented Jun 7, 2023

@szetszwo
Copy link
Contributor Author

szetszwo commented Jun 7, 2023

@guohao-rosicky , could you test if this could fix this memory leak and also the cleanup problem?

@guohao-rosicky
Copy link
Contributor

@guohao-rosicky , could you test if this could fix this memory leak and also the cleanup problem?

sure.

@guohao-rosicky
Copy link
Contributor

guohao-rosicky commented Jun 8, 2023

hi, @szetszwo

By looking at the LEAK log, I'm not sure if the leak was caused by calling ByteBuf::retain here and not calling ByteBuf::release twice.

see:
https://github.com/apache/ratis/blob/master/ratis-netty/src/main/java/org/apache/ratis/netty/NettyDataStreamUtils.java#L139

  static DataStreamRequestByteBuf decodeDataStreamRequestByteBuf(ByteBuf buf) {
    return Optional.ofNullable(decodeDataStreamRequestHeader(buf))
        .map(header -> checkHeader(header, buf))
        .map(header -> new DataStreamRequestByteBuf(header, decodeData(buf, header, ByteBuf::retain)))
        .orElse(null);
  }

leak log:

ERROR org.apache.ratis.thirdparty.io.netty.util.ResourceLeakDetector: LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 

#2:
        org.apache.ratis.netty.NettyDataStreamUtils.decodeData(NettyDataStreamUtils.java:249)
        org.apache.ratis.netty.NettyDataStreamUtils.lambda$decodeDataStreamRequestByteBuf$1(NettyDataStreamUtils.java:140)
        java.util.Optional.map(Optional.java:215)
        org.apache.ratis.netty.NettyDataStreamUtils.decodeDataStreamRequestByteBuf(NettyDataStreamUtils.java:140)
        org.apache.ratis.netty.server.NettyServerStreamRpc$3.decode(NettyServerStreamRpc.java:289)
        org.apache.ratis.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:510)
        org.apache.ratis.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:449)
        org.apache.ratis.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:279)
        org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
        org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
        org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
        org.apache.ratis.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
        org.apache.ratis.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
        org.apache.ratis.thirdparty.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
        org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:722)
        org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
        org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
        org.apache.ratis.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
        org.apache.ratis.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)
        org.apache.ratis.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        java.lang.Thread.run(Thread.java:748)

@szetszwo
Copy link
Contributor Author

szetszwo commented Jun 9, 2023

By looking at the LEAK log, ...

Do you mean the new LEAK log after applied this change?

... I'm not sure if the leak was caused by calling ByteBuf::retain here and not calling ByteBuf::release twice.

Calling ByteBuf::release twice seems incorrect. The second call should triggered an exception.

@guohao-rosicky
Copy link
Contributor

hi, @szetszwo. I have committed the code related to stream cleanup on ozone datanode, I have made some changes please help me to review it, thanks.
see:
apache/ozone#4891

@szetszwo
Copy link
Contributor Author

@guohao-rosicky , how is your testing going?

@guohao-rosicky
Copy link
Contributor

@guohao-rosicky , how is your testing going?
hi, @szetszwo .
I feel that this change is valid and can be merged.
When this pr is merged, the I will submit a pr later to cleanup when the server side #channelInactive.

@szetszwo szetszwo requested a review from adoroszlai June 19, 2023 09:38
Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @szetszwo for the patch. I'm not familiar with Ratis streaming, but the change itself looks good.

@szetszwo szetszwo merged commit b49a793 into apache:master Jun 19, 2023
@szetszwo
Copy link
Contributor Author

@guohao-rosicky , thanks a lot for testing this!

@adoroszlai , thanks a lot for reviewing this!

@guohao-rosicky
Copy link
Contributor

On top of this pr I added some processing, @szetszwo PTAL, thanks.
see:
#887

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants