Skip to content

Conversation

@zymap
Copy link
Member

@zymap zymap commented Aug 11, 2025

Motivation

We met the GarbageCollectionThread was stopped by some runtime error, but we didn't catch it then, causing the GC to stop.
Such as:
#3901
#4544

In our case, the GC stopped because of the OutOfDirectMemoryException then the process stopped and the files can not be deleted. But we didn't see any error logs. This PR enhance the log info when an unhandled error happens.
We already have the PR fixed that.

And another fix in this PR is to change the Exception to the Throwable in the getEntryLogMetadata.

Here is the error stack:

    io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 213909504 byte(s) of direct memory (used: 645922847, max: 858783744)
	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:880)
	at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:809)
	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:718)
	at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:707)
	at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:224)
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:142)
	at io.netty.buffer.PoolArena.reallocate(PoolArena.java:317)
	at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:123)
	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305)
	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:104)
	at org.apache.bookkeeper.bookie.DefaultEntryLogger.extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109)
	at org.apache.bookkeeper.bookie.DefaultEntryLogger.getEntryLogMetadata(DefaultEntryLogger.java:1060)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:678)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:365)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.lambda$triggerGC$4(GarbageCollectorThread.java:268)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)

You can see it get much more memory used here extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109). The reason is that the header has the wrong data of the header, which should already be fixed by #4607. Then it reading with a wrong map size which could take a lot of memory.

@zymap zymap self-assigned this Aug 11, 2025
@zymap zymap requested a review from hangc0276 August 11, 2025 10:01
@hangc0276 hangc0276 added this to the 4.18.0 milestone Aug 12, 2025
@zymap zymap merged commit e80d031 into apache:master Aug 12, 2025
40 of 42 checks passed
zymap added a commit that referenced this pull request Aug 12, 2025
### Motivation

We met the GarbageCollectionThread was stopped by some runtime error, but we didn't catch it then, causing the GC to stop.
Such as:
#3901
#4544

In our case, the GC stopped because of the OutOfDirectMemoryException then the process stopped and the files can not be deleted. But we didn't see any error logs. This PR enhance the log info when an unhandled error happens.
We already have the [PR](#4544) fixed that.

And another fix in this PR is to change the Exception to the Throwable in the getEntryLogMetadata.

Here is the error stack:

```
    io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 213909504 byte(s) of direct memory (used: 645922847, max: 858783744)
	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:880)
	at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:809)
	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:718)
	at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:707)
	at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:224)
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:142)
	at io.netty.buffer.PoolArena.reallocate(PoolArena.java:317)
	at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:123)
	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305)
	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:104)
	at org.apache.bookkeeper.bookie.DefaultEntryLogger.extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109)
	at org.apache.bookkeeper.bookie.DefaultEntryLogger.getEntryLogMetadata(DefaultEntryLogger.java:1060)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:678)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:365)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.lambda$triggerGC$4(GarbageCollectorThread.java:268)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
```

You can see it get much more memory used here extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109). The reason is that the header has the wrong data of the header, which should already be fixed by #4607. Then it reading with a wrong map size which could take a lot of memory.

(cherry picked from commit e80d031)
zymap added a commit that referenced this pull request Aug 12, 2025
### Motivation

We met the GarbageCollectionThread was stopped by some runtime error, but we didn't catch it then, causing the GC to stop.
Such as:
#3901
#4544

In our case, the GC stopped because of the OutOfDirectMemoryException then the process stopped and the files can not be deleted. But we didn't see any error logs. This PR enhance the log info when an unhandled error happens.
We already have the [PR](#4544) fixed that.

And another fix in this PR is to change the Exception to the Throwable in the getEntryLogMetadata.

Here is the error stack:

```
    io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 213909504 byte(s) of direct memory (used: 645922847, max: 858783744)
	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:880)
	at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:809)
	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:718)
	at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:707)
	at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:224)
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:142)
	at io.netty.buffer.PoolArena.reallocate(PoolArena.java:317)
	at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:123)
	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305)
	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:104)
	at org.apache.bookkeeper.bookie.DefaultEntryLogger.extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109)
	at org.apache.bookkeeper.bookie.DefaultEntryLogger.getEntryLogMetadata(DefaultEntryLogger.java:1060)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:678)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:365)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.lambda$triggerGC$4(GarbageCollectorThread.java:268)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
```

You can see it get much more memory used here extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109). The reason is that the header has the wrong data of the header, which should already be fixed by #4607. Then it reading with a wrong map size which could take a lot of memory.

(cherry picked from commit e80d031)
priyanshu-ctds pushed a commit to datastax/bookkeeper that referenced this pull request Aug 20, 2025
### Motivation

We met the GarbageCollectionThread was stopped by some runtime error, but we didn't catch it then, causing the GC to stop.
Such as:
apache#3901
apache#4544

In our case, the GC stopped because of the OutOfDirectMemoryException then the process stopped and the files can not be deleted. But we didn't see any error logs. This PR enhance the log info when an unhandled error happens.
We already have the [PR](apache#4544) fixed that.

And another fix in this PR is to change the Exception to the Throwable in the getEntryLogMetadata.

Here is the error stack:

```
    io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 213909504 byte(s) of direct memory (used: 645922847, max: 858783744)
	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:880)
	at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:809)
	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:718)
	at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:707)
	at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:224)
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:142)
	at io.netty.buffer.PoolArena.reallocate(PoolArena.java:317)
	at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:123)
	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305)
	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:104)
	at org.apache.bookkeeper.bookie.DefaultEntryLogger.extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109)
	at org.apache.bookkeeper.bookie.DefaultEntryLogger.getEntryLogMetadata(DefaultEntryLogger.java:1060)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:678)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:365)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.lambda$triggerGC$4(GarbageCollectorThread.java:268)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
```

You can see it get much more memory used here extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109). The reason is that the header has the wrong data of the header, which should already be fixed by apache#4607. Then it reading with a wrong map size which could take a lot of memory.

(cherry picked from commit e80d031)
(cherry picked from commit 55308e9)
srinath-ctds pushed a commit to datastax/bookkeeper that referenced this pull request Aug 21, 2025
### Motivation

We met the GarbageCollectionThread was stopped by some runtime error, but we didn't catch it then, causing the GC to stop.
Such as:
apache#3901
apache#4544

In our case, the GC stopped because of the OutOfDirectMemoryException then the process stopped and the files can not be deleted. But we didn't see any error logs. This PR enhance the log info when an unhandled error happens.
We already have the [PR](apache#4544) fixed that.

And another fix in this PR is to change the Exception to the Throwable in the getEntryLogMetadata.

Here is the error stack:

```
    io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 213909504 byte(s) of direct memory (used: 645922847, max: 858783744)
	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:880)
	at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:809)
	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:718)
	at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:707)
	at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:224)
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:142)
	at io.netty.buffer.PoolArena.reallocate(PoolArena.java:317)
	at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:123)
	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305)
	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:104)
	at org.apache.bookkeeper.bookie.DefaultEntryLogger.extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109)
	at org.apache.bookkeeper.bookie.DefaultEntryLogger.getEntryLogMetadata(DefaultEntryLogger.java:1060)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:678)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:365)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.lambda$triggerGC$4(GarbageCollectorThread.java:268)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
```

You can see it get much more memory used here extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109). The reason is that the header has the wrong data of the header, which should already be fixed by apache#4607. Then it reading with a wrong map size which could take a lot of memory.

(cherry picked from commit e80d031)
(cherry picked from commit 55308e9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants