-
Notifications
You must be signed in to change notification settings - Fork 963
Readops and writeops for the read cache are disabled after the readcache has been shutdown #2868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Readops and writeops for the read cache are disabled after the readcache has been shutdown #2868
Conversation
| @Override | ||
| public void close() { | ||
| cacheSegments.forEach(ByteBuf::release); | ||
| isStorageShutdown = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will need the write lock else a read can still sneak through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| lock.readLock().lock(); | ||
| if (isStorageShutdown) { | ||
| LOG.error("Read cache has shutdown, not allowing put cache operation"); | ||
| throw new IOException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need a message here so we know from logs what the cause was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just fail operations silently when shutting down? if higher level can't handle a no-op put and a "not found" get, we have other problems.
looking at SingleDirectoryDbLedgerStorage.getEntry tossing an exception from the read cache seems to be an odd way to address shutting down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually Netty and its channels is already shutdown by the time ledger storage is shutdown, so no responses can be sent. So making these silent no-ops makes sense. It also makes more sense to move this check up to SingleDirectoryDbLedgerStorage so that it covers all sub-component (caches, entry log, indexes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Changed the logic to reflect this comment.
|
|
||
| if (isStorageShutdown) { | ||
| LOG.error("Read cache has shutdown, not allowing put cache operation"); | ||
| throw new IOException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exception message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| if (isStorageShutdown) { | ||
| LOG.error("Read cache has shutdown, not allowing read cache operation"); | ||
| throw new IOException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exception message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| @Test | ||
| public void testReadsOpAfterShutdown() throws Exception { | ||
| SingleDirectoryDbLedgerStorage singleDirStorage = ((DbLedgerStorage) storage).getLedgerStorageList().get(0); | ||
| singleDirStorage.shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the teardown also calls shutdown which causes the teardown to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shutdown should be idempotent if called twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eolivelli Handled the idempotency case in the latest fix. However, shutdown() on the storage itself must be made idempotent. I will create a separate issue to handle that situation.
|
|
||
| private ByteBufAllocator allocator; | ||
| private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); | ||
| private Boolean isStorageShutdown; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"boolean" and not "Boolean"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add "volatile"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| @Test | ||
| public void testReadsOpAfterShutdown() throws Exception { | ||
| SingleDirectoryDbLedgerStorage singleDirStorage = ((DbLedgerStorage) storage).getLedgerStorageList().get(0); | ||
| singleDirStorage.shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shutdown should be idempotent if called twice
| SingleDirectoryDbLedgerStorage singleDirStorage = ((DbLedgerStorage) storage).getLedgerStorageList().get(0); | ||
| singleDirStorage.shutdown(); | ||
| try { | ||
| ByteBuf res = singleDirStorage.getEntry(4, 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add "fail()" here, otherwise the test will pass in any case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handled it with @Test(expected = IOException.class)
| try { | ||
| ByteBuf res = singleDirStorage.getEntry(4, 3); | ||
| } catch (IOException e) { | ||
| // This will pass the test since it is expected to throw IOException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add an assertion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with the latest fix
|
A very general comment. I think here, there is no need to use volatile as well as lock. In fact since access to isStorageShutdown is always under a lock. You might as well drop volatile keyword. Just a thought. Else looks good to me. |
efbd294 to
945e97b
Compare
| implementation project(':stream:common') | ||
| implementation depLibs.grpc | ||
| implementation depLibs.slf4j | ||
| implementation depLibs.log4jSlf4jImpl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider change this to testImplementation, If they are only referenced in tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved to separate PR #2877
bookkeeper-benchmark/build.gradle
Outdated
| implementation depLibs.nettyTransportNativeEpoll | ||
| implementation depLibs.zookeeper | ||
| implementation depLibs.log4j12api | ||
| implementation depLibs.log4jCore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pradeepbn Kindly consider changing it to testImplementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved to separate PR #2877
bookkeeper-benchmark/build.gradle
Outdated
| implementation depLibs.zookeeper | ||
| implementation depLibs.log4j12api | ||
| implementation depLibs.log4jCore | ||
| implementation depLibs.log4jSlf4jImpl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved to separate PR #2877
25cc514 to
cade490
Compare
…che has been shutdown
cade490 to
493a316
Compare
|
@pradeepbn I am not so keen on this lock being in the hot path for every read. We started at the ReadCache because we saw that specific problem in prod, then moved up but I think what we have seen here is that the shutdown sequence itself needs some love. For example, this particular problem can probably be solved in the BookieServer class by ensuring that the request processor it shutdown first, then ledger storage. I think we should close this PR and take a deeper look at the shutdown sequence. There is a related issue #2879 that also is related to shutdown. |
|
This PR will be correctly addressed in #2888 |
Descriptions of the changes in this PR:
Motivation
After the readcache has been shutdown, all the buffers will be released in memory. In the meantime, if read and write ops are performed on the read cache, then it results in the segmentation fault on invalid memory. So, this PR will make the readops and writeops as no-ops.
Changes