Skip to content

Conversation

@pradeepbn
Copy link
Contributor

@pradeepbn pradeepbn commented Oct 29, 2021

Descriptions of the changes in this PR:

Motivation

After the readcache has been shutdown, all the buffers will be released in memory. In the meantime, if read and write ops are performed on the read cache, then it results in the segmentation fault on invalid memory. So, this PR will make the readops and writeops as no-ops.

Changes

  • Added a shutdown flag in SingleDirectoryDbLedgerStorage.java. This flag is set when close() is called.
  • If the shutdown flag is set return null for getEntry and -1 for addEntry.
  • Unit tests to check this.

@Override
public void close() {
cacheSegments.forEach(ByteBuf::release);
isStorageShutdown = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need the write lock else a read can still sneak through.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

lock.readLock().lock();
if (isStorageShutdown) {
LOG.error("Read cache has shutdown, not allowing put cache operation");
throw new IOException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need a message here so we know from logs what the cause was.

Copy link
Contributor

@mauricebarnum mauricebarnum Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just fail operations silently when shutting down? if higher level can't handle a no-op put and a "not found" get, we have other problems.

looking at SingleDirectoryDbLedgerStorage.getEntry tossing an exception from the read cache seems to be an odd way to address shutting down.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually Netty and its channels is already shutdown by the time ledger storage is shutdown, so no responses can be sent. So making these silent no-ops makes sense. It also makes more sense to move this check up to SingleDirectoryDbLedgerStorage so that it covers all sub-component (caches, entry log, indexes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Changed the logic to reflect this comment.


if (isStorageShutdown) {
LOG.error("Read cache has shutdown, not allowing put cache operation");
throw new IOException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


if (isStorageShutdown) {
LOG.error("Read cache has shutdown, not allowing read cache operation");
throw new IOException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exception message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Test
public void testReadsOpAfterShutdown() throws Exception {
SingleDirectoryDbLedgerStorage singleDirStorage = ((DbLedgerStorage) storage).getLedgerStorageList().get(0);
singleDirStorage.shutdown();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the teardown also calls shutdown which causes the teardown to fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shutdown should be idempotent if called twice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eolivelli Handled the idempotency case in the latest fix. However, shutdown() on the storage itself must be made idempotent. I will create a separate issue to handle that situation.


private ByteBufAllocator allocator;
private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
private Boolean isStorageShutdown;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"boolean" and not "Boolean"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add "volatile"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Test
public void testReadsOpAfterShutdown() throws Exception {
SingleDirectoryDbLedgerStorage singleDirStorage = ((DbLedgerStorage) storage).getLedgerStorageList().get(0);
singleDirStorage.shutdown();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shutdown should be idempotent if called twice

SingleDirectoryDbLedgerStorage singleDirStorage = ((DbLedgerStorage) storage).getLedgerStorageList().get(0);
singleDirStorage.shutdown();
try {
ByteBuf res = singleDirStorage.getEntry(4, 3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add "fail()" here, otherwise the test will pass in any case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled it with @Test(expected = IOException.class)

try {
ByteBuf res = singleDirStorage.getEntry(4, 3);
} catch (IOException e) {
// This will pass the test since it is expected to throw IOException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add an assertion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with the latest fix

@pkumar-singh
Copy link
Member

A very general comment. I think here, there is no need to use volatile as well as lock. In fact since access to isStorageShutdown is always under a lock. You might as well drop volatile keyword. Just a thought. Else looks good to me.

@pradeepbn pradeepbn requested a review from eolivelli November 1, 2021 21:58
@pradeepbn pradeepbn force-pushed the DisableReadOpsDuringBookieShutdown branch from efbd294 to 945e97b Compare November 3, 2021 01:04
implementation project(':stream:common')
implementation depLibs.grpc
implementation depLibs.slf4j
implementation depLibs.log4jSlf4jImpl
Copy link
Member

@pkumar-singh pkumar-singh Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider change this to testImplementation, If they are only referenced in tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved to separate PR #2877

implementation depLibs.nettyTransportNativeEpoll
implementation depLibs.zookeeper
implementation depLibs.log4j12api
implementation depLibs.log4jCore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradeepbn Kindly consider changing it to testImplementation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved to separate PR #2877

implementation depLibs.zookeeper
implementation depLibs.log4j12api
implementation depLibs.log4jCore
implementation depLibs.log4jSlf4jImpl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved to separate PR #2877

@pradeepbn pradeepbn force-pushed the DisableReadOpsDuringBookieShutdown branch from 25cc514 to cade490 Compare November 4, 2021 23:32
@pradeepbn pradeepbn force-pushed the DisableReadOpsDuringBookieShutdown branch from cade490 to 493a316 Compare November 5, 2021 15:12
@Vanlightly
Copy link
Contributor

@pradeepbn I am not so keen on this lock being in the hot path for every read. We started at the ReadCache because we saw that specific problem in prod, then moved up but I think what we have seen here is that the shutdown sequence itself needs some love. For example, this particular problem can probably be solved in the BookieServer class by ensuring that the request processor it shutdown first, then ledger storage.

I think we should close this PR and take a deeper look at the shutdown sequence. There is a related issue #2879 that also is related to shutdown.

@pradeepbn
Copy link
Contributor Author

This PR will be correctly addressed in #2888

@pradeepbn pradeepbn closed this Nov 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants