Skip to content

Conversation

@jitendra-nalwaya
Copy link

No description provided.

@jhouserizer
Copy link
Member

Hi @jitendra-nalwaya ,

Could you please explain the purpose of using two sentinel files ("cleanShutdown" and "unCleanShutdown")?

What does using two files rather than one provide? why can't the presence of "unCleanShutdown" indicate non-clean, and the absence of it represent clean? or why can't the presence of "cleanShutdown" represent clean, and the absence represent non-clean?

Managing two files is non-atomic, and therefore more prone to leaving one behind when it need not be, or etc.?

There may be an advantage to using two files, and if so, can you please explain?

@jitendra-nalwaya
Copy link
Author

jitendra-nalwaya commented Dec 27, 2022

Hi @jhouserizer,

Used two sentinel files to identify last shutdown status in below scenarios -

  1. When customer upgrade their environment with this enhancement.
  2. Remove directory structure explicitly by customer.
  3. Remove directory structure using destroy/destroyall methods.

On startup,

  • If both files do not exist then we can't identify last shutdown state, therefore we will create 'unCleanShutdown' file.
  • If 'unCleanShutdown' file exist then we will remove directory structure and on shutdown we will delete 'unCleanShutdown' file and create 'cleanShutdown' file.
  • If 'cleanShutdown' file exist then we will delete 'cleanShutdown' file and create 'unCleanShutdown' file.

We can use single sentinel file as well if we write some content in it to identify the last shutdown state.

Could you please elaborate below statement -
Managing two files is non-atomic, and therefore more prone to leaving one behind when it need not be, or etc.?

}
}
} catch (IOException e) {
LOGGER.debug("Either '.cleanShutdown' file is not created or '.unCleanShutdown' is not deleted.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would seem to deserve a little higher than debug logging.

}
}

private void verifyCleanShutdown(final File rootDirectory) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic appears to do more than just verify, it also deletes if the disk is believed to be unclean.

throw new NullPointerException("DefaultPersistenceConfiguration cannot be null");
}
lockFile = new File(rootDirectory, ".lock");
cleanShutdown = new File(rootDirectory, ".cleanShutdown");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand the need for two files here:

↓ unclean.exists() / clean.exists() → true false
true clean (no-op) unclean (delete)
false clean (no-op) clean (no-op)

Since there are only two output states... why not just have a single "clean" marker? Look for it on startup (and then delete it), and then create it on shutdown.

} else if (unCleanShutdown.exists()) {
LOGGER.debug("Probably unclean shutdown was done.");
if (rootDirectory.exists()) {
FileUtils.tryRecursiveDelete(rootDirectory);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this recursive delete going to trash the lock file? This might break the existing protection against parallel use of the same directory by multiple instances? We should probably only do any of this marker file manipulation after we've validated it is safe for us to access (& modify) the directory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also thinking that making the delete decision will break any users of the LocalPersistenceService that are fault tolerant? Maybe we need to think of a way of signaling the unclean shutdown to consumers of the service, or move this code to a different layer?

@jitendra-nalwaya
Copy link
Author

@chrisdennis
Creating a single "clean" marker on shutdown and verifying on startup (and then delete it). Please review.

}
}

private void onStartupIfLockFileExists() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method appears to be duplicating a lot of the logic already in the internalStart() method. It seems this could be greatly simplified if you took a more strict "lock first, then check for clean" approach?

} else {
LOGGER.debug("lock file is not exists.");
}
if (tryRecursiveDelete(rootDirectory)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I hinted to this in my previous review but probably wasn't explicit enough. The LocalPersistenceService implementation cannot be this dogmatic about what the unclean state of a directory means to the consumers of the service. It happens that all the existing OSS users of this service don't tolerate an unclean shutdown. That's not true for all the existing users of this service, and isn't necessarily true for all future OSS users. We'll need to find a way that the LocalPersistenceService can communicate the clean/unclean state of the service to consumers so they can make a decision on a case by case basis as to whether their files need to be deleted.

}

if (!clean.exists()) {
LOGGER.warn("Probably unclean shutdown was done. Please take appropriate action, if needed.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who do you expect to see this log message, and what do you expect them to do?

@jitendra-nalwaya
Copy link
Author

jitendra-nalwaya commented Feb 16, 2023

@chrisdennis
Introduce new API isClean() to identify state(normal/abnormal) of stopped service, please review.

* Can take appropriate action by identifying state of service stopped.
*/
@Override
public final boolean isClean() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Services are started in dependency order. The return of this method is only correct before the service is started. Once started the clean file is deleted and the state will look dirty. This means any service which depends on the LocalPersistenceService will check the state of the service on start and always see it as dirty.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisdennis
This scenario is addressed in isClean() API. Please review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still doesn't make sense to me. As a consumer of this service I should only be calling methods on the service after it's been started (during my own start call). Given that assumption there are two possible states: clean and unclean, represented by a true return and a false return. If the filesystem was not clean on start, but we started (and locked it) where is the false return path that we'll follow in the code below?

* Can take appropriate action by identifying state of service stopped.
*/
@Override
public final boolean isClean() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still doesn't make sense to me. As a consumer of this service I should only be calling methods on the service after it's been started (during my own start call). Given that assumption there are two possible states: clean and unclean, represented by a true return and a false return. If the filesystem was not clean on start, but we started (and locked it) where is the false return path that we'll follow in the code below?

public DefaultLocalPersistenceService(final DefaultPersistenceConfiguration persistenceConfiguration) {
if(persistenceConfiguration != null) {
rootDirectory = persistenceConfiguration.getRootDirectory();
isRootDirectoryExists = rootDirectory.exists();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You really can't be reading state in the constructor. You don't have exclusive access to the directory, and arbitrary amounts of time can elapse (and arbitrary state changes can occur) between this constructor call and the service being started. Consider for example if I create two cache managers, pointing at the same directory, but then initialize the first one, mutate the disk, and then shut it down uncleanly, and then start the second one?

private RandomAccessFile rw;
private boolean started;
private final boolean isRootDirectoryExists;
private boolean isCleanFileDeleted;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense to just have a clean variable here that you initialize during start based on what you learn about the state of the disk files. Then your isClean() method can be little more than a return of this value (maybe with some guarding to make sure it's been set correctly).

throw new RuntimeException("Couldn't unlock rootDir: " + rootDirectory.getAbsolutePath(), e);
}

try {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably want to be creating the clean file while you still have the file lock to avoid issues with another cache manager starting up while you are shutting down.


private void internalStart() {
if (!started) {
if (!rootDirectory.exists() || (!cleanFile.exists() && !lockFile.exists())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm think that second condition should really be that the root directory is empty.

internalStart();
}

private void internalStart() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should always positively set the clean variable since services can be stopped and restarted (although it's not common).

clean = true;
LOGGER.debug("clean file is deleted.");
} catch (IOException e) {
LOGGER.warn("clean file was not deleted {}.", cleanFile.getPath());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking this should be a hard failure. If the clean file isn't deleted here then a subsequent dirty start would be seen as clean.

if (started) {
return clean;
}
if (rootDirectory.exists()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nobody should be calling anything on a stopped service (other than start). This should be able to be:

if (started) {
  return clean;
} else {
  throw new IllegalStateException(...);
}

}

started = false;
clean = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably works... but it's a little counterintuitive. I'd rather see this method not set clean at all and positively set clean on both clean and dirty paths in internalStart().

public synchronized void stop() {
if (started) {
try {
if (cleanFile.createNewFile()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably log some kind of warning on a false return here too.


static boolean isDirectoryEmpty(Path path) throws IOException {
if (java.nio.file.Files.isDirectory(path)) {
try (Stream<Path> entries = java.nio.file.Files.list(path)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're making this a generic utility method we should probably not gate on isDirectory(...) and return false for a file since a file is not an empty directory. Probably better to just:

static boolean isDirectoryEmpty(Path path) throws IOException {
  try (Stream<Path> entries = java.nio.file.Files.list(path)) {
    return !entries.findFirst().isPresent();
  } catch (UncheckedIOException e) {
    throw e.getCause();
  }
}

if (cleanFile.createNewFile()) {
LOGGER.debug("clean file is created.");
} else {
LOGGER.warn("clean file already exists.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably replace this with a more informative message. Since we're logging this at warn level what should a user do with the information? What's gone/going wrong, and what should they do to fix it?

LOGGER.warn("clean file already exists.");
}
} catch (IOException e) {
LOGGER.warn("clean file is not created.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same deal here... what does this mean for a user. They don't know what a clean file is... they care about the implications of this message for their usage of the product.

Copy link
Member

@chrisdennis chrisdennis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See new comments. We'll also need to make the necessary changes in the consumers of the LocalPersistenceService so that they can react appropriately to the clean/unclean state. We'll also need some amount of test coverage to prove this is all working.

@jitendra-nalwaya jitendra-nalwaya removed the request for review from kumarshilp April 21, 2023 05:51
* Identify status of service.
* @return <tt>true</tt> if service is started.
*/
boolean isServiceStarted();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method isn't/shouldn't be needed.

* Identify state(normal/abnormal) of stopped service.
* @return <tt>true</tt> if service stopped normally.
*/
boolean isClean();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Javadoc needs polish, and the @throws should be moved up to here:

  /**
   * Return the cleanliness of the state stored in this service.
   * <p>
   * Stored state is assumed to be clean if the service detects
   * that the last started instantiation of this service was shutdown
   * successfully.
   *
   * @return {@code true} if the state is clean
   * @throws IllegalStateException if the service is not started
   */

// Service stopped unexpectedly means directory exists with some data but without .clean file.
File f = folder.newFolder("testServiceStoppedStatusWhenStoppedUnexpectedly");
final DefaultLocalPersistenceService service = new DefaultLocalPersistenceService(new DefaultPersistenceConfiguration(f));
new File(f.getAbsolutePath()+"\\dummy.txt").createNewFile();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work correctly on any platform except Windows. Outside of Windows you'll just get a file named <blahblahblah>\dummy.txt because \ is not the right separator. You should use the File(File, String) constructor instead.

try {
File f = folder.newFolder("testServiceStoppedStatusIfServiceIsNotRunning");
final DefaultLocalPersistenceService service = new DefaultLocalPersistenceService(new DefaultPersistenceConfiguration(f));
service.isClean();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the method call being tested should be inside the try-catch block.

public void testServiceStoppedStatusWhenStoppedCleanly() throws IOException {
File f = folder.newFolder("testServiceStoppedStatusWhenStoppedCleanly");
final DefaultLocalPersistenceService service = new DefaultLocalPersistenceService(new DefaultPersistenceConfiguration(f));
service.start(null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should actually create a file inside the directory to ensure we hit the non-empty directory codepath.

if (cleanFile.createNewFile()) {
LOGGER.debug("clean file is created.");
} else {
LOGGER.warn("clean file already exists. It's not deleted either user's permission or network issue." +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logging is verbose but doesn't really help the user much. What will happen as a result of this error/problem? What should the user do or expect?

* {@inheritDoc}
*/
@Override
public final boolean isServiceStarted() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method can go away.

}
}

/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This javadoc isn't needed once we fix up the interface method javadoc.

"\n Hint: clean file exists on service startup, indicates service was stopped cleanly last time. It gets created while the service is stopped and it should be deleted while the service is started.");
}
} catch (IOException e) {
LOGGER.warn("clean file is not created. Reason: " + e.getMessage() +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log line suffers the same problem. It's long, wordy, but doesn't help the user.

private void innerStart(ServiceProvider<? super MaintainableService> serviceProvider) {
persistenceService = serviceProvider.getService(LocalPersistenceService.class);
isStarted = true;
if (persistenceService!=null && persistenceService.isServiceStarted() && !persistenceService.isClean()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The persistenceService is guaranteed by the contract of the service mechanism to be both non-null and started by the time this method is called.

@jitendra-nalwaya
Copy link
Author

@chrisdennis
I addressed review comments. Please review.

}

/**
* {@inheritDoc}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to keep an {@inheritDoc} around... javadoc will get inherited anyway.

@Override
public void start(final ServiceProvider<Service> serviceProvider) {
innerStart(serviceProvider);
if (persistenceService != null && !persistenceService.isClean()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need to null-check here. Persistence service should always be non-null because we depend on it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisdennis
Seven existing test cases are getting failed in DefaultDiskResourceServiceTest.java ("WithoutPersistenceService") if not handling the null check. Please share your thoughts on this. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to look at the @ServiceDependency annotation that is applied to this class, and think about how that annotation is interpreted by the ServiceLocator. Then note that the tests you're looking at are not using the ServiceLocator to provide the services to the DefaultDiskResourceService.

Copy link
Member

@chrisdennis chrisdennis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected this to consist of only two commits (after being rebased against latest HEAD of main). First commit would be the removal of the null checks around the local persistence service, and the removal of the related testing. The second would then be the addition of the new code for the safe shutdown detection. This may just be a rebase away from that state, but it's very hard to see that given the current commit stream.

@jitendra-nalwaya
Copy link
Author

@chrisdennis
New pull request is created with 2 commits - #3161

@jitendra-nalwaya
Copy link
Author

Enhancement #2954 implemented by #3161

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants