Skip to content

Conversation

@janvanmansum
Copy link

@janvanmansum janvanmansum commented Feb 24, 2018

Please ensure you have completed the following before submitting:

  • Ran all tests to ensure existing functionality wasn't broken
  • Ran all quality assurance checks and fixed any new errors or warnings, which include:

Note: you can complete both boxes by running and fixing warnings/errors with gradle clean check

  • Code is self documenting or a short comment when self documenting isn't possible

To reproduce the bug

  1. Clone https://github.com/janvanmansum/bagit-java-bug
  2. Run mvn install
  3. Run mvn exec:java

Expected output is at least a couple of times the word FAIL and then DONE.

To check this PR

  1. Temporarily change project.version = "5.0.0-${now}-SNAPSHOT" to project.version = "5.0.0-SNAPSHOT" in bagit.gradle
  2. gradle build
  3. gradle publishToMavenLocal
  4. In the bagit-java-bug project's pom.xml change the dependency on bagit to 5.0.0-SNAPSHOT and run mvn exec:java again.

Expected out is no FAIL, just the word DONE.

Explanation

The problem is in the existing code in `BagVerifier:

 void checkHashes(final Manifest manifest) throws CorruptChecksumException, InterruptedException, VerificationException{
    final CountDownLatch latch = new CountDownLatch( manifest.getFileToChecksumMap().size());
    
    //TODO maybe return all of these at some point... 
    //if that is ever the case make sure to use Collections.synchronizedCollection(new ArrayList<>())
    //we aren't doing it now because it is a huge performance hit for little value
    final List<Exception> exceptions = new ArrayList<>(); 
    
    for(final Entry<Path, String> entry : manifest.getFileToChecksumMap().entrySet()){
      executor.execute(new CheckManifestHashesTask(entry, manifest.getAlgorithm().getMessageDigestName(), latch, exceptions));
    }
    
    latch.await();
    
    if(!exceptions.isEmpty()){
      final Exception e = exceptions.get(0);
      if(e instanceof CorruptChecksumException){
        logger.debug(messages.getString("checksums_not_matching_error"), exceptions.size());
        throw (CorruptChecksumException)e;
      }
      
      throw new VerificationException(e);
    }
  }

The ArrayList exceptions is shared between all the threads working on the verification. However, this class is not thread safe. The javadocs specifically warn against structural modifications of the ArrayList from multiple threads. In CheckManifestHashesTask this is exactly what is happening as exceptions are added to the list by multiple threads.

This seems to lead to a null entry in the list occasionally, which leads to e being null, thus e instanceof CorruptChecksumException evaluating to false and then to the VerificationException.

The fix, instead of making the list synchronized, gives every CheckManifestHashesTask its own exceptions list and only combines them after the tasks are finished. (Btw, I realize now that this is not really necessary, as the subsequent code just throws on the first exception, so you might want to optimized this a bit, or - even better - return all the corrupt checksums. I'll leave that up to you.)

.settings/
.classpath
.gradle/
.DS_Store
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related, but a nice to have for anyone contributing who uses IntelliJ.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.08%) to 96.925% when pulling 5e28917 on janvanmansum:CONCURRENCY_PROBLEM into 39654ff on LibraryOfCongress:master.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage increased (+0.08%) to 96.925% when pulling 5e28917 on janvanmansum:CONCURRENCY_PROBLEM into 39654ff on LibraryOfCongress:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.08%) to 96.925% when pulling 5e28917 on janvanmansum:CONCURRENCY_PROBLEM into 39654ff on LibraryOfCongress:master.

@johnscancella
Copy link
Contributor

Hi @janvanmansum thanks for submitting this. I have an error when I try to run your example project

$ mvn exec:java
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=256m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building verifier-bug-bagit 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ verifier-bug-bagit ---
[WARNING]
java.lang.ClassNotFoundException: nl.knaw.dans.Test
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:270)
	at java.lang.Thread.run(Thread.java:748)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.684 s
[INFO] Finished at: 2018-02-26T07:56:43-05:00
[INFO] Final Memory: 7M/155M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.6.0:java (default-cli) on project verifier-bug-bagit: An exception occured while executing the Java class. nl.knaw.dans.Test -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

Also, what is the performance of this code versus using Collections.synchronizedCollection(new ArrayList<>())?

@johnscancella
Copy link
Contributor

I did a small test and using Collections.synchronizedCollection(new ArrayList<>()) fixes the problem. How much more helpful would it be to return a list of all the exceptions instead of just the first one?

@janvanmansum
Copy link
Author

janvanmansum commented Feb 28, 2018

Hi @johnscancella, sorry I forgot to mention that mvn exec:java only works after building the project. I have included the missing step in my description above.

The synchronized ArrayList will also work. However, this will probably lead to some performance loss. There was a comment in the code warning about the perfomance hit. I don't know how bad it will get in practise, but I guess you are using multiple threads to increase performance, so I would probably still prefer to avoid writing to shared data.

In my opinion, getting all the errors instead of just the first one would be a great benefit, especially if they are returned in a structured way, like a list. The library user can stil choose to ignore the other errors anyway, so I don't really see a downside. On the other hand, if there are multiple errors, it is an extra burden on the user to have to discover them one by one.

@johnscancella
Copy link
Contributor

I am ok with a slight performance loss if it prevents an error that you are seeing, that note was from another issue that wasn't causing an error. Basically is is smaller than doing two loops which is what your pull request is doing. In this case it is the difference of O(n+c) vs O(n^2). Using a built in JDK class over custom code is also preferred because it is less of a maintenance burden(i.e. we will never have the same resources that the makers of the JDK do).

I released a new version with the synchronized update 5.1.1. Please let me know if you find any issues. I will add a issue for improving the verifier by returning all the errors.

@janvanmansum janvanmansum deleted the CONCURRENCY_PROBLEM branch February 28, 2018 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants