
Use in-memory cache with JPEGCodec #19

Merged
dgault merged 4 commits into ome:master from petebankhead:performance
Mar 10, 2023
Conversation

@petebankhead
Contributor

This PR aims to improve JPEG compression/decompression performance, particularly with TIFF images.

The original JPEGCodec passes an InputStream or OutputStream to ImageIO.
Strangely, ImageIO in OpenJDK seems to then create a temporary file from this (regardless of whether caching is requested or not) before using FileImageInputStream, which can be a significant performance issue e.g. with whole slide images.

This can be avoided by explicitly creating a MemoryCacheImageInputStream or the corresponding output stream.
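As a rough illustration, a minimal, self-contained sketch of the in-memory approach (hypothetical class name, not the ome-codecs code itself; only standard `javax.imageio` calls) looks like this:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.imageio.stream.MemoryCacheImageInputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

public class MemoryCacheRoundTrip {

    // Encode and decode entirely in memory, bypassing the temporary file that
    // ImageIO otherwise creates when handed a plain InputStream/OutputStream.
    public static BufferedImage roundTrip(BufferedImage img) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Wrapping the OutputStream explicitly avoids the FileCacheImageOutputStream path.
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            ImageIO.write(img, "jpeg", out);
        }
        // Same idea on the read side: an in-memory cache instead of a temp file.
        try (MemoryCacheImageInputStream in = new MemoryCacheImageInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return ImageIO.read(in);
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedImage decoded =
            roundTrip(new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB));
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight()); // 16x16
    }
}
```

Note that closing a `MemoryCacheImageOutputStream` flushes any pending cached data to the underlying `OutputStream`, so the try-with-resources block is doing real work here.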

The PR contains two other minor modifications that helped further reduce runtime in my tests:

  • set an initial capacity for the ByteArrayOutputStream
  • rearrange the code when creating an interleaved pixel array

Overall, I saw conversion time for a JPEG-compressed whole slide image to a JPEG-compressed ome.tiff (e.g. CMU-1.svs) reduce by up to 50%, by speeding up both reading and writing.


Risks: there is a slightly enigmatic warning in the ImageIO docs:

> In general, it is preferable to use a FileCacheImageInputStream when reading from a regular InputStream. This class is provided for cases where it is not possible to create a writable temporary file.

However, ImageIO's TIFF reader uses MemoryCacheImageInputStream.

  • This avoids the performance penalty of FileImageInputStream and FileImageOutputStream, which are otherwise used by ImageIO.read and ImageIO.write.
  • Set initial size of ByteArrayOutputStream to improve write performance within ImageIO.
  • Switch order in which arrays are accessed for pixel interleaving to improve efficiency.
  • Check ImageIO is able to provide a "jpeg" reader, rather than calling .next() on the iterator without checking.
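The reader-availability check mentioned above can be sketched like this (hypothetical class name; only standard `javax.imageio` calls, not the actual ome-codecs code):

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

public class ReaderCheck {

    // Returns a JPEG ImageReader, or null if none is registered, instead of
    // calling next() on an iterator that may be empty (which would throw
    // NoSuchElementException with no useful context).
    public static ImageReader getJpegReader() {
        Iterator<ImageReader> it = ImageIO.getImageReadersByFormatName("jpeg");
        return it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        ImageReader reader = getJpegReader();
        System.out.println(reader != null ? "jpeg reader available" : "no jpeg reader");
    }
}
```

On a stock OpenJDK a JPEG reader is always registered, so the null branch mostly guards against exotic deployments with a stripped-down ImageIO service registry.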
@sbesson sbesson closed this Dec 12, 2021
@sbesson sbesson reopened this Dec 12, 2021
@sbesson
Member

sbesson commented Dec 12, 2021

Thanks @petebankhead, I have included your PR in the daily CI builds to provide feedback on its impact. Note that due to https://forum.image.sc/t/upcoming-unavailability-of-ome-resources-dec-10-dec-13/60669 it might take a few days for our systems to get back to a stable state.

In the meantime, is it possible for you to sign and return a CLA as per the instructions here so that this can get merged once it passes QA?

@petebankhead
Contributor Author

Thanks, I've sent the CLA now & added more info on the performance impact for whole slide image conversion at ome/bioformats#3754

@sbesson sbesson removed the include label Dec 13, 2021
@sbesson
Member

sbesson commented Dec 13, 2021

@petebankhead it looks like the inclusion of this PR in the coupled Bio-Formats build failed downstream unit tests - see https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-image/1014/console

[ERROR] testALT_JPEG(loci.formats.utests.tiff.TiffCompressionDecompressTest)  Time elapsed: 0.011 s  <<< FAILURE!
org.testng.TestException: 

Expected exception java.lang.NullPointerException but got loci.formats.FormatException: java.io.EOFException: Attempting to read beyond end of file.
	at loci.formats.utests.tiff.TiffCompressionDecompressTest.testALT_JPEG(TiffCompressionDecompressTest.java:167)
Caused by: java.io.EOFException: Attempting to read beyond end of file.
	at loci.formats.utests.tiff.TiffCompressionDecompressTest.testALT_JPEG(TiffCompressionDecompressTest.java:167)

[ERROR] testJPEG(loci.formats.utests.tiff.TiffCompressionDecompressTest)  Time elapsed: 0 s  <<< FAILURE!
org.testng.TestException: 

Expected exception java.lang.NullPointerException but got loci.formats.FormatException: java.io.EOFException: Attempting to read beyond end of file.
	at loci.formats.utests.tiff.TiffCompressionDecompressTest.testJPEG(TiffCompressionDecompressTest.java:108)
Caused by: java.io.EOFException: Attempting to read beyond end of file.
	at loci.formats.utests.tiff.TiffCompressionDecompressTest.testJPEG(TiffCompressionDecompressTest.java:108)

After removing the include label, the same build completed without failure

@petebankhead
Contributor Author

petebankhead commented Dec 13, 2021

Thanks @sbesson, do I understand correctly that it fails here because it throws an EOFException rather than a NullPointerException (though either way it's expected to throw an exception, because the JPEG isn't valid)?

It appears that ImageIO sometimes returns null if there is a failure, and silently swallows the reason for the failure.

In this case, the previous code left ImageIO to figure out that it had a JPEG stream; because it got a byte array of zeros, it couldn't find a reader and so returned null. It then proceeded to request the width of the null image here and threw a NullPointerException when that failed. The test suggests this was the expected behavior.

With the original PR, ImageIO threw a more informative IOException (Not a JPEG file: starts with 0x00 0x00) - but then this resulted in the assumption that LosslessJPEGCodec should be used instead. In the end, LosslessJPEGCodec threw the EOFException that caused the test to fail.

I've updated the PR to more closely match the previous behavior, while making the detail message with the NPE more informative - I hope that's enough to pass the tests.
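A minimal sketch of that updated behavior might look like the following (hypothetical names, not the actual ome-codecs code): preserve the old "NullPointerException on invalid JPEG" contract, but attach an informative detail message instead of failing on a bare null dereference.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.imageio.stream.MemoryCacheImageInputStream;

public class NullCheckSketch {

    // ImageIO.read returns null (rather than throwing) when no registered
    // reader recognizes the stream; check for that explicitly.
    public static BufferedImage decode(byte[] data) throws IOException {
        BufferedImage img;
        try (MemoryCacheImageInputStream iis =
                new MemoryCacheImageInputStream(new ByteArrayInputStream(data))) {
            img = ImageIO.read(iis);
        }
        if (img == null) {
            // Same exception type as the historical behavior, but with a message.
            throw new NullPointerException(
                "ImageIO returned null: stream does not contain a readable image");
        }
        return img;
    }

    public static void main(String[] args) throws IOException {
        try {
            decode(new byte[64]); // all zeros: not a valid JPEG
        }
        catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```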

However, I'm not sure that it is the right exception to throw. Additionally, it looks like the previous implementation of JPEGCodec.decompress would have worked for any ImageIO-compatible stream, even if it wasn't a JPEG. I'm not sure if a forgiving JPEGCodec that can also decompress PNGs is really desirable or not. Explicitly requesting a JPEG reader from ImageIO avoids this.

What do you think?

For consistency with the previous behavior of JPEGCodec and test in TiffCompressionDecompressTest
@sbesson
Member

sbesson commented Dec 14, 2021

Thanks for the answer @petebankhead. I had not looked into the details of this exception as I have been focusing on getting our builds passing over the last few days and my original comment was primarily meant to provide feedback on the downstream rule.

I am not the most knowledgeable in terms of implementation or even codecs, but I concur with your last statement in #19 (comment) re the API: a NullPointerException should never be explicitly thrown as part of the contract, and having downstream clients rely on this type of exception feels error-prone. From https://www.javadoc.io/doc/org.openmicroscopy/ome-codecs/latest/ome/codecs/Codec.html, my initial expectation is that a CodecException should be thrown with the appropriate error message under such circumstances. /cc @ome/formats for additional thoughts, although we might come back to this in January 2022.

Once we settle on the API decision, I would be inclined to use the occasion to add some local coverage via unit tests in this repository, rather than relying on the tests of the downstream formats-bsd component. This is unrelated to the feature introduced here and is an outcome of the earlier component decoupling. Initially, I assume the relevant bits of TiffCompressionDecompressTest.java could be extracted into a local JPEGCodecTest.java.

@melissalinkert melissalinkert left a comment
Member

Adding include label to see how this impacts the next merge build.

Comment thread: src/main/java/ome/codecs/JPEGCodec.java (Outdated)

    }

-   ByteArrayOutputStream out = new ByteArrayOutputStream();
+   ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(1024, data.length / 4));
Member

Might be worth limiting this a bit further - in the worst case, where data.length is Integer.MAX_VALUE, this would set the output stream's initial capacity to 536870911.

@dgault
Member

dgault commented Nov 9, 2022

Temporarily excluding due to failures in https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/lastFailedBuild/consoleFull. It looks as though empty arrays are returned when decoding some tiles. It is odd that it is specifically the DICOM format that fails; this will require further investigation and debugging.

@dgault dgault added exclude and removed include labels Nov 9, 2022
@sbesson sbesson removed their request for review November 9, 2022 17:26
}
catch (IOException exc) {
// probably a lossless JPEG; delegate to LosslessJPEGCodec
in.seek(fp);
Member

The removal of this seek seems to be what causes tests to fail. The tests that failed in https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/213045/console pass locally if I reintroduce in.seek(fp) here.
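The fixed control flow can be illustrated with a self-contained sketch (FakeStream is a hypothetical stand-in for the codec's random-access input; the real code uses loci.common.RandomAccessInputStream): remember the file pointer before the baseline-JPEG attempt, and rewind before delegating to the lossless codec.

```java
import java.io.IOException;

public class SeekRewindSketch {

    // Minimal stand-in for a random-access input stream (hypothetical).
    static class FakeStream {
        long pos = 0;
        void seek(long p) { pos = p; }
        long getFilePointer() { return pos; }
        void read() { pos += 4; } // pretend to consume some bytes
    }

    // Sketch of the fix: record fp up front; on failure, rewind so the
    // fallback codec sees the stream from the beginning, not mid-read.
    public static long decodeWithRewind(FakeStream in) {
        long fp = in.getFilePointer();
        try {
            in.read(); // baseline JPEG decode attempt consumes bytes...
            throw new IOException("not a baseline JPEG"); // ...then fails
        }
        catch (IOException exc) {
            // probably a lossless JPEG; rewind before delegating
            in.seek(fp);
            return in.getFilePointer();
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeWithRewind(new FakeStream())); // prints 0
    }
}
```

Without the seek, the fallback decoder starts partway through the stream, which is consistent with the EOFException-style failures seen in the CI runs.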

Contributor Author

Added this back now

@melissalinkert melissalinkert self-assigned this Feb 6, 2023
@melissalinkert
Member

@petebankhead : have you had a chance to look at the proposed fix in #19 (review)? We'd still like to get these changes in, but if you don't have time to work on this right now just let us know.

Based on review from @melissalinkert

Rationale for `int maxByteArrayLength = Integer.MAX_VALUE / 8;` assumes a typical JPEG compression ratio of 10:1, with some extra tolerance since resizing very long byte arrays might be expensive.
@petebankhead
Contributor Author

@melissalinkert sorry, didn't realise this was waiting on me. Made the changes now.

Rationale for `int maxByteArrayLength = Integer.MAX_VALUE / 8;` was that I assumed (from wikipedia...) a typical JPEG compression ratio of 10:1, with some extra tolerance in case resizing very long byte arrays could be expensive. It would still be ~256 MB, but hopefully this wouldn't occur often; and if it does, I'd guess the JPEG itself would need to be large anyway.
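That capping logic could look something like this (hypothetical names; a sketch of the idea rather than the merged code): clamp the estimated initial capacity between a small floor and the ~256 MB ceiling.

```java
public class CappedCapacity {

    // Cap discussed above: Integer.MAX_VALUE / 8 == 268435455 (~256 MB),
    // assuming a typical ~10:1 JPEG compression ratio with some tolerance.
    static final int MAX_INITIAL_CAPACITY = Integer.MAX_VALUE / 8;

    // Estimate data.length / 4, but never below 1024 or above the cap.
    public static int initialCapacity(int dataLength) {
        return Math.min(MAX_INITIAL_CAPACITY, Math.max(1024, dataLength / 4));
    }

    public static void main(String[] args) {
        // A 40 MB uncompressed tile pre-allocates 10 MB; a pathological
        // Integer.MAX_VALUE input is capped at ~256 MB instead of ~512 MB.
        System.out.println(initialCapacity(40_000_000));        // 10000000
        System.out.println(initialCapacity(Integer.MAX_VALUE)); // 268435455
    }
}
```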

@melissalinkert
Member

Thanks @petebankhead. Re-including in builds to double-check that tests pass.

@dgault dgault left a comment
Member

Below are some further test results from converting datasets to TIFF, OME-TIFF and DICOM (where suitable) with JPEG compression; each conversion was run 3 times and averaged. After conversion, all files could be opened and displayed as expected.

| Input | Output format | 6.12.0 | With this PR |
| --- | --- | --- | --- |
| OME-TIFF/2016-06/tubhiswt-4D | TIFF | 308.042 | 36.037 |
| OME-TIFF/2016-06/tubhiswt-4D | OME-TIFF | 288.245 | 35.633 |
| OME-TIFF/2016-06/MitoCheck | TIFF | 55.672 | 13.769 |
| OME-TIFF/2016-06/MitoCheck | OME-TIFF | 56.393 | 14.436 |
| OME-TIFF/2016-06/sub-resolutions/Fluorescence | TIFF | 248.676 | 214.107 |
| OME-TIFF/2016-06/sub-resolutions/Fluorescence | OME-TIFF | 252.77 | 214.513 |
| OME-TIFF/2016-06/sub-resolutions/Fluorescence | DICOM | 156.045 | 154.524 |
| DICOM/samples/OT-MONO2-8-hip | TIFF | 1.373 | 1.276 |
| DICOM/samples/OT-MONO2-8-hip | OME-TIFF | 1.478 | 1.448 |
| DICOM/samples/OT-MONO2-8-hip | DICOM | 1.235 | 1.233 |
| ND2/aryeh/Time sequence 24 | TIFF | 160.39 | 47.867 |
| ND2/aryeh/Time sequence 24 | OME-TIFF | 133.563 | 50.998 |
| curated/zeiss-czi/nicholas-trahearn | TIFF | 1148.929 | 1003.436 |
| curated/zeiss-czi/nicholas-trahearn | OME-TIFF | 925.065 | 929.476 |
| SVS/77917 | TIFF | 7839.688 | 2683.528 |
| SVS/77917 | OME-TIFF | 7463.857 | 2653.187 |
| Leica-LIF/michael | TIFF | 3.482 | 2.149 |
| Leica-LIF/michael | OME-TIFF | 3.716 | 2.243 |
| Olympus-OIR/etienne/Clipboard-venus stack. | TIFF | 3.482 | 1.807 |
| Olympus-OIR/etienne/Clipboard-venus stack. | OME-TIFF | 3.212 | 1.782 |
| Olympus-OIR/etienne/Clipboard-venus stack. | DICOM | 1.273 | 1.235 |

The results look good, with some significant improvements and no noteworthy regressions. Overall the PR looks good from my perspective.

@dgault dgault merged commit 2666ade into ome:master Mar 10, 2023
@petebankhead petebankhead deleted the performance branch March 10, 2023 19:36