
Use in-memory cache with JPEGCodec #19

Merged
dgault merged 4 commits into ome:master from petebankhead:performance
Mar 10, 2023
Conversation

@petebankhead
Contributor

This PR aims to improve JPEG compression/decompression performance, particularly with TIFF images.

The original JPEGCodec passes an InputStream or OutputStream to ImageIO.
Strangely, ImageIO in OpenJDK seems to then create a temporary file from this (regardless of whether caching is requested or not) before using FileImageInputStream, which can be a significant performance issue e.g. with whole slide images.

This can be avoided by explicitly creating a MemoryCacheImageInputStream or the corresponding output stream.
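As a rough illustration, a minimal, self-contained sketch of the in-memory approach (hypothetical class name, not the ome-codecs code itself; only standard `javax.imageio` calls) looks like this:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.imageio.stream.MemoryCacheImageInputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

public class MemoryCacheRoundTrip {

    // Encode and decode entirely in memory, bypassing the temporary file that
    // ImageIO otherwise creates when handed a plain InputStream/OutputStream.
    public static BufferedImage roundTrip(BufferedImage img) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Wrapping the OutputStream explicitly avoids the FileCacheImageOutputStream path.
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            ImageIO.write(img, "jpeg", out);
        }
        // Same idea on the read side: an in-memory cache instead of a temp file.
        try (MemoryCacheImageInputStream in = new MemoryCacheImageInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return ImageIO.read(in);
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedImage decoded =
            roundTrip(new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB));
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight()); // 16x16
    }
}
```

Note that closing a `MemoryCacheImageOutputStream` flushes any pending cached data to the underlying `OutputStream`, so the try-with-resources block is doing real work here.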

The PR contains two other minor modifications that helped further reduce runtime in my tests:

  • set an initial capacity for the ByteArrayOutputStream
  • rearrange the code when creating an interleaved pixel array

Overall, I saw conversion time for a JPEG-compressed whole slide image to a JPEG-compressed ome.tiff (e.g. CMU-1.svs) reduce by up to 50%, by speeding up both reading and writing.


Risks: there is a slightly enigmatic warning in the ImageIO docs:

> In general, it is preferable to use a FileCacheImageInputStream when reading from a regular InputStream. This class is provided for cases where it is not possible to create a writable temporary file.

However, ImageIO's TIFF reader uses MemoryCacheImageInputStream.

  • This avoids the performance penalty of FileImageInputStream and FileImageOutputStream, which are otherwise used by ImageIO.read and ImageIO.write.
  • Set initial size of ByteArrayOutputStream to improve write performance within ImageIO.
  • Switch order in which arrays are accessed for pixel interleaving to improve efficiency.
  • Check ImageIO is able to provide a "jpeg" reader, rather than calling .next() on the iterator without checking.
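The reader-availability check mentioned above can be sketched like this (hypothetical class name; only standard `javax.imageio` calls, not the actual ome-codecs code):

```java
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;

public class ReaderCheck {

    // Returns a JPEG ImageReader, or null if none is registered, instead of
    // calling next() on an iterator that may be empty (which would throw
    // NoSuchElementException with no useful context).
    public static ImageReader getJpegReader() {
        Iterator<ImageReader> it = ImageIO.getImageReadersByFormatName("jpeg");
        return it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        ImageReader reader = getJpegReader();
        System.out.println(reader != null ? "jpeg reader available" : "no jpeg reader");
    }
}
```

On a stock OpenJDK a JPEG reader is always registered, so the null branch mostly guards against exotic deployments with a stripped-down ImageIO service registry.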
@sbesson sbesson closed this Dec 12, 2021
@sbesson sbesson reopened this Dec 12, 2021
@sbesson
Member

sbesson commented Dec 12, 2021

Thanks @petebankhead, I have included your PR in the daily CI builds to provide feedback on its impact. Note that due to https://forum.image.sc/t/upcoming-unavailability-of-ome-resources-dec-10-dec-13/60669 it might take a few days for our systems to get back to a stable state.

In the meantime, is it possible for you to sign and return a CLA as per the instructions here so that this can get merged once it passes QA?

@petebankhead
Contributor Author

Thanks, I've sent the CLA now & added more info on the performance impact for whole slide image conversion at ome/bioformats#3754

@sbesson sbesson removed the include label Dec 13, 2021
@sbesson
Member

sbesson commented Dec 13, 2021

@petebankhead it looks like the inclusion of this PR in the coupled Bio-Formats build failed downstream unit tests - see https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-image/1014/console

[ERROR] testALT_JPEG(loci.formats.utests.tiff.TiffCompressionDecompressTest)  Time elapsed: 0.011 s  <<< FAILURE!
org.testng.TestException: 

Expected exception java.lang.NullPointerException but got loci.formats.FormatException: java.io.EOFException: Attempting to read beyond end of file.
	at loci.formats.utests.tiff.TiffCompressionDecompressTest.testALT_JPEG(TiffCompressionDecompressTest.java:167)
Caused by: java.io.EOFException: Attempting to read beyond end of file.
	at loci.formats.utests.tiff.TiffCompressionDecompressTest.testALT_JPEG(TiffCompressionDecompressTest.java:167)

[ERROR] testJPEG(loci.formats.utests.tiff.TiffCompressionDecompressTest)  Time elapsed: 0 s  <<< FAILURE!
org.testng.TestException: 

Expected exception java.lang.NullPointerException but got loci.formats.FormatException: java.io.EOFException: Attempting to read beyond end of file.
	at loci.formats.utests.tiff.TiffCompressionDecompressTest.testJPEG(TiffCompressionDecompressTest.java:108)
Caused by: java.io.EOFException: Attempting to read beyond end of file.
	at loci.formats.utests.tiff.TiffCompressionDecompressTest.testJPEG(TiffCompressionDecompressTest.java:108)

After removing the include label, the same build completed without failure

@petebankhead
Contributor Author

petebankhead commented Dec 13, 2021

Thanks @sbesson, do I understand correctly that it fails here because it throws an EOFException rather than a NullPointerException (though either way it's expected to throw an exception, because the JPEG isn't valid)?

It appears that ImageIO sometimes returns null if there is a failure, and silently swallows the reason for the failure.

In this case, the previous code left ImageIO to figure out that it had a JPEG stream; because it got a byte array of zeros, it couldn't find a reader and so returned null. It then proceeded to request the width of the null image here and threw a NullPointerException when that failed. The test suggests this was the expected behavior.

With the original PR, ImageIO threw a more informative IOException (Not a JPEG file: starts with 0x00 0x00) - but then this resulted in the assumption that LosslessJPEGCodec should be used instead. In the end, LosslessJPEGCodec threw the EOFException that caused the test to fail.

I've updated the PR to more closely match the previous behavior, while making the detail message with the NPE more informative - I hope that's enough to pass the tests.
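A minimal sketch of that updated behavior might look like the following (hypothetical names, not the actual ome-codecs code): preserve the old "NullPointerException on invalid JPEG" contract, but attach an informative detail message instead of failing on a bare null dereference.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.imageio.stream.MemoryCacheImageInputStream;

public class NullCheckSketch {

    // ImageIO.read returns null (rather than throwing) when no registered
    // reader recognizes the stream; check for that explicitly.
    public static BufferedImage decode(byte[] data) throws IOException {
        BufferedImage img;
        try (MemoryCacheImageInputStream iis =
                new MemoryCacheImageInputStream(new ByteArrayInputStream(data))) {
            img = ImageIO.read(iis);
        }
        if (img == null) {
            // Same exception type as the historical behavior, but with a message.
            throw new NullPointerException(
                "ImageIO returned null: stream does not contain a readable image");
        }
        return img;
    }

    public static void main(String[] args) throws IOException {
        try {
            decode(new byte[64]); // all zeros: not a valid JPEG
        }
        catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```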

However, I'm not sure that it is the right exception to throw. Additionally, it looks like the previous implementation of JPEGCodec.decompress would have worked for any ImageIO-compatible stream, even if it wasn't a JPEG. I'm not sure if a forgiving JPEGCodec that can also decompress PNGs is really desirable or not. Explicitly requesting a JPEG reader from ImageIO avoids this.

What do you think?

For consistency with the previous behavior of JPEGCodec and test in TiffCompressionDecompressTest
@sbesson
Member

sbesson commented Dec 14, 2021

Thanks for the answer @petebankhead. I had not looked into the details of this exception as I have been focusing on getting our builds passing over the last few days and my original comment was primarily meant to provide feedback on the downstream rule.

I am not the most knowledgeable in terms of implementation or even codecs, but I concur with your last statement in #19 (comment) re the API: a NullPointerException should never be explicitly thrown as part of the contract, and having downstream clients rely on this type of exception feels error-prone. From https://www.javadoc.io/doc/org.openmicroscopy/ome-codecs/latest/ome/codecs/Codec.html, my initial expectation is that a CodecException should be thrown with the appropriate error message under such circumstances. /cc @ome/formats for additional thoughts, although we might come back to this in January 2022.

Once we settle on the API decision, I would be inclined to use the occasion to add some local coverage via unit tests in this repository, rather than relying on the tests of the downstream formats-bsd component. This is unrelated to the feature introduced here and is an outcome of the earlier component decoupling. Initially, I assume the relevant bits of TiffCompressionDecompressTest.java could be extracted into a local JPEGCodecTest.java.

@melissalinkert melissalinkert left a comment
Member

Adding include label to see how this impacts the next merge build.

Comment thread: src/main/java/ome/codecs/JPEGCodec.java (Outdated)

    }

-   ByteArrayOutputStream out = new ByteArrayOutputStream();
+   ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(1024, data.length / 4));
Member

Might be worth limiting this a bit further - in the worst case, where data.length is Integer.MAX_VALUE, this would set the output stream's initial capacity to 536870911.

@dgault
Member

dgault commented Nov 9, 2022

Temporarily excluding due to failures in https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/lastFailedBuild/consoleFull. It looks as though empty arrays are returned when decoding some tiles. It is odd that it is specifically the DICOM format that fails; this will require further investigation and debugging.

@dgault dgault added exclude and removed include labels Nov 9, 2022
@sbesson sbesson removed their request for review November 9, 2022 17:26
}
catch (IOException exc) {
// probably a lossless JPEG; delegate to LosslessJPEGCodec
in.seek(fp);
Member

The removal of this seek seems to be what causes tests to fail. The tests that failed in https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/213045/console pass locally if I reintroduce in.seek(fp) here.
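The fixed control flow can be illustrated with a self-contained sketch (FakeStream is a hypothetical stand-in for the codec's random-access input; the real code uses loci.common.RandomAccessInputStream): remember the file pointer before the baseline-JPEG attempt, and rewind before delegating to the lossless codec.

```java
import java.io.IOException;

public class SeekRewindSketch {

    // Minimal stand-in for a random-access input stream (hypothetical).
    static class FakeStream {
        long pos = 0;
        void seek(long p) { pos = p; }
        long getFilePointer() { return pos; }
        void read() { pos += 4; } // pretend to consume some bytes
    }

    // Sketch of the fix: record fp up front; on failure, rewind so the
    // fallback codec sees the stream from the beginning, not mid-read.
    public static long decodeWithRewind(FakeStream in) {
        long fp = in.getFilePointer();
        try {
            in.read(); // baseline JPEG decode attempt consumes bytes...
            throw new IOException("not a baseline JPEG"); // ...then fails
        }
        catch (IOException exc) {
            // probably a lossless JPEG; rewind before delegating
            in.seek(fp);
            return in.getFilePointer();
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeWithRewind(new FakeStream())); // prints 0
    }
}
```

Without the seek, the fallback decoder starts partway through the stream, which is consistent with the EOFException-style failures seen in the CI runs.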

Contributor Author

Added this back now

@melissalinkert melissalinkert self-assigned this Feb 6, 2023
@melissalinkert
Member

@petebankhead : have you had a chance to look at the proposed fix in #19 (review)? We'd still like to get these changes in, but if you don't have time to work on this right now just let us know.

Based on review from @melissalinkert

Rationale for `int maxByteArrayLength = Integer.MAX_VALUE / 8;` assumes a typical JPEG compression ratio of 10:1, with some extra tolerance since resizing very long byte arrays might be expensive.
@petebankhead
Contributor Author

@melissalinkert sorry, didn't realise this was waiting on me. Made the changes now.

Rationale for `int maxByteArrayLength = Integer.MAX_VALUE / 8;` was that I assumed (from wikipedia...) a typical JPEG compression ratio of 10:1, with some extra tolerance in case resizing very long byte arrays could be expensive. It would still be ~256 MB, but hopefully this wouldn't occur often; and if it does, I'd guess the JPEG itself would need to be large anyway.
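That capping logic could look something like this (hypothetical names; a sketch of the idea rather than the merged code): clamp the estimated initial capacity between a small floor and the ~256 MB ceiling.

```java
public class CappedCapacity {

    // Cap discussed above: Integer.MAX_VALUE / 8 == 268435455 (~256 MB),
    // assuming a typical ~10:1 JPEG compression ratio with some tolerance.
    static final int MAX_INITIAL_CAPACITY = Integer.MAX_VALUE / 8;

    // Estimate data.length / 4, but never below 1024 or above the cap.
    public static int initialCapacity(int dataLength) {
        return Math.min(MAX_INITIAL_CAPACITY, Math.max(1024, dataLength / 4));
    }

    public static void main(String[] args) {
        // A 40 MB uncompressed tile pre-allocates 10 MB; a pathological
        // Integer.MAX_VALUE input is capped at ~256 MB instead of ~512 MB.
        System.out.println(initialCapacity(40_000_000));        // 10000000
        System.out.println(initialCapacity(Integer.MAX_VALUE)); // 268435455
    }
}
```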

@melissalinkert
Member

Thanks @petebankhead. Re-including in builds to double-check that tests pass.

@dgault dgault left a comment
Member

Below are some further test results from converting datasets to TIFF, OME-TIFF and DICOM (where suitable) with JPEG compression; each conversion was run 3 times and averaged. After conversion, all files could be opened and displayed as expected.

| Input | Output format | 6.12.0 | With this PR |
| --- | --- | --- | --- |
| OME-TIFF/2016-06/tubhiswt-4D | TIFF | 308.042 | 36.037 |
| OME-TIFF/2016-06/tubhiswt-4D | OME-TIFF | 288.245 | 35.633 |
| OME-TIFF/2016-06/MitoCheck | TIFF | 55.672 | 13.769 |
| OME-TIFF/2016-06/MitoCheck | OME-TIFF | 56.393 | 14.436 |
| OME-TIFF/2016-06/sub-resolutions/Fluorescence | TIFF | 248.676 | 214.107 |
| OME-TIFF/2016-06/sub-resolutions/Fluorescence | OME-TIFF | 252.77 | 214.513 |
| OME-TIFF/2016-06/sub-resolutions/Fluorescence | DICOM | 156.045 | 154.524 |
| DICOM/samples/OT-MONO2-8-hip | TIFF | 1.373 | 1.276 |
| DICOM/samples/OT-MONO2-8-hip | OME-TIFF | 1.478 | 1.448 |
| DICOM/samples/OT-MONO2-8-hip | DICOM | 1.235 | 1.233 |
| ND2/aryeh/Time sequence 24 | TIFF | 160.39 | 47.867 |
| ND2/aryeh/Time sequence 24 | OME-TIFF | 133.563 | 50.998 |
| curated/zeiss-czi/nicholas-trahearn | TIFF | 1148.929 | 1003.436 |
| curated/zeiss-czi/nicholas-trahearn | OME-TIFF | 925.065 | 929.476 |
| SVS/77917 | TIFF | 7839.688 | 2683.528 |
| SVS/77917 | OME-TIFF | 7463.857 | 2653.187 |
| Leica-LIF/michael | TIFF | 3.482 | 2.149 |
| Leica-LIF/michael | OME-TIFF | 3.716 | 2.243 |
| Olympus-OIR/etienne/Clipboard-venus stack. | TIFF | 3.482 | 1.807 |
| Olympus-OIR/etienne/Clipboard-venus stack. | OME-TIFF | 3.212 | 1.782 |
| Olympus-OIR/etienne/Clipboard-venus stack. | DICOM | 1.273 | 1.235 |

The results look good, with some significant improvements and no noteworthy regressions. Overall the PR looks good from my perspective.

@dgault dgault merged commit 2666ade into ome:master Mar 10, 2023
@petebankhead petebankhead deleted the performance branch March 10, 2023 19:36