Use in-memory cache with JPEGCodec#19
Use in-memory cache with JPEGCodec#19dgault merged 4 commits intoome:masterfrom petebankhead:performance
Conversation
This avoids the performance penalty of FileImageInputStream and FileImageOutputStream, which are otherwise used by ImageIO.read and ImageIO.write.
Set initial size of ByteArrayOutputStream to improve write performance within ImageIO. Switch order in which arrays are accessed for pixel interleaving to improve efficiency. Check ImageIO is able to provide a "jpeg" reader, rather than calling .next() on iterator without checking.
|
Thanks @petebankhead, I have included your PR in the daily CI builds to provide feedback on the impact of this PR. Note that due to https://forum.image.sc/t/upcoming-unavailability-of-ome-resources-dec-10-dec-13/60669 it might take a few days to our systems to get back to a stable state. In the meantime, is it possible for you to sign and return a CLA as per the instructions here so that this can get merged once it passes Q&A? |
|
Thanks, I've sent the CLA now & added more info on the performance impact for whole slide image conversion at ome/bioformats#3754 |
|
@petebankhead it looks like the inclusion of this PR in the coupled Bio-Formats build failed a downstream unit tests - see https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-image/1014/console After removing the include label, the same build completed without failure |
|
Thanks @sbesson do I understand correctly that it fails here because it throws an It appears that ImageIO sometimes returns null if there is a failure, and silently swallows the reason for the failure. In this case, the previous code left ImageIO to figure out that it had a JPEG stream, and because it got a byte array of zeros it couldn't find a reader and so returned null. It then proceeded to request width of the null image here and throw a With the original PR, ImageIO threw a more informative IOException ( I've updated the PR to more closely match the previous behavior, while making the detail message with the NPE more informative - I hope that's enough to pass the tests. However, I'm not sure that it is the right exception to throw. Additionally, it looks like the previous implementation of What do you think? |
For consistency with the previous behavior of JPEGCodec and test in TiffCompressionDecompressTest
|
Thanks for the answer @petebankhead. I had not looked into the details of this exception as I have been focusing on getting our builds passing over the last few days and my original comment was primarily meant to provide feedback on the downstream rule. I am not the most knowledgeable in terms of implementation or even codes but I concur with your last statement in #19 (comment) re the API: a Once we settle on the API decision, I would be inclined to use the occasion and add some local coverage via unit tests in this repository rather than relying on the tests of the downstream |
melissalinkert
left a comment
There was a problem hiding this comment.
Adding include label to see how this impacts the next merge build.
| } | ||
|
|
||
| ByteArrayOutputStream out = new ByteArrayOutputStream(); | ||
| ByteArrayOutputStream out = new ByteArrayOutputStream(Math.max(1024, data.length / 4)); |
There was a problem hiding this comment.
Might be worth limiting this a bit further - in the worst case where data.length is Integer.MAX_VALUE, this would set the output stream's length to 536870911.
|
Temporarily excluding due to failures in https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/lastFailedBuild/consoleFull. It looks as though empty arrays are returned when decoding some tiles. Though it is odd that it is specifically the DICOM format that fails, this will require further investigation and debugging. |
| } | ||
| catch (IOException exc) { | ||
| // probably a lossless JPEG; delegate to LosslessJPEGCodec | ||
| in.seek(fp); |
There was a problem hiding this comment.
The removal of this seek seems to be what causes tests to fail. The tests that failed in https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-folder/213045/console pass locally if I reintroduce in.seek(fp) here.
There was a problem hiding this comment.
Added this back now
|
@petebankhead : have you had a chance to look at the proposed fix in #19 (review)? We'd still like to get these changes in, but if you don't have time to work on this right now just let us know. |
Based on review from @melissalinkert Rationale for `int maxByteArrayLength = Integer.MAX_VALUE / 8;` assumes a typical JPEG compression ratio of 10:1, with some extra tolerance since resizing very long byte arrays might be expensive.
|
@melissalinkert sorry, didn't realise this was waiting on me. Made the changes now. Rationale for |
|
Thanks @petebankhead. Re-including in builds to double-check that tests pass. |
dgault
left a comment
There was a problem hiding this comment.
Below are some further test results taken from converting datasets to TIFF, OME-TIFF and DICOM (when suitable) with JPEG compression, each run 3 times and averaged. After conversion all files could be opened and displayed as expected.
| Input | Output Format | 6.12.0 | With this PR |
|---|---|---|---|
| OME-TIFF/2016-06/tubhiswt-4D | TIFF | 308.042 | 36.037 |
| OME-TIFF/2016-06/tubhiswt-4D | OME-TIFF | 288.245 | 35.633 |
| OME-TIFF/2016-06/MitoCheck | TIFF | 55.672 | 13.769 |
| OME-TIFF/2016-06/MitoCheck | OME-TIFF | 56.393 | 14.436 |
| OME-TIFF/2016-06/sub-resolutions/Fluorescence | TIFF | 248.676 | 214.107 |
| OME-TIFF/2016-06/sub-resolutions/Fluorescence | OME-TIFF | 252.77 | 214.513 |
| OME-TIFF/2016-06/sub-resolutions/Fluorescence | DICOM | 156.045 | 154.524 |
| DICOM/samples/OT-MONO2-8-hip | TIFF | 1.373 | 1.276 |
| DICOM/samples/OT-MONO2-8-hip | OME-TIFF | 1.478 | 1.448 |
| DICOM/samples/OT-MONO2-8-hip | DICOM | 1.235 | 1.233 |
| ND2/aryeh/Time sequence 24 | TIFF | 160.39 | 47.867 |
| ND2/aryeh/Time sequence 24 | OME-TIFF | 133.563 | 50.998 |
| curated/zeiss-czi/nicholas-trahearn | TIFF | 1148.929 | 1003.436 |
| curated/zeiss-czi/nicholas-trahearn | OME-TIFF | 925.065 | 929.476 |
| SVS/77917 | TIFF | 7839.688 | 2683.528 |
| SVS/77917 | OME-TIFF | 7463.857 | 2653.187 |
| Leica-LIF/michael | TIFF | 3.482 | 2.149 |
| Leica-LIF/michael | OME-TIFF | 3.716 | 2.243 |
| Olympus-OIR/etienne/Clipboard-venus stack. | TIFF | 3.482 | 1.807 |
| Olympus-OIR/etienne/Clipboard-venus stack. | OME-TIFF | 3.212 | 1.782 |
| Olympus-OIR/etienne/Clipboard-venus stack. | DICOM | 1.273 | 1.235 |
Overall the results look good with some significant improvements and no noteworthy decreases. Overall the PR looks good from my perspective.
This PR aims to improve JPEG compression/decompression performance, particularly with TIFF images.
The original JPEGCodec passes an
InputStreamorOutputStreamto ImageIO.Strangely, ImageIO in OpenJDK seems to then create a temporary file from this (regardless of whether caching is requested or not) before using
FileImageInputStream, which can be a significant performance issue e.g. with whole slide images.This can be avoided by explicitly creating a
MemoryCacheImageInputStreamor the corresponding output stream.The PR contains two other minor modifications that helped further reduce runtime in my tests:
ByteArrayOutputStreamOverall, I saw conversion time for a JPEG-compressed whole slide image to a JPEG-compressed ome.tiff (e.g. CMU-1.svs) reduce by up to 50%, by speeding up both reading and writing.
Risks: There is a slightly enigmatic warning in the ImageIO docs
However, ImageIO's TIFF reader uses
MemoryCacheImageInputStream.