IQSS/9506 thumbnail failure tracking and other performance improvements #9669

Merged
pdurbin merged 47 commits into IQSS:develop from QualitativeDataRepository:IQSS/9506-thumbnail-tracking on Dec 5, 2023

Conversation

@qqmyers (Member) commented Jun 21, 2023

What this PR does / why we need it: This PR makes several changes to improve the performance of thumbnail generation and retrieval. It covers issue #9506 (also reported via QDR), the issue raised by Bikramjit Singh/Borealis when using relatively slow S3 storage (discussed by email between Borealis and @scolapasta, @pdurbin and @qqmyers), and additional issues discovered while investigating:

  • Caches the isThumbnailAvailable response in the ThumbnailServiceWrapper for the dataset file table in edit/view modes
  • Switches to returning a download URL (versus a base64-encoded copy) in the main dataset search display
  • Sets the dataset id in datasets returned in search results to enable the existing caching in the ThumbnailServiceWrapper.dvobjectThumbnailsMap (the lack of an id meant the caching map wasn't being populated)
  • Implements a previewimagefail flag (originally named previewshavefailed) that is set the first time an attempt to create a thumbnail for a given file fails. The flag is then used to avoid retrying thumbnail creation every time a thumbnail is requested (or isThumbnailAvailable() is called). Adds API calls to reset this flag globally or per file (to allow retrying)
  • Switches to using streams (vs. channels) in PDF thumbnail generation for the temp-file case, since a FileChannel.transferFrom call is not guaranteed to transfer all bytes (and can transfer 0 bytes), whereas InputStream.transferTo keeps reading until all bytes are transferred.
  • Refactors to remove duplicate code
  • Sets the preview-available flag to false when copying temporary previews during ingest fails
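The caching and failure-flag changes above can be illustrated with a small, self-contained sketch. It assumes a hypothetical ThumbnailTracker class, a DataFileRecord stand-in with a previewImageFail field, and a generator function; none of these are Dataverse's actual classes, and a real implementation would persist the flag in the database. The per-page map plays the role of the ThumbnailServiceWrapper cache, while the flag outlives a single page view, so a failed file is not retried until the flag is reset.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongPredicate;

// Hypothetical sketch: names are illustrative, not Dataverse's real classes or API.
class ThumbnailTracker {

    /** Stand-in for a file record; in a real system the flag would be a persisted column. */
    static class DataFileRecord {
        final long id;
        boolean previewImageFail;

        DataFileRecord(long id) {
            this.id = id;
        }
    }

    // Attempts thumbnail creation for a file id; returns true on success.
    private final LongPredicate generator;
    // Per-page cache, analogous to caching isThumbnailAvailable() results for one view.
    private final Map<Long, Boolean> availabilityCache = new HashMap<>();
    // Counts real generation attempts so the effect of the cache and flag is visible.
    int generationAttempts = 0;

    ThumbnailTracker(LongPredicate generator) {
        this.generator = generator;
    }

    boolean isThumbnailAvailable(DataFileRecord file) {
        Boolean cached = availabilityCache.get(file.id);
        if (cached != null) {
            return cached; // repeated checks while rendering the same page are free
        }
        boolean available;
        if (file.previewImageFail) {
            available = false; // an earlier attempt failed: skip the expensive retry
        } else {
            generationAttempts++;
            available = generator.test(file.id);
            if (!available) {
                file.previewImageFail = true; // remember the failure for later requests
            }
        }
        availabilityCache.put(file.id, available);
        return available;
    }

    /** Analogous to the per-file reset API call: clear the flag so the next check retries. */
    void resetFailureFlag(DataFileRecord file) {
        file.previewImageFail = false;
        availabilityCache.remove(file.id);
    }
}
```

For example, with `id -> id % 2 == 0` as the generator, a tracker for one page view makes a single failed attempt for file 3 no matter how often it is asked; a second tracker (a later page view) answers false for that file without calling the generator at all, until `resetFailureFlag` is called.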
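The streams-versus-channels point can be shown with standard JDK calls. A single `FileChannel.transferFrom(src, position, count)` call returns the number of bytes it actually transferred, which may be fewer than requested, so code that calls it once can leave a truncated temp file. `InputStream.transferTo(OutputStream)` (Java 9+) keeps reading until end of stream. The helper name `copyToTempFile` and the temp-file prefix are illustrative assumptions, not the code from this PR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative helper, not the PR's actual implementation.
class TempFileCopy {

    /**
     * Copies the entire stream into a new temp file and returns its path.
     * transferTo() reads until end-of-stream, so the copy is not cut short the
     * way a single FileChannel.transferFrom() call can be.
     * The input stream is closed when the copy finishes.
     */
    static Path copyToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("thumb-src", ".tmp");
        try (InputStream src = in; OutputStream out = Files.newOutputStream(tmp)) {
            src.transferTo(out);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // don't leave a partial file behind
            throw e;
        }
        return tmp;
    }
}
```

The caller is responsible for deleting the returned file once the thumbnail has been generated.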

Which issue(s) this PR closes:

Special notes for your reviewer:
This PR doesn't completely remove the use of base64-encoded thumbnail URLs, e.g., on the dataset and file pages, where a single base64 image is displayed. One could also remove it in those cases, but the performance problems with base64 generation arise mainly when many images must be created before a page can be rendered.
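The trade-off in the note above can be sketched as two ways of filling an image `src` attribute. Building a base64 data URL requires the server to already hold the thumbnail bytes (for example, fetched from S3) before the page HTML can be sent, and that cost multiplies with every thumbnail on the page; a plain URL lets the browser fetch each image after the page renders. The URL path below, modeled on the Data Access API's `imageThumb` parameter, is an assumption for illustration; the exact endpoint the PR emits may differ.

```java
import java.util.Base64;

// Illustrative only: the URL pattern is an assumption, not confirmed as the one the PR emits.
class ThumbnailSrc {

    /** Inline form: the server must fetch and encode the bytes before rendering the page. */
    static String asDataUrl(byte[] pngBytes) {
        return "data:image/png;base64," + Base64.getEncoder().encodeToString(pngBytes);
    }

    /** Deferred form: only a string is built now; the browser requests the image later. */
    static String asDownloadUrl(String siteUrl, long fileId) {
        return siteUrl + "/api/access/datafile/" + fileId + "?imageThumb=true";
    }
}
```

Base64 also inflates the payload by about a third, since every 3 bytes of image data become 4 characters in the page.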

This may also be useful for the SPA?

Suggestions on how to test this: Ensure that thumbnails still appear for image and PDF files as before (regression check); that the initial page load is faster and the Dataverse server no longer makes multiple S3 calls to render the initial root collection page; that files whose thumbnail generation fails are marked with the failure flag and that subsequent accesses don't try to recreate the thumbnail; and that using the API call to reset the flag for one or all files results in a new attempt to create a thumbnail.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: Thumbnails on the main page now load roughly asynchronously (rather than the page not loading at all until all thumbnails are available).

Is there a release notes update needed for this change?:

Additional documentation: admin API docs

@qqmyers qqmyers added the Size: 30 (a percentage of a sprint; 21 hours; formerly size:33) label Jun 21, 2023
@coveralls commented Jun 21, 2023

coverage: 20.063% (+0.003%) from 20.06%
when pulling 5149941 on QualitativeDataRepository:IQSS/9506-thumbnail-tracking
into b33fe57 on IQSS:develop.

@bikramj (Contributor) commented Jun 21, 2023

Thank you so much @qqmyers for implementing it, this will solve the slow page load issue for us and anyone using custom S3 endpoints.

@sekmiller (Contributor) left a comment

Looks good. Thanks for the updates.

@sekmiller sekmiller removed their assignment Nov 28, 2023
@pdurbin pdurbin self-assigned this Nov 30, 2023
qqmyers added a commit to QualitativeDataRepository/dataverse that referenced this pull request Dec 5, 2023
FWIW: QDR generates a 400px version here and then uses styling to fit the page. Not sure what the motivation for that was without digging.
@qqmyers qqmyers removed their assignment Dec 5, 2023
@pdurbin (Member) commented Dec 5, 2023

There are merge conflicts and the SQL script needs to be bumped.

qqmyers and others added 3 commits December 5, 2023 14:30
Conflicts (easy, just "add both"):
doc/sphinx-guides/source/api/changelog.rst
doc/sphinx-guides/source/api/native-api.rst
src/main/java/edu/harvard/iq/dataverse/api/Admin.java
@pdurbin pdurbin merged commit e3e122a into IQSS:develop Dec 5, 2023
@pdurbin (Member) commented Dec 5, 2023

Found a regression, which Jim fixed (thank you!). I confirmed that I can still generate thumbnails for images and PDFs. For PDFs I had to install ImageMagick. See also this issue:

@qqmyers qqmyers deleted the IQSS/9506-thumbnail-tracking branch May 17, 2024 18:39
@cmbz cmbz added the FY26 Sprint 4 (2025-08-13 - 2025-08-27) label Aug 16, 2025
@cmbz cmbz added and removed the FY26 Sprint 14 (2025-12-31 - 2026-01-14) label Jan 14, 2026

Labels

  • Feature: Performance & Stability
  • FY26 Sprint 4 (2025-08-13 - 2025-08-27)
  • GDCC: Borealis (of interest to Borealis)
  • GDCC: QDR (of interest to QDR)
  • Size: 30 (a percentage of a sprint; 21 hours; formerly size:33)

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Add a "no thumbnail" flag to mark problematic images (to avoid extra generation attempts)

6 participants