Version summaries fixes, enhancements and pagination#11859
Version summaries fixes, enhancements and pagination#11859ofahimIQSS merged 55 commits intodevelopfrom
Conversation
This comment has been minimized.
This comment has been minimized.
…eBean.findVersions
…n-summaries-pagination
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
943da66 to
b3c4fbd
Compare
This comment has been minimized.
This comment has been minimized.
3 similar comments
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
…onDifferencesCommand
…tFileVersionDifferencesCommand refactored
This comment has been minimized.
This comment has been minimized.
1 similar comment
This comment has been minimized.
This comment has been minimized.
…n-summaries-pagination
|
Everything should be working now as expected, except for the pre-existing issue mentioned in #11921. Thanks, @ChengShi-1! |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
|
||
| //TODO: this could use some refactoring to cut down on the number of for loops! | ||
| private FileMetadata getPreviousFileMetadata(FileMetadata fileMetadata, DatasetVersion currentversion) { | ||
| public FileMetadata getPreviousFileMetadata(FileMetadata fileMetadata, DatasetVersion currentversion) { |
There was a problem hiding this comment.
Not sure why this was made public. the only usage is inside the helper
There was a problem hiding this comment.
Thanks — I forgot to make it private again. I had to set it to public during the multi-step refactor while extracting logic and responsibilities from this class.
FWIW - The API no longer uses this method; instead, it uses a more efficient JPA Criteria–based DataFileServiceBean method.
There was a problem hiding this comment.
I just made it private again
This comment has been minimized.
This comment has been minimized.
I tested dataset summaries and file summaries again, and they work well now. Thanks for resolving them! |
sekmiller
left a comment
There was a problem hiding this comment.
Very good stuff here. Thanks.
This comment has been minimized.
This comment has been minimized.
|
looks good - going to perform regression in QA. Merging after CI completes |
|
📦 Pushed preview images as 🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name. |
|
everythings looking good here- merging |
What this PR does / why we need it:
This pull request introduces pagination mechanisms for the dataset and datafile version summary/difference endpoints, along with a substantial reimplementation of the existing functionality.
While the initial scope was limited to adding pagination, a preliminary investigation revealed significant underlying issues that would have made a direct implementation inefficient and unsustainable. The core problems identified were:
High Coupling and Poor Separation of Concerns: The existing code lacked clear architectural layering and encapsulation for these use cases, leading to tightly coupled components that are difficult to maintain, test, and extend.
Severe Performance Bottlenecks: The implementation relied on fetching bulk data from the database and then post-processing it in Java using multiple nested loops. This approach caused significant performance degradation, especially for datasets or files with a large number of versions.
Low Test Coverage: The lack of a comprehensive test suite made it risky to extend/alter the existing functionality without introducing regressions.
Inconsistent Behavior and Errors: The code for file version summaries didn't work in some scenarios, generating errors like the one mentioned at API: File versionDifferences has 500 error after uploading files to published dataset #11561 or not functioning as it should.
Changes Made
Given the issues discovered, the decision was made to perform a comprehensive, end-to-end refactoring and reimplementation of these features. This ensures that the new pagination functionality is built upon a robust, performant, and maintainable foundation. I had to revisit each case of the file version summaries logic to fix incorrect behaviors based on the original JSF behavior.
The key changes include:
Architectural Realignment: The entire workflow, from the API endpoint to the data access layer, has been refactored to align with the established Dataverse architecture using Commands and Services. This improves modularity and clarifies responsibilities within the code.
Performance Optimization with JPA Criteria: All data processing has been pushed down to the database layer. In-memory processing with loops has been replaced with specific, performant JPA Criteria-based queries. This dramatically improves response times for entities with extensive version histories.
Improved Test Coverage: New unit tests have been introduced to cover the refactored code, addressing critical logic that was previously untested. This ensures the stability of the new implementation and simplifies future development.
The 500 errors that were occurring and reported in #11561 have been resolved. These errors were caused by a null pointer exception resulting from null datafiles. Below, the error trace is shown, followed by a screenshot of the error being reproduced, and then another screenshot taken after deploying this PR branch.
Reproduced error:


Fixed error:

As you can see, the response time is 543 ms, which shows a significant improvement in load time compared to JSF. In addition, pagination hasn’t been implemented in the SPA yet, which would make it even faster — especially for data files with many more versions than this example.
In the UI, we still need to apply a few tweaks — specifically, to display the message “File not included in this version” instead of “No changes associated with this version” for versions where the file hadn’t yet been added. This change has already been discussed with the frontend team for implementation. cc: @ChengShi-1
Which issue(s) this PR closes:
Sharing some thoughts...:
Let's keep our focus on making sure our codebase stays healthy and easy to work with.
Sticking to our layered architecture is key. It's what makes it way easier for everyone to jump in and make changes without breaking things.
Let's also be serious about our tests. Good unit tests prove the little pieces work, and API tests make sure the whole thing hangs together. Both are necessary. Test cases should be complete enough, especially for features like these summaries with many different scenarios.
Finally, let's all try to follow the 'campsite rule' with tech debt: leave the code a little cleaner than you found it. If you spot something you can quickly improve, do it. It's a small effort that saves us from huge headaches down the road.
Suggestions on how to test this:
Performance enhancements could be tested by running this branch on an installation with a dataset or file with a large number of versions, and calling the endpoint below without pagination, and compare the response time with develop.
You can control pagination of the results using the following optional query parameters.
limit: The maximum number of version differences to return.offset: The number of version differences to skip from the beginning of the list. Used for retrieving subsequent pages of results.For example, to get the second page of results, with 2 items per page, you would use
limit=2andoffset=2(skipping the first two results).For datasets:
curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z&limit=2&offset=2"For files:
curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences?limit=2&offset=2"Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No
Is there a release notes update needed for this change?:
Yes, attached.