Merged
55 commits
50a321f
Stash: refactoring getCompareVersionsSummary endpoint WIP
GPortas Oct 1, 2025
eeb6e33
Refactor: getCompareVersionsSummary and related layers
GPortas Oct 2, 2025
98c72cb
Changed: GetDatasetVersionSummariesCommand using DatasetVersionServic…
GPortas Oct 2, 2025
4685a51
Merge branch 'develop' of github.com:IQSS/dataverse into 11855-versio…
GPortas Oct 2, 2025
5919cf4
Added: handling pagination optional params on getCompareVersionsSumma…
GPortas Oct 2, 2025
ea5642e
Added: DatasetVersionSummaryTest
GPortas Oct 2, 2025
b3c4fbd
Added: GetDatasetVersionSummariesCommandTest
GPortas Oct 2, 2025
89b88b5
Added: pagination test cases to testSummaryDatasetVersionsDifferences…
GPortas Oct 6, 2025
a19e7c9
Merge branch 'develop' of github.com:IQSS/dataverse into 11855-versio…
GPortas Oct 6, 2025
93823a7
Added: pagination explanation docs to compareSummary datasets endpoint
GPortas Oct 6, 2025
393c7d4
Added: findFileMetadataHistory JPACriteria-based method to DataFileSe…
GPortas Oct 7, 2025
7ac4588
Added: GetFileVersionDifferencesCommand, pending to be refactored and…
GPortas Oct 7, 2025
bda189f
Changed: Files API versionDifferences endpoint now using GetFileVersi…
GPortas Oct 7, 2025
7973153
Added: VersionedFileMetadata class
GPortas Oct 7, 2025
a3b753b
Added: handling optional pagination in findFileMetadataHistory and Ge…
GPortas Oct 7, 2025
493aa38
Merge branch 'develop' of github.com:IQSS/dataverse into 11855-versio…
GPortas Oct 8, 2025
55b78eb
Added: GetFileVersionDifferencesCommandTest
GPortas Oct 8, 2025
0bc9104
Changed: using JPACriteria-based method instead of javacode for filte…
GPortas Oct 9, 2025
a315e09
Merge branch 'develop' of github.com:IQSS/dataverse into 11855-versio…
GPortas Oct 9, 2025
c7da46e
Stash: refactoring Files {id}/versionDifferences JSON printing logic
GPortas Oct 12, 2025
755eb80
Refactor: extracted FileVersionDifferenceJsonPrinter
GPortas Oct 12, 2025
4807c53
Merge branch 'develop' of github.com:IQSS/dataverse into 11855-versio…
GPortas Oct 12, 2025
79094a7
Added: simple javadoc with TODO to FileVersionDifferenceJsonPrinter
GPortas Oct 12, 2025
0f71d6b
Added: pagination params to getFileVersionsList API endpoint
GPortas Oct 13, 2025
6c1da73
Added: test cases producing InvalidCommandArgumentsException to GetFi…
GPortas Oct 13, 2025
a3fe9c5
Added: pagination params validation to GetDatasetVersionSummariesCommand
GPortas Oct 13, 2025
6cecbf4
Refactor: AbstractPaginatedCommand
GPortas Oct 13, 2025
98a03c3
Added: invalid pagination params test cases to testSummaryDatasetVers…
GPortas Oct 13, 2025
3546279
Added: docs for pagination in versionDifferences Files API endpoint
GPortas Oct 13, 2025
ddfefd4
Fixed: typo in docs for versions/compareSummary
GPortas Oct 13, 2025
80eae81
Added: release notes for #11855
GPortas Oct 13, 2025
5217e68
Refactor: FileVersionDifferenceJsonPrinter with unit tests
GPortas Oct 13, 2025
0f54454
Fixed: typo in javadoc
GPortas Oct 13, 2025
0b21a9a
Fixed: DataFileServiceBean.findFileMetadataHistory behavior
GPortas Oct 14, 2025
b2f20ac
Merge branch 'develop' of github.com:IQSS/dataverse into 11855-versio…
GPortas Oct 14, 2025
4557600
Fixed: added missing contributor names to file metadata in GetFileVer…
GPortas Oct 14, 2025
ef42d87
Added: explanatory comment to GetFileVersionDifferencesCommand
GPortas Oct 14, 2025
664c90b
Fixed: FileVersionDifferenceJsonPrinter to show dataset version relea…
GPortas Oct 16, 2025
272e50e
Fixed: GetFileVersionDifferencesCommand to display correct results
GPortas Oct 16, 2025
4f9edce
Added: getDatasetVersionCount JPA-Criteria based method to DatasetVer…
GPortas Oct 17, 2025
ea27c82
Added: GetDatasetVersionCountCommand
GPortas Oct 17, 2025
5cd3811
Added: totalCount to response in dataset version differences API
GPortas Oct 17, 2025
696e40a
Merge branch 'develop' of github.com:IQSS/dataverse into 11855-versio…
GPortas Oct 17, 2025
c0e1008
Added: total count to file version differences API
GPortas Oct 17, 2025
9bb5fca
Added: release note tweaks
GPortas Oct 17, 2025
d264183
Added: extended IT cases for getFileVersionDifferences
GPortas Oct 17, 2025
175f822
Added: minor tweaks to native API docs about totalCount response fiel…
GPortas Oct 17, 2025
da66035
Merge branch 'develop' of github.com:IQSS/dataverse into 11855-versio…
GPortas Oct 22, 2025
b26a14d
Refactor: GetDatasetVersionSummariesCommand to include explanatory me…
GPortas Oct 22, 2025
71edb47
Fixed: contributor names retrieved in dataset version summaries
GPortas Oct 22, 2025
2515f78
Fixed: correctly sending fileDifferenceSummary.FileAccess in file ver…
GPortas Oct 22, 2025
b37941f
Added: IT test cases for FileTags and FileMetadata updates in file ve…
GPortas Oct 22, 2025
06f3507
Fixed: reverted method from public to private
GPortas Oct 23, 2025
28673b6
Merge branch 'develop' into 11855-version-summaries-pagination
ofahimIQSS Oct 23, 2025
6e696ee
Merge branch 'develop' into 11855-version-summaries-pagination
ofahimIQSS Oct 23, 2025
@@ -0,0 +1,33 @@
### Pagination for API Version Summaries

We've added pagination support to the following API endpoints:

- File Version Differences: `/api/files/{id}/versionDifferences`

- Dataset Version Summaries: `/api/datasets/:persistentId/versions/compareSummary`

You can now use two new query parameters to control the results:

- **limit**: An integer specifying the maximum number of results to return per page.

- **offset**: An integer specifying the number of results to skip before starting to return items. This is used to
navigate to different pages.
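
As a rough illustration of how `limit` and `offset` relate to page numbers, the sketch below converts a 1-based page number into an offset. The class and method names (`PageOffsets`, `offsetForPage`) are hypothetical and not part of the Dataverse codebase; only the `limit` and `offset` parameters come from the API.

```java
// Hypothetical helper, not part of Dataverse: maps a 1-based page number to an offset.
final class PageOffsets {

    private PageOffsets() {
    }

    /**
     * Returns the number of results to skip in order to reach the given page.
     *
     * @param page  1-based page number
     * @param limit maximum number of results per page (must be positive)
     */
    static int offsetForPage(int page, int limit) {
        if (page < 1 || limit < 1) {
            throw new IllegalArgumentException("page and limit must be >= 1");
        }
        // Math.multiplyExact throws instead of silently overflowing.
        return Math.multiplyExact(page - 1, limit);
    }
}
```

With `limit=2`, `PageOffsets.offsetForPage(1, 2)` gives an offset of 0 for the first page and `PageOffsets.offsetForPage(2, 2)` gives an offset of 2 for the second.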

### Performance enhancements for API Version Summaries

In addition to adding pagination, we've improved the performance of these endpoints by moving version filtering, ordering, and pagination into the database queries (using the JPA Criteria API) instead of processing complete version lists in Java code.

This addresses the performance bottlenecks previously encountered with datasets or files that have a large number of versions.

### Fixes for File Version Summaries API

The previous implementation of file version summaries was unreliable, causing exceptions and inconsistent results, as
documented in issue #11561. It has been reviewed and fixed to ensure correctness and stability.

### Related issues and PRs

- https://github.com/IQSS/dataverse/issues/11855
- https://github.com/IQSS/dataverse/pull/11859
- https://github.com/IQSS/dataverse/issues/11561
33 changes: 29 additions & 4 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -2143,14 +2143,26 @@ be available to users who have permission to view unpublished drafts. The api to
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/BCCP9Z

curl -H "X-Dataverse-key: $API_TOKEN" -X PUT "$SERVER_URL/api/datasets/:persistentId/versions/compareSummary?persistentId=$PERSISTENT_IDENTIFIER"
curl -H "X-Dataverse-key: $API_TOKEN" -X GET "$SERVER_URL/api/datasets/:persistentId/versions/compareSummary?persistentId=$PERSISTENT_IDENTIFIER"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X PUT "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z"
curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z"

You can control pagination of the results using the following optional query parameters.

* ``limit``: The maximum number of version differences to return.
* ``offset``: The number of version differences to skip from the beginning of the list. Used for retrieving subsequent pages of results.

To aid in pagination, the JSON response also includes the total number of rows available (``totalCount``).

For example, to get the second page of results, with 2 items per page, you would use ``limit=2`` and ``offset=2`` (skipping the first two results).

.. code-block:: bash

curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/datasets/:persistentId/versions/compareSummary?persistentId=doi:10.5072/FK2/BCCP9Z&limit=2&offset=2"
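
One way a client might use ``totalCount`` is to read it from a first response and then derive the query string for each remaining page. The sketch below only builds those query strings and performs no HTTP calls. `PageWalker` and `pageQueries` are hypothetical names that are not part of Dataverse; only `limit`, `offset`, and `totalCount` come from the documentation above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Dataverse: lists the query strings needed
// to walk every page once totalCount is known from a first response.
final class PageWalker {

    private PageWalker() {
    }

    static List<String> pageQueries(long totalCount, int limit) {
        if (totalCount < 0 || limit < 1) {
            throw new IllegalArgumentException("totalCount must be >= 0 and limit >= 1");
        }
        List<String> queries = new ArrayList<>();
        // Advance one page at a time until every row has been covered.
        for (long offset = 0; offset < totalCount; offset += limit) {
            queries.add("limit=" + limit + "&offset=" + offset);
        }
        return queries;
    }
}
```

For a `totalCount` of 5 and a `limit` of 2, this yields three query strings with offsets 0, 2, and 4. The last page then holds a single item.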

Update Metadata For a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -4322,8 +4334,21 @@ The fully expanded example above (without environment variables) looks like this

.. code-block:: bash

curl -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences"
curl -X GET "https://demo.dataverse.org/api/files/:persistentId/versionDifferences?persistentId=doi:10.5072/FK2/J8SJZB"
curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences"
curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/:persistentId/versionDifferences?persistentId=doi:10.5072/FK2/J8SJZB"

You can control pagination of the results using the following optional query parameters.

* ``limit``: The maximum number of version differences to return.
* ``offset``: The number of version differences to skip from the beginning of the list. Used for retrieving subsequent pages of results.

To aid in pagination, the JSON response also includes the total number of rows available (``totalCount``).

For example, to get the second page of results, with 2 items per page, you would use ``limit=2`` and ``offset=2`` (skipping the first two results).

.. code-block:: bash

curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/1234/versionDifferences?limit=2&offset=2"

Adding Files
~~~~~~~~~~~~
137 changes: 132 additions & 5 deletions src/main/java/edu/harvard/iq/dataverse/DataFileServiceBean.java
@@ -28,18 +28,18 @@
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;

import jakarta.ejb.EJB;
import jakarta.ejb.Stateless;
import jakarta.ejb.TransactionAttribute;
import jakarta.ejb.TransactionAttributeType;
import jakarta.inject.Named;
import jakarta.persistence.EntityManager;
import jakarta.persistence.NoResultException;
import jakarta.persistence.PersistenceContext;
import jakarta.persistence.Query;
import jakarta.persistence.TypedQuery;
import jakarta.persistence.*;
import jakarta.persistence.criteria.*;

/**
*
@@ -376,6 +376,133 @@ public FileMetadata findFileMetadataByDatasetVersionIdAndDataFileId(Long dataset
}
}

    /**
     * Finds the complete history of a file's presence across all dataset versions.
     * <p>
     * This method returns a {@link VersionedFileMetadata} entry for every version
     * of the specified dataset that is visible to the user and falls within the
     * requested page. If a version does not contain the file, the
     * {@code fileMetadata} field in the corresponding DTO will be {@code null}.
     * It handles file replacements by searching for all files sharing the
     * same {@code rootDataFileId}.
     *
     * @param datasetId The ID of the parent dataset.
     * @param dataFile The DataFile entity to find the history for.
     * @param canViewUnpublishedVersions A boolean indicating if the user has permission to view non-released versions.
     * @param limit (Optional) The maximum number of results to return.
     * @param offset (Optional) The starting point of the result list.
     * @return A paginated list of the file's version history, sorted from the newest to the oldest dataset version, including versions where the file is absent.
     */
    public List<VersionedFileMetadata> findFileMetadataHistory(Long datasetId,
                                                               DataFile dataFile,
                                                               boolean canViewUnpublishedVersions,
                                                               Integer limit,
                                                               Integer offset) {
        if (dataFile == null) {
            return Collections.emptyList();
        }

        // Query 1: Get the paginated list of relevant DatasetVersions
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<DatasetVersion> versionQuery = cb.createQuery(DatasetVersion.class);
        Root<DatasetVersion> versionRoot = versionQuery.from(DatasetVersion.class);

        List<Predicate> versionPredicates = new ArrayList<>();
        versionPredicates.add(cb.equal(versionRoot.join("dataset").get("id"), datasetId));
        if (!canViewUnpublishedVersions) {
            versionPredicates.add(versionRoot.get("versionState").in(
                    VersionState.RELEASED, VersionState.DEACCESSIONED));
        }
        versionQuery.where(versionPredicates.toArray(new Predicate[0]));
        versionQuery.orderBy(
                cb.desc(versionRoot.get("versionNumber")),
                cb.desc(versionRoot.get("minorVersionNumber"))
        );

        TypedQuery<DatasetVersion> typedVersionQuery = em.createQuery(versionQuery);
        if (limit != null) {
            typedVersionQuery.setMaxResults(limit);
        }
        if (offset != null) {
            typedVersionQuery.setFirstResult(offset);
        }
        List<DatasetVersion> datasetVersions = typedVersionQuery.getResultList();

        if (datasetVersions.isEmpty()) {
            return Collections.emptyList();
        }

        // Query 2: Get all FileMetadata for this file's history in this dataset
        CriteriaQuery<FileMetadata> fmQuery = cb.createQuery(FileMetadata.class);
        Root<FileMetadata> fmRoot = fmQuery.from(FileMetadata.class);

        List<Predicate> fmPredicates = new ArrayList<>();
        fmPredicates.add(cb.equal(fmRoot.get("datasetVersion").get("dataset").get("id"), datasetId));

        // Find the file by its entire lineage
        if (dataFile.getRootDataFileId() < 0) {
            fmPredicates.add(cb.equal(fmRoot.get("dataFile").get("id"), dataFile.getId()));
        } else {
            fmPredicates.add(cb.equal(fmRoot.get("dataFile").get("rootDataFileId"), dataFile.getRootDataFileId()));
        }
        fmQuery.where(fmPredicates.toArray(new Predicate[0]));

        List<FileMetadata> fileHistory = em.createQuery(fmQuery).getResultList();

        // Combine results
        Map<Long, FileMetadata> fmMap = fileHistory.stream()
                .collect(Collectors.toMap(
                        fm -> fm.getDatasetVersion().getId(),
                        Function.identity()
                ));

        // Create the final list, looking up the FileMetadata for each version
        return datasetVersions.stream()
                .map(version -> new VersionedFileMetadata(
                        version,
                        fmMap.get(version.getId()) // This will be null if no entry exists for that version ID
                ))
                .collect(Collectors.toList());
    }
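
The "combine results" step above can be illustrated in isolation. The following is a minimal sketch, assuming simplified stand-in types: `Version`, `FileEntry`, `VersionedEntry`, and `HistoryMergeSketch` are hypothetical and only mimic the roles of `DatasetVersion`, `FileMetadata`, and `VersionedFileMetadata`. It is not the code from this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the real entities; every name in this class is hypothetical.
final class HistoryMergeSketch {

    record Version(long id, String label) {}

    record FileEntry(long versionId, String fileName) {}

    // fileEntry is null when the file is absent from that version.
    record VersionedEntry(Version version, FileEntry fileEntry) {}

    private HistoryMergeSketch() {
    }

    static List<VersionedEntry> merge(List<Version> page, List<FileEntry> history) {
        // Index the file's entries by the version they belong to.
        Map<Long, FileEntry> byVersionId = new HashMap<>();
        for (FileEntry entry : history) {
            byVersionId.put(entry.versionId(), entry);
        }
        // Keep the order of the paginated version list; versions without the file map to null.
        List<VersionedEntry> result = new ArrayList<>();
        for (Version version : page) {
            result.add(new VersionedEntry(version, byVersionId.get(version.id())));
        }
        return result;
    }
}
```

One design difference from the method above: `Map.put` keeps the last entry when two entries share a version ID, whereas `Collectors.toMap` without a merge function throws an `IllegalStateException` on duplicate keys.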

    /**
     * Finds the FileMetadata for a given file in the version immediately preceding a specified version.
     *
     * @param fileMetadata The FileMetadata instance from the current version, used to identify the file's lineage.
     * @return The FileMetadata from the immediately prior version, or {@code null} if this is the first version of the file.
     */
    public FileMetadata getPreviousFileMetadata(FileMetadata fileMetadata) {
        if (fileMetadata == null || fileMetadata.getDataFile() == null) {
            return null;
        }

        // 1. Get the ID of the file that was replaced.
        Long previousId = fileMetadata.getDataFile().getPreviousDataFileId();

        // If there's no previous ID, this is the first version of the file.
        if (previousId == null) {
            return null;
        }

        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<FileMetadata> cq = cb.createQuery(FileMetadata.class);
        Root<FileMetadata> fileMetadataRoot = cq.from(FileMetadata.class);

        // 2. Join FileMetadata to DataFile to access the ID.
        Join<FileMetadata, DataFile> dataFileJoin = fileMetadataRoot.join("dataFile");

        // 3. Find the FileMetadata whose DataFile ID matches the previousId.
        cq.where(cb.equal(dataFileJoin.get("id"), previousId));

        // --- Execution ---
        TypedQuery<FileMetadata> query = em.createQuery(cq);
        try {
            return query.getSingleResult();
        } catch (NoResultException e) {
            // If no result is found, return null.
            return null;
        }
    }

public FileMetadata findMostRecentVersionFileIsIn(DataFile file) {
if (file == null) {
return null;
6 changes: 5 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java
@@ -73,7 +73,11 @@
@NamedQuery(name = "DatasetVersion.findById",
query = "SELECT o FROM DatasetVersion o LEFT JOIN FETCH o.fileMetadatas WHERE o.id=:id"),
@NamedQuery(name = "DatasetVersion.findByDataset",
query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
@NamedQuery(name = "DatasetVersion.findByDesiredStatesAndDataset",
query = "SELECT o FROM DatasetVersion o " +
"WHERE o.dataset.id = :datasetId AND o.versionState IN :states " +
"ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
@NamedQuery(name = "DatasetVersion.findReleasedByDataset",
query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId AND o.versionState=edu.harvard.iq.dataverse.DatasetVersion.VersionState.RELEASED ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC")/*,
@NamedQuery(name = "DatasetVersion.findVersionElements",