doc/release-notes/8235-auxiliaryfileAPIenhancements.md (14 additions, 0 deletions)
@@ -0,0 +1,14 @@
### Auxiliary File API Enhancements

This release includes updates to the Auxiliary File API:
- Auxiliary files can now also be associated with non-tabular files
- Improved error reporting
- The API will block attempts to create a duplicate auxiliary file
- Delete and list-by-original calls have been added
- Bug fix: the correct checksum is now recorded for auxiliary files

Please note that the auxiliary files feature is experimental and is designed to support integration with tools from the [OpenDP Project](https://opendp.org). If the API endpoints are not needed, they can be blocked.

### Major Use Cases

(note for release time - expand on the items above, as use cases)
doc/release-notes/8241-ext.md (3 additions, 0 deletions)
@@ -0,0 +1,3 @@
When you download an auxiliary file, the file extension will now be based on the extension of the file you uploaded, if it had one.

Auxiliary files uploaded before this change do not have their filename saved, so their file extension will be based on the detected content type (MIME type), if any.
doc/sphinx-guides/source/developers/aux-file-support.rst (38 additions, 4 deletions)
@@ -1,7 +1,7 @@
Auxiliary File Support
======================

Auxiliary file support is experimental and as such, related APIs may be added, changed or removed without standard backward compatibility. Auxiliary files in the Dataverse Software are being added to support depositing and downloading differentially private metadata, as part of the `OpenDP project <https://opendp.org>`_. In future versions, this approach will likely become more broadly used and supported.

Adding an Auxiliary File to a Datafile
--------------------------------------
@@ -16,12 +16,12 @@ To add an auxiliary file, specify the primary key of the datafile (FILE_ID), and
  export FORMAT_VERSION='v1'
  export TYPE='DP'
  export SERVER_URL=https://demo.dataverse.org

  curl -H X-Dataverse-key:$API_TOKEN -X POST -F "file=@$FILENAME" -F 'origin=myApp' -F 'isPublic=true' -F "type=$TYPE" "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"

You should expect a 200 ("OK") response and JSON with information about your newly uploaded auxiliary file.
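For a client written in Java instead of curl, the same upload might be sketched with the JDK's `java.net.http.HttpClient` (Java 11+). That client has no built-in multipart support, so the `multipart/form-data` body is assembled by hand. The field names (`file`, `origin`, `isPublic`, `type`) and the `X-Dataverse-key` header come from the curl example; the class `AuxUploadSketch`, its methods, and the boundary format are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper mirroring the curl upload; not part of the Dataverse Software.
final class AuxUploadSketch {

    // Builds a multipart/form-data body: one part per text field, then the "file" part.
    static byte[] multipartBody(String boundary, Map<String, String> fields,
                                String fileName, byte[] fileBytes) {
        StringBuilder head = new StringBuilder();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            head.append("--").append(boundary).append("\r\n")
                .append("Content-Disposition: form-data; name=\"").append(f.getKey()).append("\"\r\n\r\n")
                .append(f.getValue()).append("\r\n");
        }
        head.append("--").append(boundary).append("\r\n")
            .append("Content-Disposition: form-data; name=\"file\"; filename=\"")
            .append(fileName).append("\"\r\n")
            .append("Content-Type: application/octet-stream\r\n\r\n");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.writeBytes(head.toString().getBytes(StandardCharsets.UTF_8));
        out.writeBytes(fileBytes);
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // POST .../api/access/datafile/{id}/auxiliary/{formatTag}/{formatVersion} with the API token header.
    static HttpRequest uploadRequest(String serverUrl, String apiToken, long fileId,
                                     String formatTag, String formatVersion,
                                     Map<String, String> fields, String fileName, byte[] fileBytes) {
        String boundary = "aux-" + UUID.randomUUID();
        URI uri = URI.create(serverUrl + "/api/access/datafile/" + fileId
                + "/auxiliary/" + formatTag + "/" + formatVersion);
        return HttpRequest.newBuilder(uri)
                .header("X-Dataverse-key", apiToken)
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(
                        multipartBody(boundary, fields, fileName, fileBytes)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java AuxUploadSketch <serverUrl> <apiToken> <fileId> <pathToAuxFile>
        Path path = Path.of(args[3]);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("origin", "myApp");
        fields.put("isPublic", "true");
        fields.put("type", "DP");
        HttpRequest req = uploadRequest(args[0], args[1], Long.parseLong(args[2]), "dpJson", "v1",
                fields, path.getFileName().toString(), Files.readAllBytes(path));
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```

Sending a real filename in the `file` part is worthwhile because this change stores the uploaded filename and uses its extension on download; the sketch assumes the server takes the name from that part, as curl's `-F "file=@..."` supplies it.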

Downloading an Auxiliary File that Belongs to a Datafile
--------------------------------------------------------
To download an auxiliary file, use the primary key of the datafile, and the
formatTag and formatVersion (if applicable) associated with the auxiliary file:
@@ -33,5 +33,39 @@ formatTag and formatVersion (if applicable) associated with the auxiliary file:
  export FILE_ID='12345'
  export FORMAT_TAG='dpJson'
  export FORMAT_VERSION='v1'

  curl "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"

The file extension of the downloaded file is taken from the extension of the originally uploaded file (converted to lower case). If the uploaded file had no extension, a best guess is made based on the content type (MIME type).
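The extension rule might be expressed as a small Java helper. This is a hypothetical sketch, not the Dataverse Software's code: it assumes the stored filename can be null (as it is for files uploaded before filenames were saved), and the three-entry MIME table is purely illustrative, standing in for whatever content-type lookup the server actually uses.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of choosing a download extension for an auxiliary file.
final class AuxExtensionSketch {

    // Illustrative fallback table only; the server's real MIME lookup may differ.
    private static final Map<String, String> EXT_BY_MIME = Map.of(
            "application/json", ".json",
            "text/plain", ".txt",
            "application/pdf", ".pdf");

    /**
     * @param uploadedFilename original filename, or null if it was never stored
     * @param contentType detected MIME type, or null
     * @return an extension such as ".json", or "" if none can be determined
     */
    static String extensionFor(String uploadedFilename, String contentType) {
        if (uploadedFilename != null) {
            int dot = uploadedFilename.lastIndexOf('.');
            // Use the uploaded extension (lower-cased) if there is text before and after the last dot.
            if (dot > 0 && dot < uploadedFilename.length() - 1) {
                return uploadedFilename.substring(dot).toLowerCase(Locale.ROOT);
            }
        }
        if (contentType != null) {
            // Ignore parameters such as "; charset=UTF-8" before looking up the MIME type.
            String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            return EXT_BY_MIME.getOrDefault(base, "");
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(extensionFor("Stats.JSON", "application/json"));   // .json
        System.out.println(extensionFor(null, "text/plain; charset=UTF-8"));  // .txt
        System.out.println(extensionFor("README", null));                     // empty line
    }
}
```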

Listing Auxiliary Files for a Datafile by Origin
------------------------------------------------
To list auxiliary files, specify the primary key of the datafile (FILE_ID), and the origin associated with the auxiliary files to list (the application/entity that created them).

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export FILE_ID='12345'
  export SERVER_URL=https://demo.dataverse.org
  export ORIGIN='app1'

  curl "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$ORIGIN"

You should expect a 200 ("OK") response and a JSON array of objects representing the auxiliary files found, or a 404 ("Not Found") response if no auxiliary files exist with that origin.
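A Java client might treat the 404 as "no auxiliary files for this origin" rather than as a failure. The sketch below assumes Java 11+ `java.net.http`; the class and method names are hypothetical, sending `X-Dataverse-key` is optional here because the curl example exports `API_TOKEN` without passing it, and parsing the JSON array is left to whatever JSON library the client already uses.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical client helper for listing auxiliary files by origin.
final class AuxListSketch {

    static HttpRequest listRequest(String serverUrl, String apiToken, long fileId, String origin) {
        // Percent-encode the origin so spaces and similar characters are safe in the path.
        String encodedOrigin = URLEncoder.encode(origin, StandardCharsets.UTF_8).replace("+", "%20");
        HttpRequest.Builder b = HttpRequest.newBuilder(URI.create(
                serverUrl + "/api/access/datafile/" + fileId + "/auxiliary/" + encodedOrigin)).GET();
        if (apiToken != null) {
            b.header("X-Dataverse-key", apiToken);
        }
        return b.build();
    }

    /** 200 -> the JSON array body; 404 -> empty; any other status is treated as an error. */
    static Optional<String> interpret(int status, String body) {
        if (status == 200) {
            return Optional.of(body);
        }
        if (status == 404) {
            return Optional.empty();
        }
        throw new IllegalStateException("Unexpected status " + status + ": " + body);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Usage: java AuxListSketch <serverUrl> <fileId> <origin> [apiToken]
        HttpRequest req = listRequest(args[0], args.length > 3 ? args[3] : null,
                Long.parseLong(args[1]), args[2]);
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(interpret(resp.statusCode(), resp.body()).orElse("[]"));
    }
}
```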

Deleting an Auxiliary File that Belongs to a Datafile
-----------------------------------------------------
To delete an auxiliary file, use the primary key of the datafile, and the
formatTag and formatVersion (if applicable) associated with the auxiliary file:

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export FILE_ID='12345'
  export FORMAT_TAG='dpJson'
  export FORMAT_VERSION='v1'

  curl -H X-Dataverse-key:$API_TOKEN -X DELETE "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"
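The delete call could be issued from Java in the same style. This minimal sketch assumes the request is authenticated with the `X-Dataverse-key` header, as in the upload example; `AuxDeleteSketch` is a hypothetical name, and because the responses for delete are not documented here, it just prints the status and body.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for deleting an auxiliary file by formatTag and formatVersion.
final class AuxDeleteSketch {

    static HttpRequest deleteRequest(String serverUrl, String apiToken, long fileId,
                                     String formatTag, String formatVersion) {
        URI uri = URI.create(serverUrl + "/api/access/datafile/" + fileId
                + "/auxiliary/" + formatTag + "/" + formatVersion);
        return HttpRequest.newBuilder(uri)
                .header("X-Dataverse-key", apiToken) // assumed to be required, as for uploads
                .DELETE()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Usage: java AuxDeleteSketch <serverUrl> <apiToken> <fileId> <formatTag> <formatVersion>
        HttpRequest req = deleteRequest(args[0], args[1], Long.parseLong(args[2]), args[3], args[4]);
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        // Report the outcome without assuming specific response semantics.
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```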


src/main/java/edu/harvard/iq/dataverse/AuxiliaryFile.java (19 additions, 1 deletion)
@@ -4,6 +4,7 @@
import edu.harvard.iq.dataverse.util.BundleUtil;
import java.io.Serializable;
import java.util.MissingResourceException;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
@@ -29,7 +30,10 @@
@NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesByType",
query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.type = :type"),
@NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesWithoutType",
query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.type is null"),
@NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesByOrigin",
query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.origin = :origin"),
})
@NamedNativeQueries({
@NamedNativeQuery(name = "AuxiliaryFile.findAuxiliaryFileTypes",
query = "select distinct type from auxiliaryfile where datafile_id = ?1")
@@ -63,6 +67,12 @@ public class AuxiliaryFile implements Serializable {

private String checksum;

/**
* filename can be null because it was never required originally.
*/
@Column(nullable = true)
private String filename;

/**
* A way of grouping similar auxiliary files together. The type could be
* "DP" for "Differentially Private Statistics", for example.
@@ -157,4 +167,12 @@ public String getTypeFriendly() {
}
}

public String getFilename() {
return filename;
}

public void setFilename(String filename) {
this.filename = filename;
}

}
Original file line number Diff line number Diff line change
@@ -4,20 +4,29 @@
import edu.harvard.iq.dataverse.dataaccess.StorageIO;
import edu.harvard.iq.dataverse.util.FileUtil;
import edu.harvard.iq.dataverse.util.SystemConfig;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
import javax.ejb.EJB;
import javax.ejb.Stateless;
import javax.inject.Named;
import javax.persistence.EntityManager;
import javax.persistence.NoResultException;
import javax.persistence.PersistenceContext;
import javax.persistence.Query;
import javax.persistence.TypedQuery;
import javax.ws.rs.ClientErrorException;
import javax.ws.rs.InternalServerErrorException;
import javax.ws.rs.ServerErrorException;
import javax.ws.rs.core.Response;

import org.apache.tika.Tika;

/**
@@ -59,11 +68,12 @@ public AuxiliaryFile save(AuxiliaryFile auxiliaryFile) {
* @param isPublic boolean - is this file available to any user?
* @param type how to group the files such as "DP" for "Differentially
* Private Statistics".
* @param filename original name of the uploaded file, used for the extension on download
* @return the saved AuxiliaryFile
*/
public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile dataFile, String formatTag, String formatVersion, String origin, boolean isPublic, String type, String filename) {

StorageIO<DataFile> storageIO = null;
AuxiliaryFile auxFile = new AuxiliaryFile();
String auxExtension = formatTag + "_" + formatVersion;
try {
@@ -73,12 +83,20 @@ public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile
// If the db fails for any reason, then rollback
// by removing the auxfile from storage.
storageIO = dataFile.getStorageIO();
if (storageIO.isAuxObjectCached(auxExtension)) {
throw new ClientErrorException("Auxiliary file already exists", Response.Status.CONFLICT);
}
MessageDigest md;
try {
md = MessageDigest.getInstance(systemConfig.getFileFixityChecksumAlgorithm().toString());
} catch (NoSuchAlgorithmException e) {
logger.severe("NoSuchAlgorithmException for system fixity algorithm: " + systemConfig.getFileFixityChecksumAlgorithm().toString());
throw new InternalServerErrorException();
}
DigestInputStream di = new DigestInputStream(fileInputStream, md);

storageIO.saveInputStreamAsAux(di, auxExtension);
auxFile.setChecksum(FileUtil.checksumDigestToString(di.getMessageDigest().digest()));

Tika tika = new Tika();
auxFile.setContentType(tika.detect(storageIO.getAuxFileAsInputStream(auxExtension)));
@@ -87,20 +105,21 @@ public AuxiliaryFile processAuxiliaryFile(InputStream fileInputStream, DataFile
auxFile.setOrigin(origin);
auxFile.setIsPublic(isPublic);
auxFile.setType(type);
auxFile.setDataFile(dataFile);
auxFile.setFileSize(storageIO.getAuxObjectSize(auxExtension));
auxFile.setFilename(filename);
auxFile = save(auxFile);
} catch (IOException ioex) {
logger.severe("IO Exception trying to save auxiliary file: " + ioex.getMessage());
throw new InternalServerErrorException();
} catch (ServerErrorException e) {
// If anything fails during database insert, remove file from storage
try {
storageIO.deleteAuxObject(auxExtension);
} catch (IOException ioex) {
logger.warning("IO Exception trying to remove auxiliary file in exception handler: " + ioex.getMessage());
}
throw e;
}
return auxFile;
}
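The flow in `processAuxiliaryFile` (refuse a duplicate, hash the bytes while they are written to storage, then undo the storage write if the later step fails) can be modelled in isolation. This is a self-contained sketch, not Dataverse code: a `HashMap` stands in for `StorageIO`, a `Runnable` stands in for the database save, `IllegalStateException` stands in for the 409 `ClientErrorException`, and MD5 is assumed as the fixity algorithm. It also illustrates the checksum fix: a `DigestInputStream` only digests bytes read through it, so handing the raw stream to storage, as the old code did, leaves the digest of empty input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Self-contained model of the storage-first save; not the Dataverse implementation.
final class AuxSaveSketch {

    // Stand-in for StorageIO aux objects, keyed like the service's auxExtension.
    final Map<String, byte[]> storage = new HashMap<>();

    // MD5 is assumed here; the service uses the configured fixity algorithm.
    static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // the service throws InternalServerErrorException
        }
    }

    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Stores the stream, runs dbSave, and removes the stored bytes again if dbSave throws. */
    String save(InputStream in, String formatTag, String formatVersion, Runnable dbSave) {
        String auxExtension = formatTag + "_" + formatVersion;
        if (storage.containsKey(auxExtension)) {
            // The service throws ClientErrorException with Response.Status.CONFLICT here.
            throw new IllegalStateException("Auxiliary file already exists");
        }
        DigestInputStream di = new DigestInputStream(in, md5());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            int n;
            // Read through the DigestInputStream, not the raw stream, so every stored byte is hashed.
            while ((n = di.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        storage.put(auxExtension, out.toByteArray());
        String checksum = toHex(di.getMessageDigest().digest());
        try {
            dbSave.run();
        } catch (RuntimeException e) {
            storage.remove(auxExtension); // compensating action: drop the orphaned stored object
            throw e;
        }
        return checksum;
    }

    public static void main(String[] args) {
        AuxSaveSketch s = new AuxSaveSketch();
        byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
        // Prints 900150983cd24fb0d6963f7d28e17f72, the well-known MD5 of "abc".
        System.out.println(s.save(new ByteArrayInputStream(data), "dpJson", "v1", () -> { }));
    }
}
```

Unlike the service, which only cleans up on `ServerErrorException`, this sketch rolls back on any `RuntimeException` from the stand-in save; the conflict check runs before anything is written, so a duplicate never triggers a rollback.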
@@ -115,13 +134,43 @@ public AuxiliaryFile lookupAuxiliaryFile(DataFile dataFile, String formatTag, St
try {
AuxiliaryFile retVal = (AuxiliaryFile)query.getSingleResult();
return retVal;
} catch(NoResultException nr) {
return null;
}
}


public List<AuxiliaryFile> findAuxiliaryFiles(DataFile dataFile, String origin) {

TypedQuery<AuxiliaryFile> query;
if (origin == null) {
query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFiles", AuxiliaryFile.class);
} else {
query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByOrigin", AuxiliaryFile.class);
query.setParameter("origin", origin);
}
query.setParameter("dataFileId", dataFile.getId());

List<AuxiliaryFile> retVal = query.getResultList();
return retVal;
}

public void deleteAuxiliaryFile(DataFile dataFile, String formatTag, String formatVersion) throws IOException {
AuxiliaryFile af = lookupAuxiliaryFile(dataFile, formatTag, formatVersion);
if (af == null) {
throw new FileNotFoundException();
}
em.remove(af);
StorageIO<?> storageIO;
storageIO = dataFile.getStorageIO();
String auxExtension = formatTag + "_" + formatVersion;
if (storageIO.isAuxObjectCached(auxExtension)) {
storageIO.deleteAuxObject(auxExtension);
}
}

public List<AuxiliaryFile> findAuxiliaryFiles(DataFile dataFile) {
TypedQuery<AuxiliaryFile> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFiles", AuxiliaryFile.class);
query.setParameter("dataFileId", dataFile.getId());
return query.getResultList();
}
@@ -151,13 +200,13 @@ public List<String> findAuxiliaryFileTypes(DataFile dataFile, boolean inBundle)
}

public List<String> findAuxiliaryFileTypes(DataFile dataFile) {
TypedQuery<String> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFileTypes", String.class);
query.setParameter(1, dataFile.getId());
return query.getResultList();
}

public List<AuxiliaryFile> findAuxiliaryFilesByType(DataFile dataFile, String typeString) {
TypedQuery<AuxiliaryFile> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByType", AuxiliaryFile.class);
query.setParameter("dataFileId", dataFile.getId());
query.setParameter("type", typeString);
return query.getResultList();
@@ -167,7 +216,7 @@ public List<AuxiliaryFile> findOtherAuxiliaryFiles(DataFile dataFile) {
List<AuxiliaryFile> otherAuxFiles = new ArrayList<>();
List<String> otherTypes = findAuxiliaryFileTypes(dataFile, false);
for (String typeString : otherTypes) {
TypedQuery<AuxiliaryFile> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesByType", AuxiliaryFile.class);
query.setParameter("dataFileId", dataFile.getId());
query.setParameter("type", typeString);
List<AuxiliaryFile> auxFiles = query.getResultList();
@@ -178,7 +227,7 @@ public List<AuxiliaryFile> findOtherAuxiliaryFiles(DataFile dataFile) {
}

public List<AuxiliaryFile> findAuxiliaryFilesWithoutType(DataFile dataFile) {
TypedQuery<AuxiliaryFile> query = em.createNamedQuery("AuxiliaryFile.findAuxiliaryFilesWithoutType", AuxiliaryFile.class);
query.setParameter("dataFileId", dataFile.getId());
return query.getResultList();
}