Conversation
| } | ||
| } | ||
|
|
||
| // TODO: Return ".md" for "text/markdown" as well as other extensions in MimeTypeDetectionByFileExtension.properties |
There was a problem hiding this comment.
FWIW: We have FileUtil.generateOriginalExtension for tabular files but it uses a hardcoded list rather than the properties file.
| export SERVER_URL=https://demo.dataverse.org | ||
|
|
||
| curl -H X-Dataverse-key:$API_TOKEN -X POST -F "file=@$FILENAME" -F 'origin=myApp' -F 'isPublic=true' -F "type=$TYPE" "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION" | ||
| curl -H X-Dataverse-key:$API_TOKEN -X POST -F "file=@$FILENAME;type=$FILETYPE" -F 'origin=myApp' -F 'isPublic=true' -F "type=$TYPE" "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION" |
There was a problem hiding this comment.
I'm very mildly rubbed the wrong way by the fact that there are now 2 instances of type= in the command line above... Would have been better if we had named that type parameter something more specific (auxtype? dvtype? - idk, something indicating that this is our, proprietary type...) to differentiate it from the standard mime type... But I'm not going to propose changing it now.
This API is for software clients only; nobody should ever be using it on the command line except an occasional developer.
There was a problem hiding this comment.
I had the same thought: two cases of "type" is weird and unfortunate. Perhaps we'll do a final cleanup of this API before we move its docs from the dev guide to the API guide and right some of these wrongs.
There was a problem hiding this comment.
Whether this specific parameter is worth changing or not, I really like the fact that we have a note in the doc saying that this API is experimental, so we reserve the right to keep changing it without guaranteeing backward compatibility, etc.
|
|
||
| Tika tika = new Tika(); | ||
| auxFile.setContentType(tika.detect(storageIO.getAuxFileAsInputStream(auxExtension))); | ||
| if (mediaType != null) { |
There was a problem hiding this comment.
Quick question - is mediaType (formDataBodyPart.getMediaType();) definitely going to be null here, if not supplied by the client? - As opposed to defaulting to something like application/octet-stream?
There was a problem hiding this comment.
Hmm. You're right. When I leave off ;type=WHATEVER (the old way), mediaType is non-null and its toString method prints application/octet-stream.
There was a problem hiding this comment.
So it should just be if (mediaType != null && (!mediaType.equals("application/octet-stream"))) here. We are doing something similar throughout the "normal" datafile upload process.
Otherwise everything looks good in the PR.
There was a problem hiding this comment.
I didn't make any code changes but I tweaked the tests so I could make assertions on what happens if you leave off ;type=WHATEVER: 787dbed
I'm happy to make changes, of course. Right now we're saying the user can change the content type to whatever we want. We could change is such that if they supply application/octet-stream as a content type we'll run it though Tika and save whatever Tika says it is. That is, we could do something like this:
if (mediaType != null && !mediaType.toString().equals("application/octet-stream")) {
auxFile.setContentType(mediaType.toString());
} else {
Tika tika = new Tika();
auxFile.setContentType(tika.detect(storageIO.getAuxFileAsInputStream(auxExtension)));
}
There was a problem hiding this comment.
Yes, that's what I was saying above, how that "if" line should look like. Nobody would ever want to supply "application/octet-stream" as the type. That type just means "type unknown" or "type not specified".
There was a problem hiding this comment.
(otherwise Tika will never be called... and we want to leave this option, of not supplying the type and letting the application identify it)
There was a problem hiding this comment.
Yes, sorry, I didn't see that comment until I had written mine. I just pushed the change in 985b552. Please take a look.
There was a problem hiding this comment.
Yeah, I think it looks good now - but I'd like @scolapasta and @qqmyers to chime in, on whether we really want to be doing this instead, and not in addition to saving the file name, as uploaded.
"application/octet-stream" is the default when the user doesn't supply a content type. So if it's this, send it through Tika. Yes, a user can supply "application/octet-stream" and this will also be sent through Tika.
| Tika tika = new Tika(); | ||
| auxFile.setContentType(tika.detect(storageIO.getAuxFileAsInputStream(auxExtension))); | ||
| // The null check prevents an NPE but we expect mediaType to be non-null | ||
| // and to default to "application/octet-stream". |
There was a problem hiding this comment.
FWIW, I believe the null check is necessary, since the mediaType can still be null, if an input stream is used for the file body.
landreev
left a comment
There was a problem hiding this comment.
OK, the code looks good, and everybody appears to be in agreement, that this is how we want to address both of the underlying issues, the mime type and the file name.
If somehow some picky user requests being able to have non-standard extensions, and/or custom download filenames for these, we'll just addres it separately.
|
@pdurbin Would you update this pr from develop? I'm getting a flyway mismatch error: |
What this PR does / why we need it:
Some auxiliary were being saved with the wrong content type (MIME type) but now the user can supply the content type on upload, overriding the type that would otherwise be assigned.
Which issue(s) this PR closes:
Closes #8241
Special notes for your reviewer:
Markdown files can now be given a better content type of
text/markdownbut on download, no file extension is added. This is because Tika's getDefaultMimeTypes doesn't include "text/markdown". I left a TODO about how perhaps we could look up more esoteric content types in the MimeTypeDetectionByFileExtension.properties file we maintain (.geojson was added recently, for example).Suggestions on how to test this:
Run the curl commands. You should be able to set the content type to whatever you want.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No.
Is there a release notes update needed for this change?:
Included but feel free to leave it out.
Additional documentation:
Included.