This was null for aux files, for example.
src/main/java/edu/harvard/iq/dataverse/api/DownloadInstanceWriter.java
One other thing I'm debating is whether we should throw a "not supported" exception not only on dynamically generated requests like subsetting, but on ANY byte range request except for the main physical file or the saved original.
Let me know if you have any questions or need any assistance with this PR.
Also start writing real integration tests. A FIXME now notes that ranges don't currently work well with tabular files, due to how we store the header.
Thanks @landreev for discussing this today. I went ahead and pushed what I have so that you can look at the backend. I'll keep working on the tests and will think more about the problem we discovered with ranges for tabular files. (In short, the header is always written, even if you request the last bytes of a file.) To summarize the decisions made:
If a range is not requested, write the whole variable header line. Otherwise, make a reasonable effort to write what we can.
Note to self: I plan to add a release note, work more on the tests, and see what I can do about tabular files.
try {
    in = new FileInputStream(getFileSystemPath().toFile());
    in.skip(this.getOffset());
I believe this is what we want to rearrange: this in.skip() should not happen here, in the open method, but in the setOffset() method itself, because we want to be able to change that offset after the initial open.
The setOffset() method will need to throw an IOException if it's called while the InputStream is still null, or if the skip() call itself throws an IOException.
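A minimal sketch of that rearrangement, assuming a hypothetical `OffsetSketch` class that stands in for the storage IO class (a `ByteArrayInputStream` replaces the `FileInputStream` so the example is self-contained). `open()` only creates the stream, and `setOffset()` does the skipping and throws `IOException` when called before `open()`. Because `InputStream.skip()` may skip fewer bytes than requested, it loops until the offset is reached or the stream stops advancing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the storage IO class; only the open()/setOffset()
// ordering is modeled here.
public class OffsetSketch {
    private final byte[] data;
    private InputStream in;

    public OffsetSketch(byte[] data) {
        this.data = data;
    }

    // open() only creates the stream; it no longer skips anything.
    public void open() {
        in = new ByteArrayInputStream(data);
    }

    // Skips forward from the current position, so it can be called after open().
    public void setOffset(long offset) throws IOException {
        if (in == null) {
            throw new IOException("setOffset() called before open()");
        }
        long remaining = offset;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() made no progress, e.g. the offset is past the end of the data.
                throw new IOException("Could not skip to offset " + offset);
            }
            remaining -= skipped;
        }
    }

    public InputStream getInputStream() {
        return in;
    }

    public static void main(String[] args) throws IOException {
        OffsetSketch io = new OffsetSketch("first is the worst\n".getBytes(StandardCharsets.US_ASCII));
        io.open();
        io.setOffset(9);
        // Prints the bytes after the first 9: "the worst"
        System.out.print(new String(io.getInputStream().readAllBytes(), StandardCharsets.US_ASCII));
    }
}
```

Note that this sketch only moves forward; seeking backwards would require reopening the stream.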
Yes, excellent idea. Implemented in 4724e11 and now offset seems to be working again, at least for non-tabular files. Thanks!
Offset stopped working in 29ea256. Before that commit we were relying on setOffset() being called before open(). Now we can (and do) call setOffset() after open().
Response downloadFileNoArgs = UtilIT.downloadFile(fileIdCsv, null, null, null, authorApiToken);
downloadFileNoArgs.then().assertThat()
        .statusCode(OK.getStatusCode())
        .body(equalTo("name pounds species\n"
[reviewdog] reported by reviewdog 🐶
File contains tab characters (this is the first instance).
Uh oh. Note to self: Ask @poikilotherm how to tell reviewdog that this is just a test and I actually want the tabs. Worst case I guess I could replace them with \t.
I couldn't find a fix in the reviewdog docs so in a57e629 I replaced the tabs with \t. It would be good to figure this out someday though.
Just above, `ranges` is initialized to `new ArrayList<>()`.
landreev left a comment
I believe those are the only comments I had, otherwise I'm happy with it.
But, since I did contribute a little bit to it in the end, maybe somebody else could also take a look.
Most importantly, this commit introduces using status code 416 (Range Not Satisfiable) when the range is invalid. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Range says, "If the ranges are invalid, the server returns the 416 Range Not Satisfiable error." I also rearranged the logic around testing for a single range. Now it fails fast and reports that only one range is supported rather than potentially showing an error about the range being beyond the file size, for example. I also changed the error message "Start is larger than end" to "Start is larger than end or size of file" to more accurately reflect what the check is doing. Finally, I did some cleanup and added additional tests.
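The validation order described in that commit can be sketched roughly as follows. This is a hypothetical, self-contained illustration (`RangeCheckSketch` and `RangeNotSatisfiableException` are invented names, not Dataverse classes): it rejects multiple ranges before any size checks, supports the `0-9`, `-6`, and `9-` forms, clamps an end past EOF to the last byte, and reports "Start is larger than end or size of file." The caller is assumed to map the exception to 416 Range Not Satisfiable.

```java
// Hypothetical sketch of single-range validation; not the PR's actual code.
public class RangeCheckSketch {

    // A JAX-RS layer could map this to 416 Range Not Satisfiable.
    public static class RangeNotSatisfiableException extends Exception {
        public RangeNotSatisfiableException(String message) {
            super(message);
        }
    }

    // Parses a single "bytes=..." range into inclusive {start, end} offsets.
    public static long[] parse(String header, long fileSize)
            throws RangeNotSatisfiableException {
        if (header == null || !header.startsWith("bytes=")) {
            throw new RangeNotSatisfiableException("Only byte ranges are supported.");
        }
        String spec = header.substring("bytes=".length()).trim();
        // Fail fast on multiple ranges, before any size checks run.
        if (spec.contains(",")) {
            throw new RangeNotSatisfiableException("Only one range is supported.");
        }
        int dash = spec.indexOf('-');
        if (dash < 0) {
            throw new RangeNotSatisfiableException("Invalid range: " + spec);
        }
        String first = spec.substring(0, dash).trim();
        String last = spec.substring(dash + 1).trim();
        long start;
        long end;
        try {
            if (first.isEmpty()) {
                // Suffix range such as "-6": the last N bytes.
                long suffixLength = Long.parseLong(last);
                start = Math.max(0, fileSize - suffixLength);
                end = fileSize - 1;
            } else {
                start = Long.parseLong(first);
                // Open-ended "9-" runs to the end; an end past EOF is clamped.
                end = last.isEmpty()
                        ? fileSize - 1
                        : Math.min(Long.parseLong(last), fileSize - 1);
            }
        } catch (NumberFormatException e) {
            throw new RangeNotSatisfiableException("Invalid range: " + spec);
        }
        if (start > end) {
            throw new RangeNotSatisfiableException("Start is larger than end or size of file.");
        }
        return new long[] {start, end};
    }
}
```

Checking for a comma first is what lets a multi-range request report "only one range is supported" instead of an unrelated size error.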
After 3024d67 I'm ready for more review. Here's the detailed comment I left in that commit.
public Range(long start, long end) {
    this.start = start;
    this.end = end;
    this.length = end - start + 1;
Is there any reason to store the length here and not just calculate it in the getter? (Future-proofing in case we someday add setters for start and end.)
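A minimal sketch of the suggestion, using a hypothetical `RangeSketch` class rather than the PR's actual `Range`: the length is derived in the getter, so it cannot drift out of sync with `start` and `end` even if setters are added later.

```java
// Hypothetical sketch: length is computed on demand instead of stored.
public class RangeSketch {
    private final long start;
    private final long end;

    public RangeSketch(long start, long end) {
        this.start = start;
        this.end = end;
    }

    public long getStart() {
        return start;
    }

    public long getEnd() {
        return end;
    }

    // Both ends are inclusive, as in an HTTP byte range, hence the + 1.
    public long getLength() {
        return end - start + 1;
    }
}
```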
scolapasta left a comment
Just added a couple of minor thoughts - neither is critical, but would make (imo) for cleaner code.
Response downloadTxtNoArgs = UtilIT.downloadFile(fileIdTxt, null, null, null, authorApiToken);
downloadTxtNoArgs.then().assertThat()
        .statusCode(OK.getStatusCode())
        .body(equalTo("first is the worst\n"
Could we simplify the body(equalTo(...)) checks by using String functions? I.e., instead of typing the specific text again and again, just take the proper substring of contentOfTxt? Feels like it would make the tests more readable.
I had a play with this and I like my way better. I'll show some examples below but for the ranges in particular, it's weird and confusing to have to add one to the "end" parameter in substring. You're asking for a range of 0-9 but you have to do substring(0, 10) to get the test to pass.
My idea with these tests is that you establish a small bit of text up front and then visually reason about "did I get the first 10 characters or not?"
From chatting with @scolapasta it sounded like he didn't feel super strongly about these tests anyway. Just a suggestion.
// Download the whole file.
Response downloadTxtNoArgs = UtilIT.downloadFile(fileIdTxt, null, null, null, authorApiToken);
downloadTxtNoArgs.then().assertThat()
.statusCode(OK.getStatusCode())
.body(equalTo(contentOfTxt.substring(0, contentOfTxt.length())));
// .body(equalTo("first is the worst\n"
// + "second is the best\n"
// + "third is the one with the hairy chest\n"));
// Download the first 10 bytes.
Response downloadTxtFirst10 = UtilIT.downloadFile(fileIdTxt, "0-9", null, null, authorApiToken);
downloadTxtFirst10.then().assertThat()
.statusCode(OK.getStatusCode())
.body(equalTo(contentOfTxt.substring(0, 10)));
// .body(equalTo("first is t"));
// Download the last 6 bytes.
Response downloadTxtLast6 = UtilIT.downloadFile(fileIdTxt, "-6", null, null, authorApiToken);
downloadTxtLast6.then().assertThat()
.statusCode(OK.getStatusCode())
.body(equalTo(contentOfTxt.substring(contentOfTxt.length() - 6, contentOfTxt.length())));
// .body(equalTo("chest\n"));
// Download some bytes from the middle.
Response downloadTxtMiddle = UtilIT.downloadFile(fileIdTxt, "09-19", null, null, authorApiToken);
downloadTxtMiddle.then().assertThat()
.statusCode(OK.getStatusCode())
.body(equalTo(contentOfTxt.substring(9, 20)));
// .body(equalTo("the worst\ns"));
// Skip the first 9 bytes and download the rest.
Response downloadTxtSkipFirst10 = UtilIT.downloadFile(fileIdTxt, "9-", null, null, authorApiToken);
downloadTxtSkipFirst10.then().assertThat()
.statusCode(OK.getStatusCode())
.body(equalTo(contentOfTxt.substring(9, contentOfTxt.length())));
// .body(equalTo("the worst\n"
// + "second is the best\n"
// + "third is the one with the hairy chest\n"));
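If the substring approach were ever revisited, the off-by-one that makes `substring(0, 10)` correspond to the range `0-9` could be hidden in a small helper that takes the same inclusive start and end as the Range header. `ExpectedSlice.expected` below is hypothetical, and it assumes ASCII test content so that character indexes equal byte offsets.

```java
// Hypothetical test helper: maps an inclusive byte range "start-end" onto
// String.substring, whose end index is exclusive.
public class ExpectedSlice {
    static String expected(String content, int start, int endInclusive) {
        return content.substring(start, endInclusive + 1);
    }

    public static void main(String[] args) {
        String contentOfTxt = "first is the worst\n"
                + "second is the best\n"
                + "third is the one with the hairy chest\n";
        // Same numbers as the "0-9" request.
        System.out.println(expected(contentOfTxt, 0, 9)); // prints "first is t"
    }
}
```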
OK, based on code review, I fixed up the code in 22d5ffb and played around with the tests. I also merged the latest from develop. I'm ready for more review.
On Friday I merged the latest from develop into this branch to pick up the 5.8 pom.xml change, but the API tests failed because certain things hadn't happened yet:
This morning I made a small change to the release note to force the API tests to run, and they passed.
What this PR does / why we need it:
We'd like users to be able to download parts of files: the beginning (like head), the end (like tail), and a range in the middle.
Which issue(s) this PR closes:
Closes #6937
Special notes for your reviewer:
Support is not full. As noted in the docs, only a single range is supported and the "If-Range" header is not supported.
Suggestions on how to test this:
Try the examples in the docs.
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No.
Is there a release notes update needed for this change?:
Yes.
Additional documentation:
Included.