Iqss/6489 - Direct uploads to S3 using presigned URLs#6490
Iqss/6489 - Direct uploads to S3 using presigned URLs#6490kcondon merged 164 commits intoIQSS:developfrom
Conversation
the JsonObjectBuilders were getting cleared in the build().toString() used to generate the log message, so the returned value was mostly empty...
Conflicts: pom.xml src/main/java/Bundle.properties src/main/java/edu/harvard/iq/dataverse/DataversePage.java src/main/webapp/dataverse.xhtml src/main/webapp/resources/images/favicondataverse.png
Conflicts: pom.xml
Conflicts: pom.xml
Conflicts: doc/sphinx-guides/source/admin/make-data-count.rst
Conflicts: src/main/java/edu/harvard/iq/dataverse/settings/SettingsServiceBean.java
|
I agree with @qqmyers. It looks like removing the binding seems to have no ill effects. Success and error messages are displaying as expected |
|
Argh - regarding file size - it looks like Amazon has a 5GB limit on the size of a single upload. Not sure if it is configurable. Minio S3 doesn't and I continue to be able to upload > 5 GB files with Minio. I thought I tried against AWS at some point but evidently not with a large file, presigned, and not using the aws cli). The right way to handle larger uploads to AWS is to break the file into parts that are sent in separate http requests. That looks to be possible (using presignedURLs) but not well documented/supported at the moment - probably not something to hold up 4.20 for. |
|
I think the change I committed helps with larger numbers of files (I've tested ~100 so far). I also have a handle on the progress bar flashing - the progress bar code still assumes one file/changes all the bars at once. I also noticed in testing that drag-n-drop doesn't work - must be a separate event to catch in the js to trigger processing. So far I haven't been able to reproduce #6 above. |
|
OK - may be worth testing again. I'm not yet catching any failed http calls but assuming the calls succeed, it should now handle more files, with progress showing for ~4 at a time, no flashing, drag and drop should work, and the long pause with the twirling icon at the end should be gone. Based on debugging with @kcondon , its possible that the root cause for #6 could also be fixed by the code I added a couple days ago to do a retry if Dataverse's first call to get the new file's metadata fails (e.g. some AWS server responds with a 404 before it learns about the new object). If anything fails without an http error code showing for some call in the browser dev console, let me know (including #6) |
|
@qqmyers Had mixed results when adding files to dataset on create with direct to s3: |
|
I just discovered a cut/paste error in the upload js this morning. That update is definitely needed. |
create local vars from global ones so that value isn't changed by another async callback before we use it.
What this PR does / why we need it: This PR supports direct uploads to S3 using presigned URLs
Which issue(s) this PR closes:
Closes #6489
Special notes for your reviewer: This PR builds on #6488. It could be modified to work independent of multiple stores, but that's not how it was developed and I suspect multiple stores will be merged before this. As of now, this branch is up to date with dev plus #6488.
The initial state of the PR bypasses all ingest processing for direct uploads. I'll be adding that back in. Up to you guys if you want to test prior to that. Also owe some documentation.
Suggestions on how to test this:
Once a store is configured to do direct uploads via a jvm option, the store should work as normal w.r.t. user interface uploads. Testing should probably focus on making sure that's true. Once the PR is updated to support ingest processes, testing could check that the processing is the same between a normal store and a direct upload one.
There will also be a version of the DVUploader that can use the direct upload capability when a store supports it (next PR). That could be used to test the API calls directly.
Performance would be nice to test as well - overall speed and max file size.
Does this PR introduce a user interface change?:
None beyond that in #6488. That said, some elements, like the upload progress bar are being controlled via Javascript rather than by PrimeFaces when direct upload is used, so there could be some unintentional/minor style differences (something else to test...).
Is there a release notes update needed for this change?:
Could be worth mentioning, but everything beyond #6488 is optional, so there are no required upgrade steps.
Additional documentation: