IQSS/6489 - Direct uploads to S3 using presigned URLs #6490

Merged
kcondon merged 164 commits into IQSS:develop from TexasDigitalLibrary:IQSS/6489
Mar 23, 2020

Conversation

@qqmyers (Member) commented Jan 7, 2020

What this PR does / why we need it: This PR supports direct uploads to S3 using presigned URLs.
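For context on the mechanism (an illustrative stdlib-Python sketch, not Dataverse's implementation, which uses the AWS SDK for Java): a presigned URL embeds an expiring AWS Signature Version 4 signature in the query string, so the client can PUT the file bytes straight to S3 without ever holding credentials. The bucket/key/credential values below are placeholders.

```python
import datetime
import hashlib
import hmac
import urllib.parse


def presign_put_url(bucket, key, access_key, secret_key, region,
                    expires=3600, now=None):
    """Build a presigned S3 PUT URL via SigV4 query-string signing (sketch)."""
    now = now or datetime.datetime.utcnow()
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{datestamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{urllib.parse.quote(k, safe='')}={urllib.parse.quote(v, safe='')}"
        for k, v in sorted(params.items()))
    # Canonical request: method, path, query, headers, signed headers, payload.
    canonical_request = "\n".join([
        "PUT", f"/{key}", canonical_query,
        f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest()])

    def _hmac(k, msg):
        return hmac.new(k, msg.encode(), hashlib.sha256).digest()

    # Derive the per-day/region/service signing key from the secret key.
    signing_key = _hmac(_hmac(_hmac(_hmac(
        ("AWS4" + secret_key).encode(), datestamp), region), "s3"),
        "aws4_request")
    signature = hmac.new(signing_key, string_to_sign.encode(),
                         hashlib.sha256).hexdigest()
    return f"https://{host}/{key}?{canonical_query}&X-Amz-Signature={signature}"
```

Dataverse only needs to hand the client such a URL; the upload itself bypasses the application server entirely.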

Which issue(s) this PR closes:

Closes #6489

Special notes for your reviewer: This PR builds on #6488. It could be modified to work independently of multiple stores, but that's not how it was developed, and I suspect multiple stores will be merged before this. As of now, this branch is up to date with develop plus #6488.

The initial state of the PR bypasses all ingest processing for direct uploads. I'll be adding that back in. It's up to you if you want to test prior to that. I also owe some documentation.

Suggestions on how to test this:
Once a store is configured to do direct uploads via a JVM option, the store should work as normal with respect to user-interface uploads. Testing should probably focus on making sure that's true. Once the PR is updated to support ingest processing, testing could check that the processing is the same between a normal store and a direct-upload one.
There will also be a version of the DVUploader that can use the direct upload capability when a store supports it (next PR). That could be used to test the API calls directly.
Performance would be nice to test as well - overall speed and maximum file size.

Does this PR introduce a user interface change?:
None beyond those in #6488. That said, some elements, like the upload progress bar, are controlled via JavaScript rather than by PrimeFaces when direct upload is used, so there could be some unintentional/minor style differences (something else to test...).

Is there a release notes update needed for this change?:
Could be worth mentioning, but everything beyond #6488 is optional, so there are no required upgrade steps.

Additional documentation:

qqmyers and others added 30 commits October 16, 2018 09:21

- the JsonObjectBuilders were getting cleared in the build().toString() used to generate the log message, so the returned value was mostly empty
- use javascript to scroll within panel after update
- merge commits resolving conflicts in pom.xml, Bundle.properties, DataversePage.java, dataverse.xhtml, favicondataverse.png, make-data-count.rst, and SettingsServiceBean.java
@sekmiller (Contributor) commented

I agree with @qqmyers. Removing the binding appears to have no ill effects; success and error messages are displaying as expected.

@qqmyers (Member, Author) commented Mar 13, 2020

Argh - regarding file size - it looks like Amazon has a 5 GB limit on the size of a single upload. Not sure if it is configurable. MinIO doesn't have this limit, and I continue to be able to upload >5 GB files with MinIO. I thought I had tried this against AWS at some point, but evidently not with a large file, presigned, and without using the AWS CLI.

The right way to handle larger uploads to AWS is to break the file into parts that are sent in separate HTTP requests. That looks to be possible (using presigned URLs) but is not well documented/supported at the moment - probably not something to hold up 4.20 for.
I'm still looking into why uploading large numbers of files fails. If there's an obvious fix for that I can add it, and I may be able to add some error checking to catch the errors from CORS/the AWS size limit.
With those fixes, which I hope can be quick, maybe this can go in as is, as an 'experimental' or 'limited' feature for now, and larger files at AWS can be a separate PR?
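The multipart approach described above can be sketched as a pure planning function. The S3 limits below are real (5 MiB minimum part size except the last part, 5 GiB maximum single PUT, 10,000 parts maximum); the function name and the 64 MiB default part size are my own illustration, not anything from this PR.

```python
MIN_PART = 5 * 1024**2    # S3 minimum part size (all parts except the last)
MAX_PART = 5 * 1024**3    # S3 maximum size of a single part / single PUT
MAX_PARTS = 10_000        # S3 maximum number of parts per multipart upload


def plan_parts(size, part_size=64 * 1024**2):
    """Return (offset, length) byte ranges for a multipart upload.

    Grows the part size if needed so the file fits in MAX_PARTS parts.
    """
    # Ceiling division: smallest part size that keeps us under MAX_PARTS.
    part_size = max(part_size, MIN_PART, -(-size // MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("file too large for S3 multipart upload")
    return [(offset, min(part_size, size - offset))
            for offset in range(0, size, part_size)]
```

Each planned range would then be PUT to its own presigned part URL, followed by a "complete multipart upload" call listing the returned ETags.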

@kcondon (Contributor) commented Mar 14, 2020

@qqmyers Thanks for the info and for continuing to look into it. I can test using some smaller number of files, such as 50, to see whether it behaves reasonably well. That might be enough for a first effort, but I would need to check in with @djbrooke.

@kcondon kcondon assigned qqmyers and djbrooke and unassigned kcondon Mar 16, 2020
@mheppler mheppler self-assigned this Mar 17, 2020
@qqmyers (Member, Author) commented Mar 19, 2020

I think the change I committed helps with larger numbers of files (I've tested ~100 so far).

I also have a handle on the progress bar flashing - the progress bar code still assumes one file and changes all the bars at once.

I also noticed in testing that drag-and-drop doesn't work - there must be a separate event to catch in the JS to trigger processing.

So far I haven't been able to reproduce #6 above.

@qqmyers (Member, Author) commented Mar 19, 2020

OK - it may be worth testing again. I'm not yet catching any failed HTTP calls, but assuming the calls succeed, it should now handle more files, with progress showing for ~4 at a time and no flashing; drag and drop should work; and the long pause with the twirling icon at the end should be gone. Based on debugging with @kcondon, it's possible that the root cause of #6 could also be fixed by the code I added a couple of days ago to retry if Dataverse's first call to get the new file's metadata fails (e.g. some AWS server responds with a 404 before it learns about the new object).

If anything fails without an HTTP error code showing for some call in the browser dev console, let me know (including #6).
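The retry described above lives in Dataverse's Java code; as a language-neutral sketch of the same idea (function names and backoff schedule are mine, not the PR's), a metadata lookup that may briefly 404 after an S3 upload can be wrapped like this:

```python
import time


def get_metadata_with_retry(fetch, attempts=5, base_delay=0.5):
    """Retry a lookup that may fail briefly after an S3 upload.

    `fetch` is any callable returning the object's metadata, or None while
    the object is not yet visible (read-after-write lag). Waits with
    exponential backoff between attempts: 0.5s, 1s, 2s, ...
    """
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    raise TimeoutError(f"metadata still unavailable after {attempts} attempts")
```

The key property is that a transient 404 right after upload is retried rather than surfaced to the user as a failed file.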

@kcondon (Contributor) commented Mar 20, 2020

@qqmyers Had mixed results when adding files to a dataset on create with direct-to-S3 upload:
- First attempt: failed to add files; UI froze at the progress bar with no progress; error in logs.
- Second attempt: failed to add 3 of 4 files; exception in logs; multiple retries and "failure to add file" messages.
- Third attempt: succeeded as expected.

@qqmyers (Member, Author) commented Mar 20, 2020

I just discovered a cut/paste error in the upload JS this morning. That update is definitely needed.

@kcondon kcondon assigned kcondon and unassigned mheppler, qqmyers and djbrooke Mar 23, 2020
@kcondon kcondon merged commit 0da6f13 into IQSS:develop Mar 23, 2020
@djbrooke djbrooke added this to the 4.20 milestone Mar 23, 2020


Development

Successfully merging this pull request may close these issues.

Support larger file uploads to S3 stores via direct upload using presigned URLs

8 participants