
S3 package zip #35

Merged
pameyer merged 5 commits into master from s3_package_zip
Dec 5, 2018

Conversation

@matthew-a-dunlap
Contributor

No description provided.

FROM centos:6
# starting centos 6 build image for DCM
RUN yum install -y rpm-build python-setuptools wget rpmdevtools
RUN yum install -y rpm-build python-setuptools wget rpmdevtools zip
Member

This should be moved to the RPM spec

FROM centos:7
# starting centos 7 build image for DCM
RUN yum install -y rpm-build python-setuptools wget
RUN yum install -y rpm-build python-setuptools wget zip
Member

This should also go in the RPM spec.

if [ ! `aws s3 ls s3://${S3HOLD}/${ulidFromJson}/` ]; then #this check is different from the normal post_upload; we don't use the extra folder level
aws s3 cp --recursive ${DEPOSIT}/${ulidFolder}/${ulidFolder}/ s3://${S3HOLD}/${ulidFromJson}/ #this does not copy empty folders from DEPOSIT as folders do not actually exist in s3
packageName="package_$ulidFolder"
packageExt="zip"
Member

packageExt should probably get moved out of the loop / conditional.
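As a side note on the existence check earlier in this hunk: `aws s3 ls` exits non-zero when a prefix has no matches, so the check can rely on the exit status instead of substituting command output into `[ ]` (which is fragile and is missing a space before the closing `]`). A sketch of that alternative, where `s3_prefix_exists` and `upload_if_missing` are hypothetical wrappers introduced here so the control flow can be shown without S3 access:

```shell
#!/bin/sh
# Hypothetical wrapper: `aws s3 ls` exits non-zero when the prefix is
# empty or absent, so the exit status alone answers "does it exist?".
s3_prefix_exists() {
    aws s3 ls "$1" > /dev/null 2>&1
}

# Mirrors the script's conditional recursive upload, using the wrapper.
upload_if_missing() {
    src=$1
    dest=$2
    if ! s3_prefix_exists "$dest"; then
        aws s3 cp --recursive "$src" "$dest"
    fi
}
```

Because the decision hinges on an exit status rather than word-split output, the check behaves the same whether the listing is empty, one line, or many.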


#It would be awesome to someday zip everything while it is being streamed.
echo "beginning zip of ${DEPOSIT}/${ulidFolder}/${ulidFolder}/"
zip -r $packageName ${ulidFolder}/ #There are two layers of ${ulidFolder}
Member

This should probably check the return value, and fail cleanly if zip creation has failed.
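A minimal sketch of that check. The helper names (`run_zip`, `package_dir`) are hypothetical, introduced only so the exit-status handling can be exercised without creating real archives:

```shell
#!/bin/sh
# Hypothetical sketch: check zip's exit status and fail cleanly rather
# than proceeding to upload a broken or missing archive.
run_zip() {
    zip -r "$1.zip" "$2/"    # stands in for the script's `zip -r` call
}

package_dir() {
    packageName=$1
    ulidFolder=$2
    if ! run_zip "$packageName" "$ulidFolder"; then
        echo "zip of ${ulidFolder} failed; skipping S3 upload" >&2
        return 1
    fi
    echo "zip of ${ulidFolder} succeeded"
}
```

In the real script, returning non-zero here (or `continue`-ing the loop) would keep a failed zip from being copied to the hold area.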


echo "test: ${DEPOSIT}/${ulidFolder}/$packageName"
aws s3 cp ${packageName}.${packageExt} s3://${S3HOLD}/${ulidFromJson}/
aws s3 cp ${packageName}.sha s3://${S3HOLD}/${ulidFromJson}/
Member

Would it make more sense to pass the archive sha as a parameter to the import API?

Contributor Author
@matthew-a-dunlap Dec 4, 2018

I decided to go with saving a .sha file because it seemed to have value in case the data needs to be accessed outside of Dataverse. Dataverse, on its end, expects the file to be there.

aws s3 cp ${packageName}.sha s3://${S3HOLD}/${ulidFromJson}/

err=$?
if (( $err != 0 )) ; then
Member

This error check will only catch problems copying ${packageName}.sha to S3, and will miss errors copying ${packageName}.${packageExt} to S3, which is equally or more important.
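A sketch of an error check that covers both uploads by looping over the archive and its .sha file. `s3_cp` and `upload_package` are hypothetical names, introduced so the control flow can be shown (and tested) without S3 access:

```shell
#!/bin/sh
# Hypothetical sketch: verify the exit status of every upload, not just
# the last one, and report which file failed.
s3_cp() {
    aws s3 cp "$1" "$2"    # stands in for the script's `aws s3 cp` calls
}

upload_package() {
    packageName=$1
    packageExt=$2
    dest=$3
    for f in "${packageName}.${packageExt}" "${packageName}.sha"; do
        if ! s3_cp "$f" "$dest"; then
            echo "upload of ${f} failed" >&2
            return 1
        fi
    done
}
```

Checking inside the loop means the archive upload and the checksum upload each get their own failure path, instead of `$?` reflecting only whichever `cp` ran last.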


#This may prove to be slow with large datasets
sz=`aws s3 ls --summarize --human-readable --recursive s3://${S3HOLD}/${ulidFromJson}/ | grep "Total Size: " | cut -d' ' -f 6`
sz=`aws s3 ls --summarize --human-readable s3://${S3HOLD}/${ulidFromJson}/$packageName | grep "Total Size: " | cut -d' ' -f 6`
Member

should this be ${packageName}.${packageExt}?
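For reference, the `grep | cut` parsing in that line, applied to the specific object as the comment suggests. `listing` below is a canned stand-in for `aws s3 ls --summarize --human-readable s3://${S3HOLD}/${ulidFromJson}/${packageName}.${packageExt}` output, so the parsing can be shown without S3 access:

```shell
#!/bin/sh
# Hypothetical sketch of the suggested fix: list the specific object
# ${packageName}.${packageExt} rather than the bare ${packageName}.
# `listing` mimics `aws s3 ls --summarize --human-readable` output.
listing='2018-12-04 12:00:00    4.2 MiB package_x.zip

Total Objects: 1
   Total Size: 4.2 MiB'

sz=$(printf '%s\n' "$listing" | grep "Total Size: " | cut -d' ' -f 6)
# note: field 6 is the number only; the unit ("MiB") would be field 7
```

The field index depends on the exact leading-space layout of the summarize line, which is one reason this approach "may prove to be slow" is not the only caveat worth a second look.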

@@ -68,24 +68,38 @@ do
#move to HOLD location

Member

If I'm remembering correctly, there was some discussion about having archive / not archive S3 DCM be a DCM-side configuration option.

  • Am I remembering right?
  • Do we still care?
  • If both, is it still worth the incremental increase in complexity (vs "S3 DCM gives you archives w/ no in-place compute")?

Contributor Author

That was on the table for a while, but I have a pretty strong memory that we switched to only doing zips.

@pameyer pameyer merged commit 925f3a2 into master Dec 5, 2018