
S3 package zip #35

Merged
pameyer merged 5 commits into master from s3_package_zip
Dec 5, 2018

Conversation

@matthew-a-dunlap
Contributor

No description provided.

FROM centos:6
# starting centos 6 build image for DCM
RUN yum install -y rpm-build python-setuptools wget rpmdevtools
RUN yum install -y rpm-build python-setuptools wget rpmdevtools zip
Member

This should be moved to the RPM spec

FROM centos:7
# starting centos 7 build image for DCM
RUN yum install -y rpm-build python-setuptools wget
RUN yum install -y rpm-build python-setuptools wget zip
Member

This should also go in the RPM spec.

if [ ! `aws s3 ls s3://${S3HOLD}/${ulidFromJson}/` ]; then #this check is different from the normal post_upload; we don't use the extra folder level
aws s3 cp --recursive ${DEPOSIT}/${ulidFolder}/${ulidFolder}/ s3://${S3HOLD}/${ulidFromJson}/ #this does not copy empty folders from DEPOSIT as folders do not actually exist in s3
packageName="package_$ulidFolder"
packageExt="zip"
Member

packageExt should probably get moved out of the loop / conditional.
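As a side note on the existence check earlier in this hunk: `aws s3 ls` exits non-zero when a prefix has no matches, so the check can rely on the exit status instead of substituting command output into `[ ]` (which is fragile and is missing a space before the closing `]`). A sketch of that alternative, where `s3_prefix_exists` and `upload_if_missing` are hypothetical wrappers introduced here so the control flow can be shown without S3 access:

```shell
#!/bin/sh
# Hypothetical wrapper: `aws s3 ls` exits non-zero when the prefix is
# empty or absent, so the exit status alone answers "does it exist?".
s3_prefix_exists() {
    aws s3 ls "$1" > /dev/null 2>&1
}

# Mirrors the script's conditional recursive upload, using the wrapper.
upload_if_missing() {
    src=$1
    dest=$2
    if ! s3_prefix_exists "$dest"; then
        aws s3 cp --recursive "$src" "$dest"
    fi
}
```

Because the decision hinges on an exit status rather than word-split output, the check behaves the same whether the listing is empty, one line, or many.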


#It would be awesome to someday zip everything while it is being streamed.
echo "beginning zip of ${DEPOSIT}/${ulidFolder}/${ulidFolder}/"
zip -r $packageName ${ulidFolder}/ #There are two layers of ${ulidFolder}
Member

This should probably check the return value, and fail cleanly if zip creation has failed.
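A minimal sketch of that check. The helper names (`run_zip`, `package_dir`) are hypothetical, introduced only so the exit-status handling can be exercised without creating real archives:

```shell
#!/bin/sh
# Hypothetical sketch: check zip's exit status and fail cleanly rather
# than proceeding to upload a broken or missing archive.
run_zip() {
    zip -r "$1.zip" "$2/"    # stands in for the script's `zip -r` call
}

package_dir() {
    packageName=$1
    ulidFolder=$2
    if ! run_zip "$packageName" "$ulidFolder"; then
        echo "zip of ${ulidFolder} failed; skipping S3 upload" >&2
        return 1
    fi
    echo "zip of ${ulidFolder} succeeded"
}
```

In the real script, returning non-zero here (or `continue`-ing the loop) would keep a failed zip from being copied to the hold area.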


echo "test: ${DEPOSIT}/${ulidFolder}/$packageName"
aws s3 cp ${packageName}.${packageExt} s3://${S3HOLD}/${ulidFromJson}/
aws s3 cp ${packageName}.sha s3://${S3HOLD}/${ulidFromJson}/
Member

Would it make more sense to pass the archive sha as a parameter to the import API?

Contributor Author
@matthew-a-dunlap Dec 4, 2018

I decided to go with saving a .sha file because it seemed to have value in case the data needs to be accessed outside of Dataverse. Dataverse, on its end, expects the file to be there.

aws s3 cp ${packageName}.sha s3://${S3HOLD}/${ulidFromJson}/

err=$?
if (( $err != 0 )) ; then
Member

This error check will only catch problems copying ${packageName}.sha to S3, and will miss errors copying ${packageName}.${packageExt} to S3, which is equally or more important.
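A sketch of an error check that covers both uploads by looping over the archive and its .sha file. `s3_cp` and `upload_package` are hypothetical names, introduced so the control flow can be shown (and tested) without S3 access:

```shell
#!/bin/sh
# Hypothetical sketch: verify the exit status of every upload, not just
# the last one, and report which file failed.
s3_cp() {
    aws s3 cp "$1" "$2"    # stands in for the script's `aws s3 cp` calls
}

upload_package() {
    packageName=$1
    packageExt=$2
    dest=$3
    for f in "${packageName}.${packageExt}" "${packageName}.sha"; do
        if ! s3_cp "$f" "$dest"; then
            echo "upload of ${f} failed" >&2
            return 1
        fi
    done
}
```

Checking inside the loop means the archive upload and the checksum upload each get their own failure path, instead of `$?` reflecting only whichever `cp` ran last.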


#This may prove to be slow with large datasets
sz=`aws s3 ls --summarize --human-readable --recursive s3://${S3HOLD}/${ulidFromJson}/ | grep "Total Size: " | cut -d' ' -f 6`
sz=`aws s3 ls --summarize --human-readable s3://${S3HOLD}/${ulidFromJson}/$packageName | grep "Total Size: " | cut -d' ' -f 6`
Member

should this be ${packageName}.${packageExt}?
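For reference, the `grep | cut` parsing in that line, applied to the specific object as the comment suggests. `listing` below is a canned stand-in for `aws s3 ls --summarize --human-readable s3://${S3HOLD}/${ulidFromJson}/${packageName}.${packageExt}` output, so the parsing can be shown without S3 access:

```shell
#!/bin/sh
# Hypothetical sketch of the suggested fix: list the specific object
# ${packageName}.${packageExt} rather than the bare ${packageName}.
# `listing` mimics `aws s3 ls --summarize --human-readable` output.
listing='2018-12-04 12:00:00    4.2 MiB package_x.zip

Total Objects: 1
   Total Size: 4.2 MiB'

sz=$(printf '%s\n' "$listing" | grep "Total Size: " | cut -d' ' -f 6)
# note: field 6 is the number only; the unit ("MiB") would be field 7
```

The field index depends on the exact leading-space layout of the summarize line, which is one reason this approach "may prove to be slow" is not the only caveat worth a second look.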

@@ -68,24 +68,38 @@ do
#move to HOLD location

Member

If I'm remembering correctly, there was some discussion about having archive / not archive S3 DCM be a DCM-side configuration option.

  • Am I remembering right?
  • Do we still care?
  • If both, is it still worth the incremental increase in complexity (vs "S3 DCM gives you archives w/ no in-place compute")?

Contributor Author

That was on the table for a while, but I have a pretty strong memory that we switched to only doing zips.

@pameyer pameyer merged commit 925f3a2 into master Dec 5, 2018