Conversation
rpm/dep/c6build.dockerfile
Outdated
  FROM centos:6
  # starting centos 6 build image for DCM
- RUN yum install -y rpm-build python-setuptools wget rpmdevtools
+ RUN yum install -y rpm-build python-setuptools wget rpmdevtools zip
This should be moved to the RPM spec
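A hedged sketch of what that suggestion could look like. The spec-file layout is an assumption; only the dependency tags themselves are standard RPM spec syntax:

```
# In the DCM package's .spec file (hypothetical), declare zip as a
# dependency instead of baking it into the build images:
Requires: zip
# or, if zip is only needed while building the RPM itself:
# BuildRequires: zip
```

This keeps the Dockerfiles as thin build environments and lets RPM track the dependency wherever the package is installed.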
rpm/dep/c7build.dockerfile
Outdated
  FROM centos:7
  # starting centos 7 build image for DCM
- RUN yum install -y rpm-build python-setuptools wget
+ RUN yum install -y rpm-build python-setuptools wget zip
scn/post_upload_s3.bash
Outdated
if [ ! `aws s3 ls s3://${S3HOLD}/${ulidFromJson}/` ]; then  #this check is different than normal post_upload, we don't use the extra folder level
    aws s3 cp --recursive ${DEPOSIT}/${ulidFolder}/${ulidFolder}/ s3://${S3HOLD}/${ulidFromJson}/  #this does not copy empty folders from DEPOSIT as folders do not actually exist in s3
    packageName="package_$ulidFolder"
    packageExt="zip"
packageExt should probably get moved out of the loop / conditional.
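The hoisting the reviewer suggests can be sketched as follows; the iteration values are hypothetical stand-ins for the script's real loop:

```shell
#!/usr/bin/env bash
# Loop-invariant values belong outside the loop: packageExt never
# changes between iterations, so set it once before the loop starts.
packageExt="zip"                          # hoisted out of the loop

for ulidFolder in 01AAA 01BBB; do         # hypothetical iteration values
    packageName="package_${ulidFolder}"   # per-iteration, stays inside
    echo "${packageName}.${packageExt}"   # prints package_01AAA.zip, then package_01BBB.zip
done
```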
#It would be awesome to someday zip everything while it is being streamed.
echo "beginning zip of ${DEPOSIT}/${ulidFolder}/${ulidFolder}/"
zip -r $packageName ${ulidFolder}/ #There are two layers of ${ulidFolder}
This should probably check the return value and fail cleanly if zip creation has failed.
scn/post_upload_s3.bash
Outdated
echo "test: ${DEPOSIT}/${ulidFolder}/$packageName"
aws s3 cp ${packageName}.${packageExt} s3://${S3HOLD}/${ulidFromJson}/
aws s3 cp ${packageName}.sha s3://${S3HOLD}/${ulidFromJson}/
Would it make more sense to pass the archive SHA as a parameter to the import API?

Reply: I decided to go with saving a .sha file because it seemed to have value in case the data needs to be accessed outside Dataverse. Dataverse, on its end, expects the file to be there.
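The sidecar approach described above can be sketched like this; the hash algorithm (`sha1sum`) and file names here are illustrative assumptions, not necessarily what the script uses:

```shell
#!/usr/bin/env bash
# Write a .sha sidecar next to the archive so the checksum travels
# with the data and remains usable outside Dataverse.
cd "$(mktemp -d)"
printf 'example payload' > package_demo.zip   # stand-in for the real archive
sha1sum package_demo.zip > package_demo.sha   # sidecar checksum file
sha1sum -c package_demo.sha                   # prints "package_demo.zip: OK"
```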
aws s3 cp ${packageName}.sha s3://${S3HOLD}/${ulidFromJson}/

err=$?
if (( $err != 0 )) ; then
This error check will only catch problems moving ${packageName}.sha to S3, and will miss errors copying ${packageName}.${packageExt} to S3 - which is equally or more important.
scn/post_upload_s3.bash
Outdated
  #This may prove to be slow with large datasets
- sz=`aws s3 ls --summarize --human-readable --recursive s3://${S3HOLD}/${ulidFromJson}/ | grep "Total Size: " | cut -d' ' -f 6`
+ sz=`aws s3 ls --summarize --human-readable s3://${S3HOLD}/${ulidFromJson}/$packageName | grep "Total Size: " | cut -d' ' -f 6`
Should this be ${packageName}.${packageExt}?
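For reference, the `grep`/`cut` extraction in that line can be exercised against canned `aws s3 ls --summarize` output; the sample text below is illustrative, and the point of the fix is that the object key should include the extension:

```shell
#!/usr/bin/env bash
# Canned sample of `aws s3 ls --summarize --human-readable` output;
# the real script pipes live AWS CLI output instead.
sample='2021-01-01 00:00:00    1.2 GiB package_01ABC.zip
Total Objects: 1
   Total Size: 1.2 GiB'

# Same extraction as the script; note the queried key would be
# ${packageName}.${packageExt}, with the extension included.
sz=$(echo "$sample" | grep "Total Size: " | cut -d' ' -f 6)
echo "$sz"   # prints 1.2
```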
@@ -68,24 +68,38 @@ do
  #move to HOLD location
If I'm remembering correctly, there was some discussion about making archive / no-archive S3 DCM a DCM-side configuration option.
- Am I remembering right?
- Do we still care?
- If both, is it still worth the incremental increase in complexity (vs. "S3 DCM gives you archives with no in-place compute")?
That was on the table for a while, but I have a pretty strong memory that we switched to only doing zips.