Skip to content

Improved robustness of recycler script#3657

Merged
openshift-bot merged 1 commit intoopenshift:masterfrom
markturansky:recyc_img_improvement
Sep 27, 2015
Merged

Improved robustness of recycler script#3657
openshift-bot merged 1 commit intoopenshift:masterfrom
markturansky:recyc_img_improvement

Conversation

@markturansky
Copy link
Member

Resolves #3410

The recycler script needs to guarantee fail if the volume could not be scrubbed for any reason. This revision is better at doing so, but I am a Bash newb and could use a review by someone with good scripting skills.

@smarterclayton

@smarterclayton
Copy link
Contributor

@sdodson on the bash

On Jul 10, 2015, at 11:48 AM, Mark Turansky notifications@github.com
wrote:

The recycler script needs to guarantee fail if the volume could not be
scrubbed for any reason. This revision is better at doing so, but I am a
Bash newb and could use a review by someone with good scripting skills.

@smarterclayton https://github.com/smarterclayton

You can view, comment on, or merge this pull request online at:

#3657
Commit Summary

  • improved robustness of recycler script

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#3657.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the man page the return code for rm should be non zero if there were failures, why not just rely on that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just changed the script to rely on rm's exit code, as you suggest. I had to remove "set -o errexit", which I thought was the standard bash habit. The script is a little clearer now, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want to remove set errexit, but you'll want to put the rm inside a
test block and handle it being empty

On Jul 13, 2015, at 2:43 PM, Mark Turansky notifications@github.com wrote:

In images/origin/scripts/recycler.sh
#3657 (comment):

+# first and only arg is the directory to scrub
+dir=${1:-}
+
+if [ -z $dir ]; then

  • echo "Usage: recycler.sh some/path/to/scrub"
  • exit 1
    +fi

+echo "Scrubbing $dir"
+
+# capture the output, which can contain permission denied errors
+rm_output=$(rm -rf $dir/* 2>&1 || true)
+
+# silence is golden
+if [ -z "$rm_output" ]; then

I just changed the script to rely on rm's exit code, as you suggest. I had
to remove "set -o errexit", which I thought was the standard bash habit.
The script is a little clearer now, though.


Reply to this email directly or view it on GitHub
https://github.com/openshift/origin/pull/3657/files#r34496567.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't that how I originally had it? Use errexit and capture the output and test its output for emptiness?

@markturansky markturansky force-pushed the recyc_img_improvement branch from c8b74ed to b252b03 Compare July 20, 2015 19:14
@markturansky
Copy link
Member Author

@smarterclayton 2 of 3 improvements implemented. Thanks for the feedback. It was helpful.

The one about rm was commented on above.

@markturansky
Copy link
Member Author

@smarterclayton is there more to do with this bash script? We can tweak it more, but this one is already better than the one that exists today.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing quotes and eagle braces

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented inline about this one, but the comment block above got collapsed.

rm doesn't support wildcard expansion, so $dir/* returned true (but removed nothing) and the script would wrongly pass.

#3657 (comment)

@markturansky markturansky force-pushed the recyc_img_improvement branch from b252b03 to aa30eac Compare July 30, 2015 14:22
@danmcp
Copy link

danmcp commented Aug 10, 2015

@markturansky Bump

@markturansky markturansky force-pushed the recyc_img_improvement branch from 380cdde to 46e3ce3 Compare August 10, 2015 20:02
@markturansky
Copy link
Member Author

This is awaiting review. I just squashed to make it easier.

I added deletion of dotfiles to the scrub script and Recycle as the policy on the Wordpress example PVs. Just ran through the whole thing in newly rebased Origin as a test and it's good to go.

@markturansky
Copy link
Member Author

This requires attention and should receive priority for a release. Ignoring dotfiles in the recycler means many volumes might not successfully be recycled.

This change fixes that and works when I run the wordpress example with "--latest-images"

@markturansky
Copy link
Member Author

@bparees This PR might be relevant to your group. It is an Origin image.

Someone needs to be assigned to this PR and get it merged.

@bparees
Copy link
Contributor

bparees commented Sep 11, 2015

@mnagy @rhcarvalho ptal, we might want to do this in our persistent db templates?

@markturansky should this be the default behavior for PVCs instead of having to specify it?

@markturansky
Copy link
Member Author

@bparees The built-in Kube default will be different. I override the image already and have VolumeConfig patched into OpenShift.

This PR just improves that image.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

shopt -s dotglob nullglob
rm -rf ${dir}/*

?

For the current code we have there, if there's no dot file in $dir, here's what we get:

$ mkdir b
$ rm -rf b/.*
rm: refusing to remove ‘.’ or ‘..’ directory: skipping ‘b/.’
rm: refusing to remove ‘.’ or ‘..’ directory: skipping ‘b/..’
$ echo $?
1

@rhcarvalho
Copy link
Contributor

@markturansky please consider:

set -e # same as set -o errexit

shopt -s dotglob nullglob

if [[ rm -rf ${dir}/* ]]; then
  echo 'Scrub OK'
  exit 0
fi

echo 'Scrub failed'
exit 1

@rhcarvalho
Copy link
Contributor

we might want to do this in our persistent db templates?

@bparees you mean make our templates use persistentVolumeReclaimPolicy: Recycle?
I just read the docs, and I don't think we should delete data by default...

@bparees
Copy link
Contributor

bparees commented Sep 11, 2015

@rhcarvalho if you've released the claim i'm not sure what expectation you should have to get it back later.

so yes, i'm suggesting delete by default.

@rhcarvalho
Copy link
Contributor

Does deleting a PVC mean releasing the claim?
How does one explicitly release a claim?

@markturansky
Copy link
Member Author

The volume is released from its claim when the claim is deleted. That is the trigger to recycle the volume.

@rhcarvalho
Copy link
Contributor

@bparees I disagree with deleting data by default. My work flow when I'm working on the DB images is to create a PV, then instantiate a template (including Service, DC, PVC), do some work, write data, etc. Later I might put everything down with oc delete all,pv,pvc --all. I don't want my data to be gone.
I want to be able to take it and put it back up in a new setup.
I think that matches most people's expectations.

Cluster admins could decide at some point that stale data on unclaimed PVs get recycled, but that should happen after a grace period...

Recycle might work fine for small demos that we don't care about the data. It will drive people mad when they discover all of their data is gone unintentionally.

@markturansky
Copy link
Member Author

@rhcarvalho if that's your use case, you should not delete the claim.

If you delete your claim, you cannot easily get it back. An admin could manually manipulate the data and bind the same PV to your new PVClaim, but I wouldn't think this is likely.

@rhcarvalho
Copy link
Contributor

@markturansky our templates include a PVC, it would be very clumsy to start over from the same template for a second, third time, without deleting the old PVC...

@markturansky
Copy link
Member Author

I understand. I think @bparees is correct, then. Might as well recycle those volumes. You're not going to get them back (easily) if the claim is deleted. The resource might as well be freed for another user.

@bparees
Copy link
Contributor

bparees commented Sep 11, 2015

@rhcarvalho you're basically taking advantage of a loophole. in fact personally if i deleted a claim and then recreated it and got old data back, i'd be pretty surprised.

i think you need to consider a different workfow if you want your volume data to stick around. after all, if you had 3 volumes, what guarantee do you have that you'd get back the one you wanted w/ your next claim?

@markturansky
Copy link
Member Author

@rhcarvalho thanks for the feedback on the script. I'll have it implemented and tested for you by Monday.

@rhcarvalho
Copy link
Contributor

Hmmm... I'm not trying to reuse the PV, but the data... as I mentioned, I'm running oc delete all,pv,pvc --all, which removes both PV and PVC. But the data is still there. Then I can create a new PV from the same NFS share or hostPath.

after all, if you had 3 volumes, what guarantee do you have that you'd get back the one you wanted w/ your next claim?

I don't necessarily want to have the data back when I create a new PV and PVC and the rest of the template. It's not about doing this rematching, but about having the data there.
If I do have data in my PV is because I manually created a PV pointing to existing data. What I dislike about rm -rf in a volume that was used for a database is that when the user come and ask how to get his precious data back we'll say "yeah, you better had a backup somewhere else, we've deleted your data permanently".

For a production database I'd rather spare an extra PV if I needed rather than recycling...

Sorry for the noise in this PR. We should probably be discussing this somewhere else. @bparees we can talk about it over email or IRC next week.

@mrry550
Copy link

mrry550 commented Sep 24, 2015

@markturansky @rhcarvalho @bparees - just following up on this PR as it has been pending for the storage team for awhile. Is there something left to be done here, or can this get merged?

@markturansky
Copy link
Member Author

I owe this PR the suggested changes. I will follow up shortly. Sorry for the delay.

@markturansky
Copy link
Member Author

@rhcarvalho I added the bash script exactly as you posted it above. It looks simple enough, and simple is good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep this line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

d'oh! cut and paste fail.

@markturansky
Copy link
Member Author

./images/origin/scripts/recycler.sh: line 5: conditional binary operator expected

@rhcarvalho
Copy link
Contributor

My example above was not supposed to replace the whole file :)

@markturansky
Copy link
Member Author

@rhcarvalho I added back what seems to be correct but still get the same error as posted above.

I am, literally, a bash newb and have no idea how to script this.

@rhcarvalho
Copy link
Contributor

@markturansky I can give you a hand. Give me some minutes :-)

@rhcarvalho
Copy link
Contributor

@markturansky I've opened a PR against your repo - markturansky#1

@markturansky
Copy link
Member Author

@rhcarvalho thank you so much for your help. Your script is perfect. I tested through it locally and its good to go.

@bparees @smarterclayton final review and merge, please?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is going to miss . (dot) directories. eg .config

also why not just
rm -rfv ${dir}/.* ${dir}/*

?

i guess that returns an error code because it fails to delete "." and "..", so maybe you need to incorporate it into the find statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, cool.

@bparees
Copy link
Contributor

bparees commented Sep 27, 2015

[merge]

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_origin/3407/) (Image: devenv-fedora_2411)

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to 5fcff7d

openshift-bot pushed a commit that referenced this pull request Sep 27, 2015
@openshift-bot openshift-bot merged commit 347cf33 into openshift:master Sep 27, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants