Remove check for unavailable buckets from delete flow #368
### Description of your changes
Orphaned S3 buckets (i.e. the S3 bucket still exists on one or more Ceph backends but the CR has been fully removed) have been discovered on two or three occasions over the past year.
Upon investigation, it turns out that Provider Ceph's delete flow, as it iterates over the backends from which to delete a bucket, skips any backend on which the bucket is "unavailable" (as per the CR status). The problem is that "unavailable" does not mean "doesn't exist" - it only means the backend is unreachable.
So a scenario can occur where a user creates a bucket while an RGW is unreachable and, as a result, Provider Ceph marks the bucket on that backend as "unavailable". The bucket is created on the other backends, so it is still "ready" from the user's point of view. If the user then deletes the bucket before Provider Ceph has had a chance to fully sync all backends, the "unavailable" bucket is skipped during the deletion loop and left orphaned.
I don't know/remember why this check/skip was implemented. Either (1) it was simply a mistake, or (2) it was an attempt to avoid hanging CRs - if so, I think hanging CRs are a much lesser evil than orphaned S3 buckets.
There is also a check that skips any backend which no longer exists in the Status - that one is valid, as deleting an S3 bucket from a backend should also remove that backend from the Status. A sketch of the resulting loop is below.
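To make the change concrete, here is a rough sketch of the deletion loop as it behaves after this PR. This is not Provider Ceph's actual code; every identifier below (`deleteFromAllBackends`, `deleteBucketOnBackend`, `Backend`, `BucketStatusUnavailable`, the shape of the Status map) is a hypothetical placeholder:

```go
// Sketch of the delete flow after this PR. All names and types here are
// illustrative placeholders, not Provider Ceph's real identifiers.
package sketch

import "context"

type BucketStatus string

const BucketStatusUnavailable BucketStatus = "Unavailable"

// Backend stands in for the per-backend entry in the Bucket CR's Status.
type Backend struct {
	BucketStatus BucketStatus
}

// deleteBucketOnBackend stands in for the S3 DeleteBucket call against one RGW.
func deleteBucketOnBackend(ctx context.Context, backendName, bucketName string) error {
	return nil
}

func deleteFromAllBackends(ctx context.Context, bucketName string, backendNames []string, statusBackends map[string]Backend) error {
	for _, beName := range backendNames {
		// Valid skip (kept): the backend no longer appears in the CR's Status,
		// meaning the bucket has already been deleted there and removed from
		// the Status.
		if _, ok := statusBackends[beName]; !ok {
			continue
		}
		// Removed by this PR: skipping backends whose bucket is "unavailable".
		// "Unavailable" only means the RGW was unreachable at the last sync,
		// not that the bucket doesn't exist, so skipping here could leave an
		// orphaned S3 bucket behind:
		//
		//	if statusBackends[beName].BucketStatus == BucketStatusUnavailable {
		//		continue
		//	}
		if err := deleteBucketOnBackend(ctx, beName, bucketName); err != nil {
			return err
		}
	}
	return nil
}
```

With the check removed, the worst case is a delete call failing against an unreachable RGW, which surfaces as a retrying/hanging CR rather than a silently orphaned bucket.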
I have:
- Run `make ready-for-review` to ensure this PR is ready for review.
- Run `make ceph-chainsaw` to validate these changes against Ceph. This step is not always necessary. However, for changes related to S3 calls it is sensible to validate against an actual Ceph cluster. Localstack is used in our CI Chainsaw suite for convenience and there can be disparity in S3 behaviours between it and Ceph. See `docs/TESTING.md` for information on how to run tests against a Ceph cluster.
- Added `backport release-x.y` labels to auto-backport this PR if necessary.

### How has this code been tested