Conversation
@nolancon nolancon commented Nov 24, 2025

Description of your changes

Orphaned S3 buckets (i.e. an S3 bucket still exists on one or more Ceph backends after the CR has been fully removed) have been discovered on two or three occasions over the past year.

Upon investigation, it turns out that Provider Ceph's delete flow, as it iterates over the backends from which to delete a bucket, skips any backend on which the bucket is "unavailable" (as per the CR status). However, "unavailable" does not mean "does not exist"; it only means the backend is unreachable.
A scenario can therefore occur where a user creates a bucket while an RGW is unreachable, so Provider Ceph marks the bucket on that backend as "unavailable". The bucket is created on the other backends and is still "ready" from the user's perspective. If the user then deletes the bucket before Provider Ceph has had a chance to fully sync all backends, the "unavailable" bucket is skipped during the deletion loop and left orphaned on that backend.

I don't know or remember why this check/skip was implemented. Either (1) it was simply a mistake, or (2) it was an attempt to avoid hanging CRs. If the latter, I think a hanging CR is a much lesser evil than an orphaned S3 bucket.

There is also a check that skips backends which no longer exist in the Status. This one is valid: deleting an S3 bucket should also remove it from the Status.
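To illustrate, here is a minimal Go sketch of the corrected loop. The names (`deleteFromBackends`, `deleteBucket`, `backendStatus`) are hypothetical and simplified; the real code in internal/controller/bucket/delete.go differs. The loop still skips backends absent from the Status, but no longer skips "unavailable" ones:

```go
package main

import "fmt"

type backendStatus string

const (
	statusReady       backendStatus = "Ready"
	statusUnavailable backendStatus = "Unavailable"
)

// deleteFromBackends attempts deletion on every backend present in the CR
// status. Backends missing from the status are skipped (the bucket is already
// gone there). "Unavailable" backends are NOT skipped, so a temporarily
// unreachable RGW can no longer leave an orphaned bucket behind.
func deleteFromBackends(allBackends []string, status map[string]backendStatus,
	deleteBucket func(backend string) error) []string {
	var deleted []string
	for _, b := range allBackends {
		if _, inStatus := status[b]; !inStatus {
			continue // bucket already removed from this backend
		}
		// Previously: if status[b] == statusUnavailable { continue }
		// That skip caused orphaned buckets and has been removed.
		if err := deleteBucket(b); err != nil {
			fmt.Printf("delete on %s failed, will retry: %v\n", b, err)
			continue
		}
		deleted = append(deleted, b)
	}
	return deleted
}

func main() {
	status := map[string]backendStatus{
		"ceph-a": statusReady,
		"ceph-b": statusUnavailable, // unreachable, but bucket may exist here
	}
	out := deleteFromBackends([]string{"ceph-a", "ceph-b", "ceph-c"}, status,
		func(backend string) error { return nil })
	fmt.Println(out) // ceph-b is attempted despite being "Unavailable"
}
```

Under this sketch, "ceph-c" (absent from the Status) is skipped, while "ceph-b" is attempted even though it is marked unavailable; a failed attempt is simply retried on the next reconcile rather than silently dropped.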

I have:

  • Run make ready-for-review to ensure this PR is ready for review.
  • Run make ceph-chainsaw to validate these changes against Ceph. This step is not always necessary. However, for changes related to S3 calls it is sensible to validate against an actual Ceph cluster. Localstack is used in our CI Chainsaw suite for convenience and there can be disparity in S3 behaviours between it and Ceph. See docs/TESTING.md for information on how to run tests against a Ceph cluster.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

@nolancon nolancon force-pushed the deletion-remove-skips branch from a64da7f to 1afd535 on November 24, 2025 11:07
@nolancon nolancon changed the title Remove skips/checks from delete flow Remove check for unavailable buckets from delete flow Nov 24, 2025
@nolancon nolancon requested a review from Copilot November 24, 2025 11:15
Copilot AI left a comment
Pull request overview

This PR fixes a critical issue where S3 buckets could be orphaned on backends during deletion when those backends were temporarily unreachable. The deletion logic previously skipped backends marked as "unavailable," which meant buckets on temporarily unreachable backends wouldn't be deleted, leading to orphaned resources.

Key Changes:

  • Removed the check that skipped deletion attempts on "unavailable" backends, allowing deletion to proceed regardless of backend availability status
  • Added comprehensive e2e test coverage for the scenario of disabling/deleting buckets when backends are unhealthy

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files changed:

  • internal/controller/bucket/delete.go: Removed the condition that skipped deletion on unavailable backends, ensuring all backends are attempted during deletion.
  • e2e/tests/stable/chainsaw-test.yaml: Added test cases verifying bucket disablement and deletion behavior when backends are unhealthy and then recover.


@nolancon nolancon marked this pull request as ready for review November 24, 2025 11:18
@nolancon nolancon merged commit 11cb232 into main Nov 26, 2025
10 checks passed
@nolancon nolancon deleted the deletion-remove-skips branch November 26, 2025 09:53