(maybe) Fix zero‑copy unlock check before part dir move #1353
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Make zero‑copy cleanup check the correct directory before it is moved, so shared blobs aren’t deleted while other replicas still need them (regression after ClickHouse#94262).
Documentation entry for user-facing changes
(I hope) fixes #1338
What went wrong
A data part is removed in two steps:
In PR ClickHouse#94262, we stopped updating the internal path (part_dir) after step 1 to fix a data race with system.parts.
That was correct for the race, but it had a side effect:
Result: a replica can delete blobs that other replicas still need, causing fetch failures and permanent missing parts.
What the fix does
Before we rename the directory, we now evaluate the zero‑copy delete decision and keep the result.
That way:
The fix for the DataPartStorageOnDiskBase::remove vs system.parts data race (ClickHouse/ClickHouse#94262) remains intact.
Why 94262 introduced the regression
94262 removed part_dir updates during removal to fix a data race with system.parts.
That means any logic that depends on part_dir after the directory is renamed will now read a stale path.
Zero‑copy cleanup is exactly such logic.
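A minimal, self-contained sketch of the ordering problem, not the actual ClickHouse code. Everything except the `part_dir` name is hypothetical and chosen for illustration: the `Part` struct, the `shared.marker` file standing in for the zero‑copy references, the `delete_tmp_` rename prefix, and all function names. It only shows why a check that reads `part_dir` after the rename gives the wrong answer, and why evaluating the check first and keeping the result avoids that.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical stand-in for a data part. `part_dir` is deliberately NOT
// updated after the rename, mirroring the behaviour described above.
struct Part
{
    fs::path root;
    std::string part_dir;
};

// Hypothetical check: the part's blobs count as shared while a marker file
// exists inside the part directory. Returns true if blobs may be deleted.
bool canRemoveBlobs(const Part & part)
{
    return !fs::exists(part.root / part.part_dir / "shared.marker");
}

// Hypothetical rename step: move the part directory to a temporary name.
void renameForRemoval(const Part & part)
{
    fs::rename(part.root / part.part_dir, part.root / ("delete_tmp_" + part.part_dir));
}

// Problematic order: rename first, then check through the stale part_dir.
// The marker is no longer at that path, so the check wrongly allows deletion.
bool decideAfterRename(const Part & part)
{
    renameForRemoval(part);
    return canRemoveBlobs(part);
}

// Fixed order: evaluate the decision while part_dir is still valid,
// keep the result, then rename.
bool decideBeforeRename(const Part & part)
{
    const bool can_remove = canRemoveBlobs(part);
    renameForRemoval(part);
    return can_remove;
}

// Test helper: creates a part directory containing a marker in a temp location.
Part makeSharedPart(const std::string & name)
{
    const fs::path root = fs::temp_directory_path() / ("zero_copy_sketch_" + name);
    fs::remove_all(root);
    fs::create_directories(root / "all_1_1_0");
    {
        std::ofstream marker(root / "all_1_1_0" / "shared.marker");
        marker << "still referenced by another replica\n";
    }
    return Part{root, "all_1_1_0"};
}
```

With a part created by `makeSharedPart`, `decideAfterRename` reports that the blobs may be removed even though the marker still exists under the renamed directory, while `decideBeforeRename` reports that they must be kept.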
Why only zero‑copy is affected
Non‑zero‑copy disks don’t use refcount files or ZooKeeper locks.
The cleanup simply deletes files and never checks shared blob ownership.
So only zero‑copy paths care about that missing references file.
CI/CD Options
Exclude tests:
Regression jobs to run: