Bugfix GCSToGCSOperator when copying an object without wildcard and exact_match=True #32376
This PR fixes the following case.
The goal is to copy source/foo.txt to dest/foo.txt within a single GCS bucket.
Expected bucket state:
Actual (incorrect) bucket state:
======================================================
The bug was caused by missing handling of exact_match=True when objects are copied without a wildcard. This PR fixes that.
======================================================
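The difference between the two modes can be sketched as follows. This is only an illustration of the matching rule, not the operator's actual code: select_objects is a hypothetical helper, and the objects source/foo.txt.bak and source/bar.txt are made up for the example.

```python
def select_objects(listed_names, source_object, exact_match):
    """Hypothetical helper: pick which listed objects should be copied.

    With exact_match=True only the object whose name equals source_object
    is kept; otherwise source_object is treated as a name prefix.
    """
    if exact_match:
        return [name for name in listed_names if name == source_object]
    return [name for name in listed_names if name.startswith(source_object)]


# Assumed bucket contents, for illustration only.
bucket = ["source/foo.txt", "source/foo.txt.bak", "source/bar.txt"]

exact = select_objects(bucket, "source/foo.txt", exact_match=True)
by_prefix = select_objects(bucket, "source/foo.txt", exact_match=False)
```

Under these assumptions, exact contains only source/foo.txt, while by_prefix also picks up source/foo.txt.bak because its name starts with the same prefix.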
However, if the flag is left at its default value, exact_match=False, the operator's result is different:
This is actually correct, because in general source_object="path/to/the/file.txt" is not treated as a file path but as an object name prefix (doc). That is why the prefix source_object="path/to/the/file.txt" matches both objects:
And if destination_object is set, the destination object prefix is simply built as a concatenation of the source prefix and the destination prefix. GCS makes no distinction between copying a file and copying a folder: both are the same kind of entity, an object.
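One plausible reading of the prefix-based naming is that whatever follows the source prefix in an object's name is carried over and appended to the destination prefix. The sketch below assumes exactly that; destination_name, the backup/file.txt destination, and the .bak object are hypothetical and are not taken from the operator's code.

```python
def destination_name(source_name, source_prefix, destination_prefix):
    """Hypothetical helper: map a matched source object to its destination.

    Assumes the part of the name after source_prefix is appended unchanged
    to destination_prefix.
    """
    if not source_name.startswith(source_prefix):
        raise ValueError(f"{source_name!r} does not start with {source_prefix!r}")
    remainder = source_name[len(source_prefix):]
    return destination_prefix + remainder


# Both objects match the prefix "path/to/the/file.txt".
same = destination_name(
    "path/to/the/file.txt", "path/to/the/file.txt", "backup/file.txt"
)
sibling = destination_name(
    "path/to/the/file.txt.bak", "path/to/the/file.txt", "backup/file.txt"
)
```

Here same is backup/file.txt and sibling is backup/file.txt.bak, which shows why a prefix that looks like a full file path can still produce more than one destination object.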
Perhaps it makes sense to implement more "human friendly" logic, so that the operator treats its inputs as files and folders, but I think that should be a separate operator, because GCSToGCSOperator's current implementation has become too complicated for major changes. These are just my thoughts; I'm not insisting.