Company or project name
Discovered by a customer whose data pipeline involves creating numerous auxiliary tables.
Describe the unexpected behaviour
During the ATTACH TABLE operation (of which CREATE TABLE is a special case), ClickHouse may check for the presence of data parts across all configured disks, even disks that do not belong to the table's storage policy.
This presence check relies on metadata and is typically a low-cost operation; however, it is non-local for s3_plain disks, where it translates into remote API calls.
How to reproduce
This behavior can be triggered by several conditions.
For example, if an s3_plain disk is configured as follows:
```xml
<disks>
    <aws_plain>
        <type>s3_plain</type>
        <!-- endpoint, credentials, etc. -->
    </aws_plain>
</disks>
```
...and if the 'default' policy is defined as:
```xml
<policies>
    <default>
        <volumes>
            <hot>
                <disk>default</disk>
            </hot>
        </volumes>
    </default>
</policies>
```
...then creating a table with this policy would lead ClickHouse to generate API calls to check for data parts on aws_plain.
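For concreteness, a minimal repro sketch (the table name and schema are arbitrary; it assumes the 'default' policy shown above):

```sql
-- Any MergeTree table is enough; nothing references aws_plain explicitly.
CREATE TABLE t_repro (key UInt64)
ENGINE = MergeTree
ORDER BY key
SETTINGS storage_policy = 'default';
```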
Moreover, if the object storage behind aws_plain is temporarily unavailable, it would become impossible to create a table.
Which ClickHouse server version to use
Reproducible against 23.8.6.16 (official build) and against master as of October 24.
Expected behavior
Currently, I do not have a clear solution for this issue and would appreciate comments/suggestions.
A potential improvement could involve introducing a flag to exclude a specific disk from this presence check. Excluding all disks with non-local metadata might also be an option, though I am unsure about its feasibility.
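To illustrate the first idea only: a hypothetical per-disk flag might look like the sketch below. The setting name is invented for illustration and does not exist in ClickHouse today.

```xml
<disks>
    <aws_plain>
        <type>s3_plain</type>
        <!-- Hypothetical, non-existent setting: opt this disk out of the
             part-presence check performed at CREATE/ATTACH time. -->
        <skip_part_presence_check>1</skip_part_presence_check>
    </aws_plain>
</disks>
```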
The logic that triggers the check seems strange, see:

ClickHouse/src/Disks/StoragePolicy.cpp, line 170 in f36bc0c:

```cpp
bool StoragePolicy::isDefaultPolicy() const
```
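If I read the surrounding code correctly, the scan over all known disks is skipped only when this heuristic returns true. A simplified paraphrase of the caller's logic (around MergeTree part loading; names are approximate, not verbatim ClickHouse code):

```cpp
// Simplified paraphrase, not verbatim ClickHouse code.
if (!getStoragePolicy()->isDefaultPolicy())
{
    // Disks that the table's storage policy actually declares.
    std::unordered_set<String> defined_disk_names;
    for (const auto & disk : getStoragePolicy()->getDisks())
        defined_disk_names.insert(disk->getName());

    // Probe every disk known to the server, including ones outside the policy.
    // For an s3_plain disk such as aws_plain, exists() is a remote API call.
    for (const auto & [disk_name, disk] : getContext()->getDisksMap())
        if (!defined_disk_names.contains(disk_name) && disk->exists(relative_data_path))
            ; // scan for part-like directory names and throw if any are found
}
```

Notably, isDefaultPolicy() appears to require not just the policy name 'default' but also default volume/disk names, so a customized 'default' policy (like the one above, whose volume is named 'hot') fails the test and the all-disks scan runs.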