Description
Describe the bug
ClickHouse should correctly handle Iceberg tables partitioned by a column whose values contain "/".
What's happening:
- S3 object exists at: partition_key=dev%2Fapp1%2Fservice1/ (URL-encoded)
- ClickHouse looks for: partition_key=dev/app1/service1/ (URL-decoded)
- S3 returns 404 because the decoded path doesn't exist
This confirms ClickHouse is URL-decoding partition paths incorrectly.
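To illustrate why the two paths cannot refer to the same object, here is a minimal Python sketch. It assumes the writer percent-encodes the partition value into a single path segment, which matches the S3 listing in this report. The variable names are hypothetical, and this is not how ICE or ClickHouse build paths internally.

```python
from urllib.parse import quote, unquote

# A partition value taken from this report.
partition_value = "dev/app1/service1"

# Assumption: the writer percent-encodes the value so that it stays
# a single path segment ("/" becomes "%2F").
encoded_segment = "partition_key=" + quote(partition_value, safe="")

# Decoding the segment turns it back into three nested "directories".
decoded_segment = unquote(encoded_segment)

# S3 keys are compared as literal strings, so these are different keys.
print(encoded_segment)
print(decoded_segment)
print(encoded_segment == decoded_segment)
```

Since S3 does not decode "%2F" when matching a key, a request for the decoded form points at a key that was never written, which is consistent with the 404 above.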
To Reproduce
0. Generate the test Parquet file:
python3 test-partition-bug.py
1. Upload to S3:
aws s3 cp test-data-with-slash.parquet s3://<BUCKET_PATH>/testData/test-data/ --region eu-west-1
2. Create Iceberg table with ICE:
ice create-table my_schema.partition_bug_test \
--schema-from-parquet s3://<BUCKET_PATH>/testData/test-data/test-data-with-slash.parquet \
--partition '[{"column":"partition_key","transform":"identity"}]'
3. Insert data with ICE (creates hierarchical partitioning):
ice insert my_schema.partition_bug_test \
s3://<BUCKET_PATH>/testData/test-data/test-data-with-slash.parquet
4. Check S3 structure (you'll see URL-encoded paths):
aws s3 ls s3://<BUCKET_PATH>/iceberg_catalog/my_schema/partition_bug_test/ --recursive
2026-01-27 17:29:23 1172 alm-liq/iceberg_catalog/my_schema/partition_bug_test/data/partition_key=dev%2Fapp1%2Fservice1/1769531359274-ded0bbf7d5aa003a6f6c6408f98489ae9cb8703d2df9c0a95c15c6121b6c7033-part.parquet
2026-01-27 17:29:23 1178 alm-liq/iceberg_catalog/my_schema/partition_bug_test/data/partition_key=prod%2Fapp1%2Fservice1/1769531359274-ded0bbf7d5aa003a6f6c6408f98489ae9cb8703d2df9c0a95c15c6121b6c7033-part.parquet
2026-01-27 17:29:23 1120 alm-liq/iceberg_catalog/my_schema/partition_bug_test/data/partition_key=prod%2Fapp2%2Fservice2/1769531359274-ded0bbf7d5aa003a6f6c6408f98489ae9cb8703d2df9c0a95c15c6121b6c7033-part.parquet
2026-01-27 17:27:43 1170 alm-liq/iceberg_catalog/my_schema/partition_bug_test/metadata/00000-e66b4081-896c-42ff-9438-5213d47f5784.metadata.json
2026-01-27 17:29:24 2196 alm-liq/iceberg_catalog/my_schema/partition_bug_test/metadata/00001-8b5c0ae9-beea-43ff-bbbe-9b43a7ba801e.metadata.json
2026-01-27 17:29:23 7528 alm-liq/iceberg_catalog/my_schema/partition_bug_test/metadata/ea8af7da-ea2a-49c6-84ba-e756cf6a58ae-m0.avro
2026-01-27 17:29:23 4513 alm-liq/iceberg_catalog/my_schema/partition_bug_test/metadata/snap-1778817764108029085-1-ea8af7da-ea2a-49c6-84ba-e756cf6a58ae.avro
5. Query with ClickHouse (THIS WILL FAIL):
SET send_logs_level = 'trace';
SET allow_experimental_database_iceberg = 1;
DROP DATABASE IF EXISTS almlr_polaris_catalog;
CREATE DATABASE IF NOT EXISTS almlr_polaris_catalog
ENGINE = DataLakeCatalog('http://XXXX.compute.internal/polaris/api/catalog')
SETTINGS
catalog_type = 'rest',
catalog_credential = 'xxxx:yyyyyy',
oauth_server_uri = 'http://XXXXX.compute.internal/polaris/api/catalog/v1/oauth/tokens',
warehouse = 'almlr_catalog',
vended_credentials = false;
SELECT * FROM almlr_polaris_catalog.my_schema.partition_bug_test LIMIT 5;
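To double-check which logical partition values the encoded keys from step 4 correspond to, something like the following Python sketch could be run over the `aws s3 ls --recursive` output. The helper name `partition_values` is hypothetical, and the data file names in the sample are shortened for readability. The sketch assumes each listing line has the form date, time, size, key.

```python
from urllib.parse import unquote


def partition_values(listing: str, column: str = "partition_key") -> list[str]:
    """Return the sorted, decoded partition values found in an S3 listing."""
    prefix = column + "="
    values = set()
    for line in listing.splitlines():
        # Expected layout: date, time, size, key (the key may contain spaces).
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue
        for segment in parts[3].split("/"):
            if segment.startswith(prefix):
                # Decode only for display; the stored key keeps "%2F".
                values.add(unquote(segment[len(prefix):]))
    return sorted(values)


# Keys from this report, with the data file names shortened.
sample = """\
2026-01-27 17:29:23 1172 alm-liq/iceberg_catalog/my_schema/partition_bug_test/data/partition_key=dev%2Fapp1%2Fservice1/part.parquet
2026-01-27 17:29:23 1178 alm-liq/iceberg_catalog/my_schema/partition_bug_test/data/partition_key=prod%2Fapp1%2Fservice1/part.parquet
2026-01-27 17:29:23 1120 alm-liq/iceberg_catalog/my_schema/partition_bug_test/data/partition_key=prod%2Fapp2%2Fservice2/part.parquet
2026-01-27 17:27:43 1170 alm-liq/iceberg_catalog/my_schema/partition_bug_test/metadata/00000-e66b4081-896c-42ff-9438-5213d47f5784.metadata.json
"""

result = partition_values(sample)
print(result)
```

The decoded values are only meaningful as column values. The object keys themselves must still be requested in their encoded form.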
Expected behavior
ClickHouse should handle this case correctly and return the query result.
Key information
- Project Antalya Build Version: 25.8.12.20747.altinityantalya
- Cloud provider: AWS
- Kubernetes provider: k3s
- Object storage: AWS S3
- Iceberg catalog: Polaris REST Catalog 1.3.0
Additional context
I have provided a Python script, test-partition-bug.py, to create the initial Parquet file with values containing "/". See log_file.txt for the error codes.
Attachments: test-partition-bug.py, log_file.txt