ClickHouse with Iceberg does not handle partition key values containing "/" #1348

@jcdauchy-moodys

Describe the bug
ClickHouse should correctly handle Iceberg tables partitioned by a column whose values contain "/".

What's happening:

  • S3 object exists at: partition_key=dev%2Fapp1%2Fservice1/ (URL-encoded)
  • ClickHouse looks for: partition_key=dev/app1/service1/ (URL-decoded)
  • S3 returns 404 because the decoded path doesn't exist

This confirms that ClickHouse URL-decodes the partition path before requesting the object, so the requested key no longer matches the key stored in S3.
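
The mismatch can be reproduced outside ClickHouse. The sketch below assumes the writer percent-encodes the partition value with no safe characters; `urllib.parse.quote` is only a stand-in for whatever encoder ICE actually uses, and `partition_dir` is a hypothetical helper name.

```python
from urllib.parse import quote, unquote


def partition_dir(column, value):
    # Assumption: the writer percent-encodes the value so that "/" cannot
    # create extra directory levels in the object key.
    return f"{column}={quote(value, safe='')}/"


# Directory name as it is stored in S3.
stored = partition_dir("partition_key", "dev/app1/service1")

# Directory name a reader ends up with if it URL-decodes the path first.
looked_up = unquote(stored)

print(stored)     # partition_key=dev%2Fapp1%2Fservice1/
print(looked_up)  # partition_key=dev/app1/service1/
print(stored == looked_up)  # False
```

Because S3 keys are opaque strings, the two names refer to different objects, which is consistent with the 404 described above.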

To Reproduce
python3 test-partition-bug.py
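
For readers without the attachment, here is a minimal sketch of what such a script might look like. It is not the attached test-partition-bug.py: the `id` column and the function names are hypothetical, it assumes `pyarrow` is installed, and only the `partition_key` values and the output file name are taken from this report.

```python
# Partition values containing "/", matching the S3 listing in this report.
PARTITION_VALUES = [
    "dev/app1/service1",
    "prod/app1/service1",
    "prod/app2/service2",
]


def build_rows():
    # "id" is a hypothetical extra column; "partition_key" is the column
    # the Iceberg table is partitioned by.
    return {
        "id": list(range(len(PARTITION_VALUES))),
        "partition_key": list(PARTITION_VALUES),
    }


def write_parquet(path="test-data-with-slash.parquet"):
    # Imported lazily so build_rows() can be used without pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table(build_rows())
    pq.write_table(table, path)
    return path
```

Calling `write_parquet()` would produce the file uploaded in step 1.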

1. Upload to S3:

aws s3 cp test-data-with-slash.parquet s3://<BUCKET_PATH>/testData/test-data/ --region eu-west-1

2. Create Iceberg table with ICE:

ice create-table my_schema.partition_bug_test \
  --schema-from-parquet s3://<BUCKET_PATH>/testData/test-data/test-data-with-slash.parquet \
  --partition '[{"column":"partition_key","transform":"identity"}]'

3. Insert data with ICE (creates hierarchical partitioning):

ice insert my_schema.partition_bug_test \
  s3://<BUCKET_PATH>/testData/test-data/test-data-with-slash.parquet

4. Check S3 structure (you'll see URL-encoded paths):

aws s3 ls s3://<BUCKET_PATH>/iceberg_catalog/my_schema/partition_bug_test/ --recursive

2026-01-27 17:29:23 1172 alm-liq/iceberg_catalog/my_schema/partition_bug_test/data/partition_key=dev%2Fapp1%2Fservice1/1769531359274-ded0bbf7d5aa003a6f6c6408f98489ae9cb8703d2df9c0a95c15c6121b6c7033-part.parquet
2026-01-27 17:29:23 1178 alm-liq/iceberg_catalog/my_schema/partition_bug_test/data/partition_key=prod%2Fapp1%2Fservice1/1769531359274-ded0bbf7d5aa003a6f6c6408f98489ae9cb8703d2df9c0a95c15c6121b6c7033-part.parquet
2026-01-27 17:29:23 1120 alm-liq/iceberg_catalog/my_schema/partition_bug_test/data/partition_key=prod%2Fapp2%2Fservice2/1769531359274-ded0bbf7d5aa003a6f6c6408f98489ae9cb8703d2df9c0a95c15c6121b6c7033-part.parquet
2026-01-27 17:27:43 1170 alm-liq/iceberg_catalog/my_schema/partition_bug_test/metadata/00000-e66b4081-896c-42ff-9438-5213d47f5784.metadata.json
2026-01-27 17:29:24 2196 alm-liq/iceberg_catalog/my_schema/partition_bug_test/metadata/00001-8b5c0ae9-beea-43ff-bbbe-9b43a7ba801e.metadata.json
2026-01-27 17:29:23 7528 alm-liq/iceberg_catalog/my_schema/partition_bug_test/metadata/ea8af7da-ea2a-49c6-84ba-e756cf6a58ae-m0.avro
2026-01-27 17:29:23 4513 alm-liq/iceberg_catalog/my_schema/partition_bug_test/metadata/snap-1778817764108029085-1-ea8af7da-ea2a-49c6-84ba-e756cf6a58ae.avro
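
The listing above can be cross-checked with a short sketch: decoding the partition directory recovers the original column value, while the object key itself has to be requested exactly as listed. `partition_from_key` is a hypothetical helper, not part of ICE or ClickHouse.

```python
from urllib.parse import unquote


def partition_from_key(key, column="partition_key"):
    """Return (key_to_request, decoded_partition_value) for one object key."""
    prefix = column + "="
    for segment in key.split("/"):
        if segment.startswith(prefix):
            # Decode only the value; the key is returned unchanged.
            return key, unquote(segment[len(prefix):])
    # Keys outside a partition directory (for example metadata files).
    return key, None


key = (
    "alm-liq/iceberg_catalog/my_schema/partition_bug_test/data/"
    "partition_key=dev%2Fapp1%2Fservice1/"
    "1769531359274-ded0bbf7d5aa003a6f6c6408f98489ae9cb8703d2df9c0a95c15c6121b6c7033-part.parquet"
)

requested, value = partition_from_key(key)
print(value)             # dev/app1/service1
print(requested == key)  # True
```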

5. Query with ClickHouse (THIS WILL FAIL):

SET send_logs_level = 'trace';
SET allow_experimental_database_iceberg = 1;

DROP DATABASE IF EXISTS almlr_polaris_catalog;

CREATE DATABASE IF NOT EXISTS almlr_polaris_catalog
ENGINE = DataLakeCatalog('http://XXXX.compute.internal/polaris/api/catalog')
SETTINGS
catalog_type = 'rest',
catalog_credential = 'xxxx:yyyyyy',
oauth_server_uri = 'http://XXXXX.compute.internal/polaris/api/catalog/v1/oauth/tokens',
warehouse = 'almlr_catalog',
vended_credentials = false;

SELECT * FROM almlr_polaris_catalog.my_schema.partition_bug_test LIMIT 5;

Expected behavior
ClickHouse should handle this case correctly and return the result of the query.

Key information

  • Project Antalya Build Version: 25.8.12.20747.altinityantalya
  • Cloud provider: AWS
  • Kubernetes provider: k3s
  • Object storage: AWS S3
  • Iceberg catalog: Polaris REST Catalog 1.3.0

Additional context
I have provided a Python script, test-partition-bug.py, to create the initial Parquet file with values containing "/". See log_file.txt for the error codes.

log_file.txt
test-partition-bug.py
