
Swarm join of iceberg tables fails with "Cannot find manifest file for data file error" or returns no data #1063

@timoha

Description

Describe the bug
Getting an exception when joining two iceberg('<path>') tables.

2025.10.05 23:05:00.041609 [ 85 ] {} <Error> TCPHandler: Code: 36. DB::Exception: Cannot find manifest file for data file: tables/<fact_table>/data/as_of_date=2023-10-15/00015-198976-e06ac989-671b-4ff5-a42a-c21791e0a5bc-0-00001.parquet: While executing IcebergS3(_table_function.iceberg)ReadStep. (BAD_ARGUMENTS), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000fc10edb
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000009db0eec
2. DB::Exception::Exception<String&>(int, FormatStringHelperImpl<std::type_identity<String&>::type>, String&) @ 0x0000000009dc6acb
3. DB::IcebergMetadata::getInitialSchemaByPath(std::shared_ptr<DB::Context const>, String const&) const @ 0x0000000012853724
4. DB::DataLakeConfiguration<DB::StorageS3Configuration, DB::IcebergMetadata>::getInitialSchemaByPath(std::shared_ptr<DB::Context const>, String const&) const @ 0x0000000011bb9745
5. DB::StorageIcebergConfiguration::getInitialSchemaByPath(std::shared_ptr<DB::Context const>, String const&) const @ 0x0000000011bb2dfe
6. DB::StorageObjectStorageSource::createReader(unsigned long, std::shared_ptr<DB::IObjectIterator> const&, std::shared_ptr<DB::StorageObjectStorage::Configuration> const&, std::shared_ptr<DB::IObjectStorage> const&, DB::ReadFromFormatInfo&, std::optional<DB::FormatSettings> const&, std::shared_ptr<DB::KeyCondition const> const&, std::shared_ptr<DB::Context const> const&, DB::SchemaCache*, std::shared_ptr<Poco::Logger> const&, unsigned long, unsigned long, bool) @ 0x000000001274a2b4
7. DB::StorageObjectStorageSource::generate() @ 0x0000000012748213
8. DB::ISource::tryGenerate() @ 0x00000000156b7cde
9. DB::ISource::work() @ 0x00000000156b78e7
10. DB::ExecutionThreadContext::executeTask() @ 0x00000000156d3ee2
11. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x00000000156c79e5
12. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreadsImpl(std::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x00000000156ca1df
13. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker() @ 0x000000000fd588cb
14. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000000fd5f4dd
15. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x000000000fd55af2
16. void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000000fd5cfba
17. ? @ 0x0000000000094ac3
18. ? @ 0x0000000000126850

Using the following query:

select * from iceberg('<path_to_fact_table>', SETTINGS iceberg_metadata_file_path = 'metadata/24762-eaccbd6f-278c-498a-955b-a51b5d7cb4c2.metadata.json') sm
join iceberg('<path_to_join_table>', SETTINGS iceberg_metadata_file_path = 'metadata/05240-05e496d6-618a-4a4f-9846-a387d3f30d85.metadata.json') e on sm.id = e.id

We have 8 swarm nodes; the exception happens on a random file and a random node each time.
We have no issue querying the tables with Spark or Athena, so it is unlikely that anything in S3 is corrupted.
iceberg_metadata_file_path is set explicitly to bypass Glue catalog resolution.
The tables are written using Spark. ClickHouse has read-only permissions on S3.

Interestingly, the following query does not return an error, so the exception is only thrown on a direct join:

select * from (select * from iceberg('<path_to_fact_table>', SETTINGS iceberg_metadata_file_path = 'metadata/24762-eaccbd6f-278c-498a-955b-a51b5d7cb4c2.metadata.json')) sm
join (select * from iceberg('<path_to_join_table>', SETTINGS iceberg_metadata_file_path = 'metadata/05240-05e496d6-618a-4a4f-9846-a387d3f30d85.metadata.json')) e
on sm.id = e.id

Settings are as follows; only "object_storage_cluster": "swarm" is responsible for the exception (we tried setting all the rest to 0):

        "object_storage_cluster": "swarm",
        "input_format_parquet_use_metadata_cache": "1",
        "enable_filesystem_cache": "1",
        "use_iceberg_metadata_files_cache": "1",
        "use_hive_partitioning": "1",
        "date_time_overflow_behavior": "saturate",
        "use_object_storage_list_objects_cache": "1",
        "use_iceberg_partition_pruning": "1",
        "max_memory_usage": str(10 * 1024 * 1024 * 1024),  # 10 GB
        "max_bytes_before_external_group_by": str(10 * 1024 * 1024 * 1024),  # 10 GB
        "max_bytes_before_external_sort": str(10 * 1024 * 1024 * 1024),  # 10 GB
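The settings above look like a Python dict passed to a client. For reference, here is a self-contained sketch of that dict with a toggle for swarm mode; the `make_settings` helper and the assumption that an empty `object_storage_cluster` disables cluster dispatch are mine, not from the report.

```python
# Sketch (assumption): the query settings from this report as a Python dict,
# with a flag to switch swarm dispatch on/off for A/B comparison. Any client
# call that would consume this dict is out of scope here.
def make_settings(swarm: bool = True) -> dict:
    gib_10 = str(10 * 1024 * 1024 * 1024)  # 10 GB, as a string like in the report
    return {
        # Empty string assumed to disable cluster dispatch for the query.
        "object_storage_cluster": "swarm" if swarm else "",
        "input_format_parquet_use_metadata_cache": "1",
        "enable_filesystem_cache": "1",
        "use_iceberg_metadata_files_cache": "1",
        "use_hive_partitioning": "1",
        "date_time_overflow_behavior": "saturate",
        "use_object_storage_list_objects_cache": "1",
        "use_iceberg_partition_pruning": "1",
        "max_memory_usage": gib_10,
        "max_bytes_before_external_group_by": gib_10,
        "max_bytes_before_external_sort": gib_10,
    }

print(make_settings()["object_storage_cluster"])  # swarm
```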

More alarmingly, for some other Iceberg tables swarm mode causes the join to return no rows instead of throwing an exception. The expected rows are returned when swarm mode is turned off.

select *
from iceberg('s3://<redacted>/tables/test_table_glue_clickhouse_ts', SETTINGS iceberg_metadata_file_path = 'metadata/00000-3e32191c-0e4f-40a1-a40d-b1c5e40df43a.metadata.json') t1
join iceberg('s3://<redacted>/tables/test_table_glue_clickhouse_no_ts', SETTINGS iceberg_metadata_file_path = 'metadata/00000-1404b85d-fa83-45a8-a4c3-4eaa285c2270.metadata.json') t2 on t1.a = t2.a
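A way to confirm the missing rows are tied to swarm dispatch (an assumption on my side: object_storage_cluster can be overridden per query, and an empty value disables cluster dispatch) is to run the same join with the setting cleared and compare counts:

```sql
-- Sketch (assumption): per-query override of object_storage_cluster; an
-- empty value should disable swarm dispatch so counts can be compared.
select count()
from iceberg('s3://<redacted>/tables/test_table_glue_clickhouse_ts', SETTINGS iceberg_metadata_file_path = 'metadata/00000-3e32191c-0e4f-40a1-a40d-b1c5e40df43a.metadata.json') t1
join iceberg('s3://<redacted>/tables/test_table_glue_clickhouse_no_ts', SETTINGS iceberg_metadata_file_path = 'metadata/00000-1404b85d-fa83-45a8-a4c3-4eaa285c2270.metadata.json') t2 on t1.a = t2.a
settings object_storage_cluster = '';
```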

Expected behavior
A join on Iceberg tables doesn't throw an exception and returns the expected rows.

Key information

  • ClickHouse 25.6.5.20363.altinityantalya
  • AWS S3
  • Iceberg tables written by Spark

Additional context
Initially brought up in issue #887
