
Swarm join of iceberg tables fails with "Cannot find manifest file for data file error" or returns no data #1063

@timoha

Description

Describe the bug
Getting an exception when joining two iceberg('<path>') tables.

2025.10.05 23:05:00.041609 [ 85 ] {} <Error> TCPHandler: Code: 36. DB::Exception: Cannot find manifest file for data file: tables/<fact_table>/data/as_of_date=2023-10-15/00015-198976-e06ac989-671b-4ff5-a42a-c21791e0a5bc-0-00001.parquet: While executing IcebergS3(_table_function.iceberg)ReadStep. (BAD_ARGUMENTS), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000fc10edb
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000009db0eec
2. DB::Exception::Exception<String&>(int, FormatStringHelperImpl<std::type_identity<String&>::type>, String&) @ 0x0000000009dc6acb
3. DB::IcebergMetadata::getInitialSchemaByPath(std::shared_ptr<DB::Context const>, String const&) const @ 0x0000000012853724
4. DB::DataLakeConfiguration<DB::StorageS3Configuration, DB::IcebergMetadata>::getInitialSchemaByPath(std::shared_ptr<DB::Context const>, String const&) const @ 0x0000000011bb9745
5. DB::StorageIcebergConfiguration::getInitialSchemaByPath(std::shared_ptr<DB::Context const>, String const&) const @ 0x0000000011bb2dfe
6. DB::StorageObjectStorageSource::createReader(unsigned long, std::shared_ptr<DB::IObjectIterator> const&, std::shared_ptr<DB::StorageObjectStorage::Configuration> const&, std::shared_ptr<DB::IObjectStorage> const&, DB::ReadFromFormatInfo&, std::optional<DB::FormatSettings> const&, std::shared_ptr<DB::KeyCondition const> const&, std::shared_ptr<DB::Context const> const&, DB::SchemaCache*, std::shared_ptr<Poco::Logger> const&, unsigned long, unsigned long, bool) @ 0x000000001274a2b4
7. DB::StorageObjectStorageSource::generate() @ 0x0000000012748213
8. DB::ISource::tryGenerate() @ 0x00000000156b7cde
9. DB::ISource::work() @ 0x00000000156b78e7
10. DB::ExecutionThreadContext::executeTask() @ 0x00000000156d3ee2
11. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x00000000156c79e5
12. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<DB::PipelineExecutor::spawnThreadsImpl(std::shared_ptr<DB::IAcquiredSlot>)::$_0, void ()>>(std::__function::__policy_storage const*) @ 0x00000000156ca1df
13. ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker() @ 0x000000000fd588cb
14. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000000fd5f4dd
15. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x000000000fd55af2
16. void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000000fd5cfba
17. ? @ 0x0000000000094ac3
18. ? @ 0x0000000000126850

Using the following query:

select * from iceberg('<path_to_fact_table>', SETTINGS iceberg_metadata_file_path = 'metadata/24762-eaccbd6f-278c-498a-955b-a51b5d7cb4c2.metadata.json') sm
join iceberg('<path_to_join_table>', SETTINGS iceberg_metadata_file_path = 'metadata/05240-05e496d6-618a-4a4f-9846-a387d3f30d85.metadata.json') e on sm.id = e.id

We have 8 swarm nodes; the exception happens on a random file and a random node each time.
We have no issue querying the tables with Spark or Athena, so it is unlikely that anything in S3 is corrupted.
iceberg_metadata_file_path is set explicitly to bypass Glue catalog resolution.
The tables are written using Spark. ClickHouse has read-only permissions on S3.

Interestingly, the following query does not return an error, so the exception is only thrown on a direct join:

select * from (select * from iceberg('<path_to_fact_table>', SETTINGS iceberg_metadata_file_path = 'metadata/24762-eaccbd6f-278c-498a-955b-a51b5d7cb4c2.metadata.json')) sm
join (select * from iceberg('<path_to_join_table>', SETTINGS iceberg_metadata_file_path = 'metadata/05240-05e496d6-618a-4a4f-9846-a387d3f30d85.metadata.json')) e
on sm.id = e.id

Settings are as follows; only "object_storage_cluster": "swarm" is responsible for the exception (we tried setting all the rest to 0):

        "object_storage_cluster": "swarm",
        "input_format_parquet_use_metadata_cache": "1",
        "enable_filesystem_cache": "1",
        "use_iceberg_metadata_files_cache": "1",
        "use_hive_partitioning": "1",
        "date_time_overflow_behavior": "saturate",
        "use_object_storage_list_objects_cache": "1",
        "use_iceberg_partition_pruning": "1",
        "max_memory_usage": str(10 * 1024 * 1024 * 1024),  # 10 GB
        "max_bytes_before_external_group_by": str(10 * 1024 * 1024 * 1024),  # 10 GB
        "max_bytes_before_external_sort": str(10 * 1024 * 1024 * 1024),  # 10 GB
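The settings above look like a Python dict passed to a client. For reference, here is a self-contained sketch of that dict with a toggle for swarm mode; the `make_settings` helper and the assumption that an empty `object_storage_cluster` disables cluster dispatch are mine, not from the report.

```python
# Sketch (assumption): the query settings from this report as a Python dict,
# with a flag to switch swarm dispatch on/off for A/B comparison. Any client
# call that would consume this dict is out of scope here.
def make_settings(swarm: bool = True) -> dict:
    gib_10 = str(10 * 1024 * 1024 * 1024)  # 10 GB, as a string like in the report
    return {
        # Empty string assumed to disable cluster dispatch for the query.
        "object_storage_cluster": "swarm" if swarm else "",
        "input_format_parquet_use_metadata_cache": "1",
        "enable_filesystem_cache": "1",
        "use_iceberg_metadata_files_cache": "1",
        "use_hive_partitioning": "1",
        "date_time_overflow_behavior": "saturate",
        "use_object_storage_list_objects_cache": "1",
        "use_iceberg_partition_pruning": "1",
        "max_memory_usage": gib_10,
        "max_bytes_before_external_group_by": gib_10,
        "max_bytes_before_external_sort": gib_10,
    }

print(make_settings()["object_storage_cluster"])  # swarm
```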

More alarmingly, for some other Iceberg tables swarm mode causes the join to return no rows instead of throwing an exception. The expected rows are returned when swarm mode is turned off.

select *
from iceberg('s3://<redacted>/tables/test_table_glue_clickhouse_ts', SETTINGS iceberg_metadata_file_path = 'metadata/00000-3e32191c-0e4f-40a1-a40d-b1c5e40df43a.metadata.json') t1
join iceberg('s3://<redacted>/tables/test_table_glue_clickhouse_no_ts', SETTINGS iceberg_metadata_file_path = 'metadata/00000-1404b85d-fa83-45a8-a4c3-4eaa285c2270.metadata.json') t2 on t1.a = t2.a
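A way to confirm the missing rows are tied to swarm dispatch (an assumption on my side: object_storage_cluster can be overridden per query, and an empty value disables cluster dispatch) is to run the same join with the setting cleared and compare counts:

```sql
-- Sketch (assumption): per-query override of object_storage_cluster; an
-- empty value should disable swarm dispatch so counts can be compared.
select count()
from iceberg('s3://<redacted>/tables/test_table_glue_clickhouse_ts', SETTINGS iceberg_metadata_file_path = 'metadata/00000-3e32191c-0e4f-40a1-a40d-b1c5e40df43a.metadata.json') t1
join iceberg('s3://<redacted>/tables/test_table_glue_clickhouse_no_ts', SETTINGS iceberg_metadata_file_path = 'metadata/00000-1404b85d-fa83-45a8-a4c3-4eaa285c2270.metadata.json') t2 on t1.a = t2.a
settings object_storage_cluster = '';
```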

Expected behavior
A join on Iceberg tables doesn't throw an exception and returns the expected rows.

Key information

  • ClickHouse 25.6.5.20363.altinityantalya
  • AWS S3
  • Iceberg tables written by Spark

Additional context
Initially brought up in issue #887
