Fix IN with Iceberg table #1168
@codex review
Codex Review: Didn't find any major issues. Keep them coming!
Failed tests look unrelated to this PR.
zvonand left a comment:
Will merge once tests are finished
…_cluster_request Fix IN with Iceberg table
Verification test: https://github.com/Altinity/clickhouse-regression/blob/main/iceberg/tests/iceberg_engine/iceberg_iterator_race_condition.py
This test reproduces a race condition when executing queries with IN subqueries.
What was tested:
The test creates a partitioned Iceberg table with 100 rows, populates a local MergeTree table
Verification:
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix IN with Iceberg table
Documentation entry for user-facing changes
What is going on.
Consider a query like 'SELECT x FROM iceberg.table WHERE y IN (SELECT z FROM local.table)'.
During query planning, the planner creates an IcebergIterator.
IcebergIterator takes filters (filter_dag) to do partition pruning, min/max pruning, etc. In its constructor, IcebergIterator also starts a background thread that builds the list of objects in parallel with other work.
https://github.com/Altinity/ClickHouse/blob/antalya-25.8/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergIterator.cpp#L280
But the filters are not ready yet: the set for the IN subquery has not been built. The background thread finds that the set still needs to be built and tries to build it.
(SingleThreadIcebergKeysIterator::next -> ManifestFilesPruner::ManifestFilesPruner -> KeyCondition::KeyCondition -> KeyCondition::extractAtomFromTree -> FutureSetFromSubquery::buildOrderedSetInplace)
Meanwhile, the main thread continues to build the plan and at some point also tries to build the same set.
(QueryPlanOptimizations::addStepsToBuildSets -> DelayedCreatingSetsStep::makePlansForSets -> FutureSetFromSubquery::build)
When both builds run at the same time, we get a race condition.
This PR adds a mutex to prevent the set from being built by different threads at the same time.
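The idea of serializing the build can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: FutureSetSketch, buildOnce, buildCount and concurrentBuildCount are hypothetical names, and the "subquery" is replaced by a loop that fills a std::set. It only shows how a mutex lets two threads request the same set while the build itself runs once.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <set>
#include <thread>

// Hypothetical stand-in for a lazily built IN-subquery set.
// This is NOT the ClickHouse FutureSetFromSubquery class.
class FutureSetSketch
{
public:
    using Set = std::set<int>;

    // Safe to call from several threads: the first caller builds the set,
    // later callers block on the mutex and then reuse the finished result.
    std::shared_ptr<const Set> buildOnce()
    {
        std::lock_guard<std::mutex> lock(build_mutex);
        if (!set)
        {
            ++build_count;
            auto result = std::make_shared<Set>();
            for (int i = 0; i < 100; ++i) // assumption: stands in for running the subquery
                result->insert(i);
            set = std::move(result);
        }
        return set;
    }

    int buildCount() const
    {
        std::lock_guard<std::mutex> lock(build_mutex);
        return build_count;
    }

private:
    mutable std::mutex build_mutex;
    std::shared_ptr<const Set> set; // guarded by build_mutex
    int build_count = 0;            // guarded by build_mutex
};

// Simulates the two callers from the description: a background iterator
// thread and the main (planner) thread both asking for the same set.
int concurrentBuildCount()
{
    FutureSetSketch future_set;
    std::thread iterator_thread([&future_set] { future_set.buildOnce(); });
    auto from_planner = future_set.buildOnce();
    iterator_thread.join();
    assert(from_planner->size() == 100);
    return future_set.buildCount();
}
```

Whichever thread takes the lock first does the build; the other one waits and gets the already built set, so the build body never runs concurrently.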
A more correct fix would be to start the IcebergIterator thread only once the plan is fully built, but the planner blows my mind, so I made this simple workaround.
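For comparison, the deferred-start alternative could look roughly like the sketch below. It is an assumption about the shape of such a fix, not something implemented in this PR: DeferredListerSketch, start, collect and listWithDeferredStart are hypothetical names, and the filter is a plain callback over integer keys instead of a filter_dag.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical iterator-like object whose constructor does NOT spawn
// the listing thread; the caller starts it explicitly.
class DeferredListerSketch
{
public:
    explicit DeferredListerSketch(std::function<bool(int)> filter_)
        : filter(std::move(filter_))
    {
    }

    ~DeferredListerSketch()
    {
        if (worker.joinable())
            worker.join();
    }

    // Intended to be called only after planning has finished,
    // so the filter no longer changes while the worker reads it.
    void start()
    {
        if (started)
            return;
        started = true;
        worker = std::thread([this] {
            for (int key = 0; key < 10; ++key) // assumption: 10 candidate objects
                if (filter(key))
                    keys.push_back(key);
        });
    }

    // Waits for the worker and returns the keys that passed the filter.
    std::vector<int> collect()
    {
        start();
        if (worker.joinable())
            worker.join();
        return keys;
    }

private:
    std::function<bool(int)> filter;
    std::vector<int> keys; // written by the worker, read after join
    std::thread worker;
    bool started = false;
};

std::vector<int> listWithDeferredStart()
{
    std::set<int> in_set; // still empty while the "plan" is being built
    DeferredListerSketch lister([&in_set](int key) { return in_set.count(key) > 0; });
    in_set = {2, 3, 5};   // the planner finishes the set on a single thread
    lister.start();       // only now does the background listing begin
    return lister.collect();
}
```

With this ordering the set is built by one thread before the listing thread exists, so no lock around the build is needed; the cost is that listing no longer overlaps with planning.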
CI/CD Options
Exclude tests:
Regression jobs to run: