@ianton-ru

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Profile events for task distribution in ObjectStorageCluster requests

Documentation entry for user-facing changes

New profile events:

  • ObjectStorageClusterSentToMatchedReplica - number of tasks sent to replicas matched by the rendezvous hashing algorithm (counted on the initiator)
  • ObjectStorageClusterSentToNonMatchedReplica - number of tasks sent to non-matched replicas that have already processed all of their matched tasks (counted on the initiator)
  • ObjectStorageClusterProcessedTasks - number of tasks processed (counted on replicas)
  • ObjectStorageClusterWaitingMicroseconds - time in microseconds spent waiting for tasks until the lock_object_storage_task_distribution_ms timeout expires (counted on replicas)

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@github-actions

github-actions bot commented Nov 26, 2025

Workflow [PR], commit [44b9bed]

zvonand previously approved these changes Nov 26, 2025
M(ParquetMetaDataCacheMisses, "Number of times the read from filesystem cache miss the cache.", ValueType::Number) \
\
M(ObjectStorageClusterSentToMatchedReplica, "Number of tasks in ObjectStorageCluster request sent on matched replica.", ValueType::Number) \
M(ObjectStorageClusterSentToNonMatchedReplica, "Number of tasks in ObjectStorageCluster request sent on non-matched replica.", ValueType::Number) \
Collaborator

Should be "sent to", not "sent on" ...

/// Now this sleep is on executor node in worker thread
/// Does not block query initiator
ProfileEvents::increment(ProfileEvents::ObjectStorageClusterWaitingMicroseconds, retry_after_us.value());
sleepForMicroseconds(std::min(Poco::Timestamp::TimeDiff(100000ul), retry_after_us.value()));
Collaborator

Why do we ignore the 100000ul cap for the profile event?

Author

Oops.
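
One way to address the question above, sketched under assumptions: compute the capped sleep duration once and use that same value both for the counter and for the sleep, so that the profile event reflects the time actually slept. `WaitCounter`, `max_sleep_us`, and `accountAndComputeSleep` are hypothetical stand-ins for illustration and are not ClickHouse APIs; the commit that actually resolved this comment is not shown in this conversation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the ObjectStorageClusterWaitingMicroseconds counter.
struct WaitCounter
{
    std::uint64_t waited_us = 0;
    void increment(std::uint64_t us) { waited_us += us; }
};

// Upper bound for a single sleep, mirroring the 100000 us cap in the diff.
constexpr std::int64_t max_sleep_us = 100000;

// Caps the requested delay, records exactly the capped value in the counter,
// and returns it so the caller can pass the same value to its sleep function.
std::int64_t accountAndComputeSleep(WaitCounter & counter, std::int64_t retry_after_us)
{
    assert(retry_after_us >= 0);
    const std::int64_t sleep_us = std::min(max_sleep_us, retry_after_us);
    counter.increment(static_cast<std::uint64_t>(sleep_us));
    return sleep_us;
}
```

With this shape, a requested delay of 250000 us adds 100000 us to the counter per iteration rather than the full 250000 us.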

);

ProfileEvents::increment(ProfileEvents::ObjectStorageClusterSentToNonMatchedReplica);
return next_file;
Collaborator

Are we sure that the replica is not matched here?

Author

Yes, this point is reached only when there are no matched files left.
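
The matched versus non-matched distinction discussed here can be illustrated with a minimal sketch. It assumes a simplified distributor that assigns each file to one replica by rendezvous hashing and hands out another replica's file only when the requesting replica has no matched files left. `TaskDistributor`, `pickReplica`, and the use of `std::hash` are hypothetical choices for illustration, not the ClickHouse implementation, and the sketch ignores the lock_object_storage_task_distribution_ms waiting period.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified task distributor; not the ClickHouse class.
struct TaskDistributor
{
    std::vector<std::deque<std::string>> queues; // queues[r]: files matched to replica r
    std::size_t sent_to_matched = 0;     // stands in for ObjectStorageClusterSentToMatchedReplica
    std::size_t sent_to_non_matched = 0; // stands in for ObjectStorageClusterSentToNonMatchedReplica

    TaskDistributor(const std::vector<std::string> & files, std::size_t replica_count)
        : queues(replica_count)
    {
        assert(replica_count > 0);
        for (const auto & file : files)
            queues[pickReplica(file, replica_count)].push_back(file);
    }

    // Rendezvous hashing: score every (file, replica) pair; the highest score wins.
    static std::size_t pickReplica(const std::string & file, std::size_t replica_count)
    {
        std::size_t best = 0;
        std::size_t best_score = 0;
        for (std::size_t r = 0; r < replica_count; ++r)
        {
            const std::size_t score = std::hash<std::string>{}(file + '#' + std::to_string(r));
            if (r == 0 || score > best_score)
            {
                best = r;
                best_score = score;
            }
        }
        return best;
    }

    // Returns the next file for the requesting replica, or nullopt when all files are handed out.
    std::optional<std::string> next(std::size_t replica)
    {
        if (!queues[replica].empty())
            return take(queues[replica], sent_to_matched);
        // Reached only when the requesting replica has no matched files left.
        for (auto & queue : queues)
            if (!queue.empty())
                return take(queue, sent_to_non_matched);
        return std::nullopt;
    }

private:
    static std::string take(std::deque<std::string> & queue, std::size_t & counter)
    {
        std::string file = std::move(queue.front());
        queue.pop_front();
        ++counter;
        return file;
    }
};
```

In this sketch the non-matched counter can only grow on the fallback path, which is the property the reviewer asked about.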

ilejn previously approved these changes Nov 26, 2025
Collaborator

@ilejn ilejn left a comment

LGTM.
Just minor questions.

Collaborator

ilejn commented Nov 26, 2025

The test failures are worth analyzing, though they are not necessarily related to this particular PR.

@ianton-ru ianton-ru dismissed stale reviews from ilejn and zvonand via 1f8c298 November 26, 2025 19:00
@ianton-ru
Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +6 to +10
namespace ProfileEvents
{
extern const Event ObjectStorageClusterSentToMatchedReplica;
extern const Event ObjectStorageClusterSentToNonMatchedReplica;
};

P0: Add ProfileEvents header for new counters

This file now declares ProfileEvents::ObjectStorageClusterSentToMatchedReplica/...NonMatchedReplica and calls ProfileEvents::increment, but it never includes <Common/ProfileEvents.h>, so Event and increment are undefined in this translation unit. As written the code will not compile after this commit; the missing header (or forward declaration of increment plus the Event typedef) needs to be added.

@zvonand zvonand merged commit 1620e0c into antalya-25.8 Dec 8, 2025
134 of 138 checks passed
zvonand added a commit that referenced this pull request Dec 9, 2025
…orage_cluster_profile_events

Profile events for task distribution in ObjectStorageCluster requests
@alsugiliazova alsugiliazova added the verified Verified by QA label Dec 15, 2025
@alsugiliazova
Member

Verification test: https://github.com/Altinity/clickhouse-regression/blob/main/swarms/tests/object_storage_cluster_profile_events.py

The test creates a cluster with multiple nodes, sets up an overload scenario with one slow/overloaded node,
executes queries with the object_storage_cluster and lock_object_storage_task_distribution_ms
settings, and then queries system.query_log to verify that the metrics are correctly incremented on both the initiator
and the replica nodes.

6 participants