Log scheduler stats for Pulsar Functions #7474
}
log.info("Rebalance - Total number of new assignments computed: {}", rebalancedAssignments.size());

log.info("Rebalance summary - execution time: {} sec | stats: {}\n{}",
Can we also report these stats to Prometheus?
Yes, I was going to work on that in the near future. Stats like scheduler execution time can go into Prometheus. However, a breakdown of how many instances moved, grouped by worker, is not suitable for Prometheus.
}
if (triggerScheduler) {
log.info("Functions that need scheduling/rescheduling: {}", needSchedule);
log.info("Failure check - Total number of instances that need to be scheduled/rescheduled: {} | Number of unassigned instances that need to be scheduled: {} | Number of instances on dead workers that need to be reassigned {}",
Maybe a better wording here?
Do you have a suggestion?
I actually liked the old message. What was the reason for the change?
Added "Failure check" at the beginning so it will be easier to search for.
public static class RebalanceInProgressException extends RuntimeException {
}

@Data
Is this exported somewhere?
I will remove it.
Co-authored-by: Jerry Peng <jerryp@splunk.com>
Motivation
Add stats to be logged for the schedule, rebalance, and failure-check routines, to help with debugging and to expose performance details.
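As a rough illustration of the kind of rebalance summary discussed in the review (execution time plus a per-worker breakdown of moved instances), here is a minimal, self-contained sketch. It is not the code from this PR: `RebalanceStatsSketch`, its nested `Assignment` class, `movesByWorker`, and `format` are hypothetical names, and it uses `System.out` rather than the project's logger so it runs without dependencies.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RebalanceStatsSketch {

    /** Hypothetical stand-in for an instance-to-worker assignment (not Pulsar's own type). */
    static final class Assignment {
        final String instanceId;
        final String workerId;

        Assignment(String instanceId, String workerId) {
            this.instanceId = instanceId;
            this.workerId = workerId;
        }
    }

    /**
     * Returns, per worker, {instances gained, instances lost} between two snapshots.
     * Only instances present in both snapshots on different workers count as moved.
     */
    static Map<String, int[]> movesByWorker(List<Assignment> before, List<Assignment> after) {
        Map<String, String> previousOwner = new HashMap<>();
        for (Assignment a : before) {
            previousOwner.put(a.instanceId, a.workerId);
        }
        Map<String, int[]> moves = new TreeMap<>(); // sorted keys give stable log output
        for (Assignment a : after) {
            String oldWorker = previousOwner.get(a.instanceId);
            if (oldWorker != null && !oldWorker.equals(a.workerId)) {
                moves.computeIfAbsent(a.workerId, w -> new int[2])[0]++; // gained
                moves.computeIfAbsent(oldWorker, w -> new int[2])[1]++;  // lost
            }
        }
        return moves;
    }

    /** Renders the per-worker breakdown as one line per worker. */
    static String format(Map<String, int[]> moves) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, int[]> e : moves.entrySet()) {
            sb.append(e.getKey())
              .append(" gained=").append(e.getValue()[0])
              .append(" lost=").append(e.getValue()[1])
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Assignment> before = List.of(
                new Assignment("f1-0", "worker-1"),
                new Assignment("f1-1", "worker-1"),
                new Assignment("f2-0", "worker-2"));
        List<Assignment> after = List.of(
                new Assignment("f1-0", "worker-1"),
                new Assignment("f1-1", "worker-2"),  // moved from worker-1
                new Assignment("f2-0", "worker-2"));

        long start = System.nanoTime();
        Map<String, int[]> moves = movesByWorker(before, after);
        double seconds = (System.nanoTime() - start) / 1e9;

        // Mirrors the shape of the "Rebalance summary" log line: timing first, breakdown below.
        System.out.printf("Rebalance summary - execution time: %.6f sec%n%s", seconds, format(moves));
    }
}
```

In this sketch the execution time is a single number, which is the kind of value that fits a metrics system, while the per-worker breakdown is multi-line text that is easier to read in a log.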