@jerrypeng
Contributor

Motivation

Log stats for the schedule, rebalance, and failure-check routines to help with debugging and performance analysis.

@jerrypeng added this to the 2.7.0 milestone Jul 8, 2020
@jerrypeng requested review from sijie and srkukarni July 8, 2020 00:48
@jerrypeng self-assigned this Jul 8, 2020
}
log.info("Rebalance - Total number of new assignments computed: {}", rebalancedAssignments.size());

log.info("Rebalance summary - execution time: {} sec | stats: {}\n{}",
Contributor

Can we also report these stats to Prometheus?

Contributor Author

Yes, I was going to work on that in the near future. Stats like scheduler execution time can go into Prometheus. However, a breakdown of how many instances moved, grouped by worker, is not suitable for Prometheus.
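
As a rough illustration of that distinction (a sketch, not code from this PR): the execution time is a single number, which maps naturally onto a metric, while the per-worker breakdown is a map keyed by worker id whose size grows with the cluster, so it reads better as a log line. The class name `RebalanceStatsSketch`, the methods `movesByWorker` and `format`, and the instance and worker ids are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from the PR.
final class RebalanceStatsSketch {

    // Compares two assignment maps (instance id -> worker id) and returns, per worker,
    // a two-element array {added, removed}. Only instances present in `after` are considered.
    static Map<String, int[]> movesByWorker(Map<String, String> before, Map<String, String> after) {
        Map<String, int[]> stats = new TreeMap<>(); // sorted by worker id for stable log output
        for (Map.Entry<String, String> entry : after.entrySet()) {
            String oldWorker = before.get(entry.getKey());
            String newWorker = entry.getValue();
            if (newWorker.equals(oldWorker)) {
                continue; // instance stayed where it was
            }
            stats.computeIfAbsent(newWorker, w -> new int[2])[0]++;
            if (oldWorker != null) {
                stats.computeIfAbsent(oldWorker, w -> new int[2])[1]++;
            }
        }
        return stats;
    }

    // Renders one line per worker; the number of lines depends on the number of workers.
    static String format(Map<String, int[]> stats) {
        StringBuilder sb = new StringBuilder();
        stats.forEach((worker, counts) -> sb.append(worker)
                .append(" : added=").append(counts[0])
                .append(" removed=").append(counts[1])
                .append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();

        Map<String, String> before = new LinkedHashMap<>();
        before.put("fn-a:0", "worker-1");
        before.put("fn-a:1", "worker-1");
        before.put("fn-b:0", "worker-2");

        Map<String, String> after = new LinkedHashMap<>(before);
        after.put("fn-a:1", "worker-2"); // one instance moves from worker-1 to worker-2

        Map<String, int[]> stats = movesByWorker(before, after);
        int moved = stats.values().stream().mapToInt(counts -> counts[0]).sum();

        // A scalar like this could be exported as a metric; the breakdown stays in the log.
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("Rebalance summary - execution time: %.3f sec | instances moved: %d%n%s",
                seconds, moved, format(stats));
    }
}
```

The sorted map keeps the log output stable across runs, which makes two summaries easier to compare by eye.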

}
if (triggerScheduler) {
log.info("Functions that need scheduling/rescheduling: {}", needSchedule);
log.info("Failure check - Total number of instances that need to be scheduled/rescheduled: {} | Number of unassigned instances that need to be scheduled: {} | Number of instances on dead workers that need to be reassigned {}",
Contributor

Maybe a better wording here?

Contributor Author

Do you have a suggestion?

Contributor

I actually liked the old message. What was the reason for the change?

Contributor Author

I added "Failure check" at the beginning so the message is easier to search for.

public static class RebalanceInProgressException extends RuntimeException {
}

@Data
Contributor

Is this exported somewhere?

Contributor Author

I will remove it.

@jerrypeng requested review from merlimat and srkukarni July 8, 2020 06:31
@jerrypeng merged commit cc2c203 into apache:master Jul 8, 2020
huangdx0726 pushed a commit to huangdx0726/pulsar that referenced this pull request Aug 24, 2020
Co-authored-by: Jerry Peng <jerryp@splunk.com>