Expose managed ledger bookie client metric to prometheus #6814
|
/pulsarbot run-failure-checks |
codelipenghui
left a comment
The change looks good. Could you please add a description of the new metrics to https://pulsar.apache.org/docs/en/reference-metrics/#broker? That would help users understand these new metrics.
| // bookie_journal_JOURNAL_ADD_ENTRY{success="false",quantile="0.5",} NaN | ||
| // bookie_journal_JOURNAL_ADD_ENTRY{success="false",quantile="0.75",} NaN | ||
| // bookie_journal_JOURNAL_ADD_ENTRY{success="false",quantile="0.95",} NaN | ||
| // bookie_journal_JOURNAL_ADD_ENTRY{success="false",quantile="0.99",} NaN | ||
| // bookie_journal_JOURNAL_ADD_ENTRY{success="false",quantile="0.999",} NaN | ||
| // bookie_journal_JOURNAL_ADD_ENTRY{success="false",quantile="0.9999",} NaN |
Can we get these metrics on the bookie client side? If not, I think it would be better to use some bookie client metrics in the example.
I have checked the bookie client metric code. Some class constructors and functions are package-private and can't be accessed from Pulsar.
| private ManagedLedgerFactoryImpl(BookkeeperFactoryForCustomEnsemblePlacementPolicy bookKeeperGroupFactory, boolean isBookkeeperManaged, ZooKeeper zooKeeper, | ||
| ManagedLedgerFactoryConfig config) throws Exception { | ||
| Configuration configuration = new ClientConfiguration(); | ||
| configuration.addProperty("prometheusStatsLatencyRolloverSeconds", config.getPrometheusStatsLatencyRolloverSeconds()); |
| configuration.addProperty("prometheusStatsLatencyRolloverSeconds", config.getPrometheusStatsLatencyRolloverSeconds()); | |
| configuration.addProperty(PrometheusMetricsProvider.PROMETHEUS_STATS_LATENCY_ROLLOVER_SECONDS, config.getPrometheusStatsLatencyRolloverSeconds()); |
| import java.io.IOException; | ||
| import java.io.Writer; | ||
| import java.lang.reflect.Field; | ||
| import java.util.concurrent.*; |
Avoid wildcard imports (import xxx.*), and please check all occurrences.
c78374d to
6d7075b
Compare
Thanks for your feedback. I have updated the document. |
|
/pulsarbot run-failure-checks |
|
/pulsarbot run-failure-checks |
sijie
left a comment
Just out of curiosity, why do you implement a new StatsProvider? BookKeeper already has the prometheus data provider. Why not re-use it?
The reasons for implementing a new Prometheus StatsProvider are as follows:
Are there other ways to reuse BookKeeper's Prometheus StatsProvider? |
|
/pulsarbot run-failure-checks |
…nfiguration for stats expose
|
/pulsarbot run-failure-checks |
|
@hangc0276 Sorry for the late response. I generally don't think copying the files and maintaining another implementation in Pulsar is a good idea.
The Prometheus stats provider also works well with the bookie HTTP server. There is a flag to disable the Prometheus HTTP server in the stats provider by setting
Why do you need to access those components? Can't you use the bookkeeper stats library? You can always use |
Sorry, I had not found the flag. I will fix it soon. Thanks for your feedback. |
|
@hangc0276 This would be very valuable to help us debug some throughput & performance issues we're seeing when on EBS and would like to pinpoint our bottlenecks. Appreciate the work, do you need anything else to continue forward with this? |
Thanks for your reply. With the bookie client metrics, we could easily find Pulsar's bottlenecks. I need to use the BookKeeper Prometheus lib instead of the copied one, and fix the Prometheus test case, because the Pulsar Prometheus test case conflicts with the BookKeeper Prometheus output format. I will fix it soon. |
|
@sijie I tried to use the BookKeeper Prometheus provider lib to export the bookie client metrics, but it failed. The exception is The reason is bookkeeper lib However, the Pulsar broker Prometheus metrics also export those metrics to Prometheus, so some of the metrics are registered twice, which causes the metric export to fail. Could you give me some ideas? |
|
move to 2.7.0 first. |
|
/pulsarbot run-failure-checks |
sijie
left a comment
@hangc0276 We can add this change first. Can you add a configuration setting on either the BookKeeper side or the Pulsar side to allow excluding the JVM settings? That way you can avoid this situation and eventually use the BookKeeper metrics library.
@sijie OK, I think it's better to add a configuration setting on the BookKeeper side. As a BookKeeper client metrics library, it should not import the external JVM and other metrics. I will fix it on the BookKeeper side, but it depends on the BookKeeper release. |
|
@hangc0276 sounds good to me. |
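The double-registration problem discussed above, and the proposed flag for excluding JVM metrics, can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `SketchRegistry`, `SketchClientStatsProvider`, the `exportJvmMetrics` parameter, and the metric names are stand-ins invented for illustration, not BookKeeper, Pulsar, or Prometheus client APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;

// Hypothetical, simplified stand-in for a shared metrics registry.
final class SketchRegistry {
    private final Map<String, DoubleSupplier> gauges = new LinkedHashMap<>();

    // Rejects a second registration under the same metric name.
    void register(String name, DoubleSupplier gauge) {
        if (gauges.putIfAbsent(name, gauge) != null) {
            throw new IllegalArgumentException("Metric already registered: " + name);
        }
    }

    int size() {
        return gauges.size();
    }
}

// Hypothetical client-side stats provider with a switch that skips JVM metrics.
final class SketchClientStatsProvider {
    static void start(SketchRegistry registry, boolean exportJvmMetrics) {
        if (exportJvmMetrics) {
            // Collides if the host process (here, the broker) already exports this name.
            registry.register("jvm_threads_current", () -> (double) Thread.activeCount());
        }
        // Illustrative client metric name, assumed for this sketch only.
        registry.register("sketch_client_add_entry_total", () -> 0.0);
    }
}

final class SketchRegistryDemo {
    public static void main(String[] args) {
        SketchRegistry shared = new SketchRegistry();
        // The host process registers its own JVM metric first.
        shared.register("jvm_threads_current", () -> (double) Thread.activeCount());
        // With the switch off, the client provider adds only its own metric.
        SketchClientStatsProvider.start(shared, false);
        System.out.println("registered metrics: " + shared.size());
    }
}
```

Calling `SketchClientStatsProvider.start(shared, true)` against the same registry would throw, which mirrors the failure described in the thread; with the switch off, only the client metric is added.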
Motivation
Pulsar uses BookKeeper as its distributed log storage and initializes a bookie client to read data from and write data to BookKeeper. However, the Pulsar bookie client uses the default NullStatsLogger.INSTANCE for its runtime metrics, so they are not exposed to Prometheus or any other stats storage. When tuning Pulsar bookie client performance, we don't have any bookie metrics to measure where the bottleneck is.
Changes
I implemented a Prometheus stats provider, used it to trace the bookie client runtime metrics, and exposed them to Prometheus.
Please help take a look. If it's OK, I will add test cases and update the metrics document.
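To make the motivation concrete, here is a minimal sketch of the difference between a no-op stats logger and one that records counters and renders them in the Prometheus text format. All types (`SketchStatsLogger`, `NullSketchStatsLogger`, `TextSketchStatsLogger`, `SketchBookieClient`) and the metric name are hypothetical and written only for this illustration; they are not the BookKeeper stats API or the provider implemented in this PR.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counter-only stats interface.
interface SketchStatsLogger {
    void inc(String counterName);
}

// No-op logger: every update is discarded, so nothing can be scraped.
final class NullSketchStatsLogger implements SketchStatsLogger {
    static final NullSketchStatsLogger INSTANCE = new NullSketchStatsLogger();

    private NullSketchStatsLogger() {
    }

    @Override
    public void inc(String counterName) {
        // Intentionally empty.
    }
}

// Recording logger: keeps counters and renders them as Prometheus text lines.
final class TextSketchStatsLogger implements SketchStatsLogger {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    @Override
    public void inc(String counterName) {
        counters.computeIfAbsent(counterName, k -> new LongAdder()).increment();
    }

    String render() {
        StringBuilder sb = new StringBuilder();
        // Sort by name so the output order is stable.
        new TreeMap<>(counters).forEach((name, value) ->
                sb.append("# TYPE ").append(name).append(" counter\n")
                  .append(name).append(' ').append(value.sum()).append('\n'));
        return sb.toString();
    }
}

// Hypothetical client that reports each add-entry call to whichever logger it is given.
final class SketchBookieClient {
    private final SketchStatsLogger stats;

    SketchBookieClient(SketchStatsLogger stats) {
        this.stats = stats;
    }

    void addEntry(byte[] payload) {
        stats.inc("sketch_client_add_entry_total");
    }
}

final class SketchStatsDemo {
    public static void main(String[] args) {
        TextSketchStatsLogger stats = new TextSketchStatsLogger();
        SketchBookieClient client = new SketchBookieClient(stats);
        client.addEntry(new byte[0]);
        client.addEntry(new byte[0]);
        System.out.print(stats.render());
    }
}
```

Passing `NullSketchStatsLogger.INSTANCE` to the client instead leaves nothing to render, which is the situation the Motivation section describes.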