Skip to content

Conversation

@tjiuming
Copy link
Contributor

Motivation
Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

Modifications
Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)
  • The rest endpoints: (no)
  • The admin cli options: (no)
  • Anything that affects deployment: (don't know)

Documentation

Check the box below or label this PR directly (if you have committer privilege).

Need to update docs?

  • doc-required

    (If you need help on updating docs, create a doc issue)

  • no-need-doc

    (Please explain why)

  • doc

    (If this PR contains doc changes)

@github-actions github-actions bot added the doc-required Your PR changes impact docs and you will update later. label Jan 19, 2022
@tjiuming
Copy link
Contributor Author

TRB, tests will be complete later.

@michaeljmarshall
Copy link
Member

Motivation
Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

Modifications
Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.

@tjiuming - the content of this PR description is copied from #11555. Is the content of the PR also copied from that PR? Several committers had requested changes on that PR, so it'd be helpful to understand how this PR compares to #11555.

I see that @codelipenghui just closed that PR a couple days ago. @frankxieke, were you no longer planning on finishing #11555?

@tjiuming
Copy link
Contributor Author

tjiuming commented Jan 20, 2022

Motivation
Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.
Modifications
Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.

@tjiuming - the content of this PR description is copied from #11555. Is the content of the PR also copied from that PR? Several committers had requested changes on that PR, so it'd be helpful to understand how this PR compares to #11555.

I see that @codelipenghui just closed that PR a couple days ago. @frankxieke, were you no longer planning on finishing #11555?

#11555 is not finished, @frankxieke leaved streamnative, so I take over this issue and fix existing problems. @michaeljmarshall

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall the patch LGTM

but I left one comment about topics unloading, please take a look

@tjiuming tjiuming requested a review from eolivelli April 13, 2022 07:45
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@codelipenghui codelipenghui merged commit 732049f into apache:master Apr 15, 2022
this.brokerClientSharedTimer =
new HashedWheelTimer(new DefaultThreadFactory("broker-client-shared-timer"), 1, TimeUnit.MILLISECONDS);

int interval = config.getManagedLedgerStatsPeriodSeconds();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if interval is set to <=0 ? Does that disable stats?


static LedgerOffloaderStats create(boolean exposeManagedLedgerStats, boolean exposeTopicLevelMetrics,
ScheduledExecutorService scheduler, int interval) {
if (!exposeManagedLedgerStats) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also contain the condition || interval <= 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, the value of managedLedgerStatsPeriodSeconds always > 0(default value is 60000), but I can create another PR to fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, the value of managedLedgerStatsPeriodSeconds always > 0(default value is 60000), but I can create another PR to fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to address in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this was already merged, so please address in a new PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjiuming did you follow up on this comment ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eolivelli interval actually is managedLedgerStatsPeriodSeconds in ServiceConfiguration, and managedLedgerStatsPeriodSeconds is annotated by @FieldContext(minValue = 1). So, interval always >= 1

@Anonymitaet Anonymitaet added doc-complete Your PR changes impact docs and the related docs have been already added. and removed doc-required Your PR changes impact docs and you will update later. labels Apr 18, 2022
aparajita89 pushed a commit to aparajita89/pulsar that referenced this pull request Apr 18, 2022
### Motivation
Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

### Modifications
Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.
Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this pull request Apr 20, 2022
### Motivation
Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

### Modifications
Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.
@tjiuming tjiuming deleted the offloader-metrics branch May 10, 2022 07:01
dlg99 pushed a commit to dlg99/pulsar that referenced this pull request May 11, 2022
### Motivation
Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

### Modifications
Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.

(cherry picked from commit 732049f)
dlg99 pushed a commit to dlg99/pulsar that referenced this pull request May 11, 2022
Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.

(cherry picked from commit 732049f)
dlg99 pushed a commit to dlg99/pulsar that referenced this pull request May 11, 2022
Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.

(cherry picked from commit 732049f)
dlg99 pushed a commit to dlg99/pulsar that referenced this pull request May 12, 2022
### Motivation
Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

### Modifications
Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.

(cherry picked from commit 732049f)
dlg99 pushed a commit to dlg99/pulsar that referenced this pull request May 12, 2022
Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.

(cherry picked from commit 732049f)
dlg99 pushed a commit to dlg99/pulsar that referenced this pull request May 12, 2022
Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.

(cherry picked from commit 732049f)
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request May 12, 2022
### Motivation
Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

### Modifications
Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.

(cherry picked from commit 732049f)

Co-authored-by: Tao Jiuming <95597048+tjiuming@users.noreply.github.com>
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request May 12, 2022
* [ManagedLedger] Make 'StatsPeroidSeconds' configurable (apache#11584)

Make `StatsPeroidSeconds` configurable.

- Move ‘StatsPeriodSeconds’ from ManagedLedgerFactoryImpl to ManagedLedgerFactoryConfig.
- Add config `managedLedgerStatsPeriodSeconds`.

* Offloader metrics (apache#13833)

Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.

(cherry picked from commit 732049f)

Co-authored-by: GuoJiwei <technoboy@apache.org>
Co-authored-by: Tao Jiuming <95597048+tjiuming@users.noreply.github.com>
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request May 12, 2022
* [ManagedLedger] Make 'StatsPeroidSeconds' configurable (apache#11584)

Make `StatsPeroidSeconds` configurable.

- Move ‘StatsPeriodSeconds’ from ManagedLedgerFactoryImpl to ManagedLedgerFactoryConfig.
- Add config `managedLedgerStatsPeriodSeconds`.

(cherry picked from commit ddeae65)

* Offloader metrics (apache#13833)

Currently, there is no offload metrics for tiered storage, so it is very hard for us to debug the performance issues. For example , we can not find why offload is slow or why read offload is slow. For above reasons. we need to add some offload metrics for monitoring.

Add metrics during offload procedure and read offload data procedure. Including offloadTime, offloadError, offloadRate, readLedgerLatency, writeStoreLatency, writeStoreError, readOffloadIndexLatency, readOffloadDataLatency, readOffloadRate, readOffloadError.

(cherry picked from commit 732049f)

Co-authored-by: GuoJiwei <technoboy@apache.org>
Co-authored-by: Tao Jiuming <95597048+tjiuming@users.noreply.github.com>
poorbarcode pushed a commit that referenced this pull request Apr 10, 2023
After #13833, we change the signature of LedgerOffloaderFactory#create. But some users may use the old version LedgerOffloaderFactory in the offloader nar, when pulsar load the offloader nar, it will use the LedgerOffloaderFactory in the offloader nar, the LedgerOffloaderFactory#create has no param offloaderStats.

Modifications: Make LedgerOffloaderFactory can load the old nar.
Technoboy- pushed a commit that referenced this pull request May 9, 2023
After #13833, we change the signature of LedgerOffloaderFactory#create. But some users may use the old version LedgerOffloaderFactory in the offloader nar, when pulsar load the offloader nar, it will use the LedgerOffloaderFactory in the offloader nar, the LedgerOffloaderFactory#create has no param offloaderStats.

Modifications: Make LedgerOffloaderFactory can load the old nar.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/metrics area/tieredstorage doc-complete Your PR changes impact docs and the related docs have been already added.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants