Skip to content

Conversation

@poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Aug 10, 2023

Motivation

Background: when calling pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>, Pulsar will read the first entry which is not acknowledged, and respond with the entry write time. The flow is like this:

  • get the mark deleted position of the subscription
  • if no backlog, response -1
  • else read the next position of the mark deleted position, and respond with the entry write time.

Error logs

org.apache.bookkeeper.mledger.ManagedLedgerException: Error while reading ledger
2023-08-03T13:44:25,120+0000 [pulsar-web-36-13] ERROR org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - Error read entry for position 1739429:37881	
2023-08-03T13:44:25,120+0000 [pulsar-web-36-13] ERROR org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://public/default/123] Failed to get earliest message publish time in backlog	
2023-08-03T13:44:25,120+0000 [pulsar-web-36-13] ERROR org.apache.bookkeeper.client.LedgerHandle - ReadAsync exception on ledgerId:1739429 firstEntry:37881 lastEntry:37881 lastAddConfirmed:37880	
org.apache.bookkeeper.mledger.ManagedLedgerException: Error while reading ledger	
java.util.concurrent.ExecutionException: org.apache.bookkeeper.mledger.ManagedLedgerException: Error while reading ledger	
2023-08-03T13:44:25,120+0000 [pulsar-web-36-13] ERROR org.apache.pulsar.broker.admin.impl.PersistentTopicsBase - [pulsarinstance-aws-prod1@o-7503y.auth.streamnative.cloud] Failed to get stats for persistent://public/default/123	
2023-08-03T13:44:25,121+0000 [pulsar-web-36-13] INFO  org.eclipse.jetty.server.RequestLog - 10.2.121.5 - - [03/Aug/2023:13:44:25 +0000] "GET /admin/v2/persistent/public/default/123/stats?getPreciseBacklog=true&authoritative=false&subscriptionBacklogSize=false&getEarliestTimeInBacklog=true HTTP/1.1" 500 39 "-" "Pulsar-Java-v2.10.3.2" 3	

Issue: if the command pulsar-admin topics stats --get-earliest-time-in-backlog <topic name> and consumer.acknowledge are executed at the same time, the step 2 in above flow will get a position which is larger than the last confirmed position, lead a read entry error.

time pulsar-admin topics stats --get-earliest-time-in-backlog <topic name> consumer.acknowledge
1 mark deleted position is 3:1 and LAC is 3:2 now
2 the check whether has backlog is passed
3 acknowledged 3:2, mark deleted position is 3:2 now
4 calculate next position: 3:3
5 Read 3:3 and get an error: read entry failed

Note: the test in PR is not intended to reproduce the issue.

Modifications

Respond -1 if the next position of the mark deleted position is larger than the LAC

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode self-assigned this Aug 10, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 10, 2023
@poorbarcode poorbarcode added release/3.0.2 release/2.11.3 release/2.10.6 type/bug The PR fixed a bug or issue reported a bug area/admin and removed doc-not-needed Your PR changes do not impact docs labels Aug 10, 2023
@poorbarcode poorbarcode added this to the 3.2.0 milestone Aug 10, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 10, 2023
@poorbarcode poorbarcode force-pushed the fix/topic_stats_ealiest_msg branch from f80ff40 to a81ec2f Compare August 14, 2023 08:04
@poorbarcode
Copy link
Contributor Author

rebase master

Copy link
Contributor

@lifepuzzlefun lifepuzzlefun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Catch !

@Technoboy- Technoboy- closed this Aug 20, 2023
@Technoboy- Technoboy- reopened this Aug 20, 2023
@codecov-commenter
Copy link

codecov-commenter commented Aug 20, 2023

Codecov Report

❌ Patch coverage is 71.42857% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.06%. Comparing base (64cd753) to head (e9e6a69).
⚠️ Report is 1736 commits behind head on master.

Files with missing lines Patch % Lines
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 0.00% 1 Missing and 1 partial ⚠️
...ker/service/persistent/PersistentSubscription.java 83.33% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #20971       +/-   ##
=============================================
+ Coverage     37.05%   73.06%   +36.00%     
- Complexity    12241    32361    +20120     
=============================================
  Files          1698     1875      +177     
  Lines        129816   140982    +11166     
  Branches      14157    15730     +1573     
=============================================
+ Hits          48109   103009    +54900     
+ Misses        75377    29821    -45556     
- Partials       6330     8152     +1822     
Flag Coverage Δ
inttests 24.55% <0.00%> (+0.30%) ⬆️
systests 25.56% <0.00%> (+0.29%) ⬆️
unittests 72.27% <71.42%> (+40.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 81.34% <0.00%> (+30.65%) ⬆️
...ker/service/persistent/PersistentSubscription.java 73.88% <83.33%> (+23.73%) ⬆️

... and 1436 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@poorbarcode poorbarcode merged commit 7c96a36 into apache:master Aug 20, 2023
Technoboy- pushed a commit that referenced this pull request Aug 21, 2023
…currently (#20971)

### Motivation

**Background**: when calling `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>`, Pulsar will read the first entry which is not acknowledged, and respond with the entry write time. The flow is like this:
- get the mark deleted position of the subscription
- if no backlog, response `-1`
- else read the next position of the mark deleted position, and respond with the entry write time.

**Issue**: if the command `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>` and `consumer.acknowledge` are executed at the same time, the step 2 in above flow will get a position which is larger than the last confirmed position, lead a read entry error.

| time | `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>` | `consumer.acknowledge` |
| --- | --- | --- |
| 1 | mark deleted position is `3:1` and LAC is `3:2` now |
| 2 | the check `whether has backlog` is passed |
| 3 | | acknowledged `3:2`, mark deleted position is `3:2` now |
| 4 | calculate next position: `3:3` |
| 5 | Read `3:3` and get an error: `read entry failed` |

Note: the test in PR is not intended to reproduce the issue.

### Modifications

Respond `-1` if the next position of the mark deleted position is larger than the LAC
Technoboy- pushed a commit that referenced this pull request Aug 21, 2023
…currently (#20971)

### Motivation

**Background**: when calling `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>`, Pulsar will read the first entry which is not acknowledged, and respond with the entry write time. The flow is like this:
- get the mark deleted position of the subscription
- if no backlog, response `-1`
- else read the next position of the mark deleted position, and respond with the entry write time.

**Issue**: if the command `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>` and `consumer.acknowledge` are executed at the same time, the step 2 in above flow will get a position which is larger than the last confirmed position, lead a read entry error.

| time | `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>` | `consumer.acknowledge` |
| --- | --- | --- |
| 1 | mark deleted position is `3:1` and LAC is `3:2` now |
| 2 | the check `whether has backlog` is passed |
| 3 | | acknowledged `3:2`, mark deleted position is `3:2` now |
| 4 | calculate next position: `3:3` |
| 5 | Read `3:3` and get an error: `read entry failed` |

Note: the test in PR is not intended to reproduce the issue.

### Modifications

Respond `-1` if the next position of the mark deleted position is larger than the LAC
shibd pushed a commit to shibd/pulsar that referenced this pull request Oct 22, 2023
…currently (apache#20971)

**Background**: when calling `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>`, Pulsar will read the first entry which is not acknowledged, and respond with the entry write time. The flow is like this:
- get the mark deleted position of the subscription
- if no backlog, response `-1`
- else read the next position of the mark deleted position, and respond with the entry write time.

**Issue**: if the command `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>` and `consumer.acknowledge` are executed at the same time, the step 2 in above flow will get a position which is larger than the last confirmed position, lead a read entry error.

| time | `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>` | `consumer.acknowledge` |
| --- | --- | --- |
| 1 | mark deleted position is `3:1` and LAC is `3:2` now |
| 2 | the check `whether has backlog` is passed |
| 3 | | acknowledged `3:2`, mark deleted position is `3:2` now |
| 4 | calculate next position: `3:3` |
| 5 | Read `3:3` and get an error: `read entry failed` |

Note: the test in PR is not intended to reproduce the issue.

Respond `-1` if the next position of the mark deleted position is larger than the LAC

(cherry picked from commit 7c96a36)
shibd pushed a commit to shibd/pulsar that referenced this pull request Oct 24, 2023
…currently (apache#20971)

**Background**: when calling `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>`, Pulsar will read the first entry which is not acknowledged, and respond with the entry write time. The flow is like this:
- get the mark deleted position of the subscription
- if no backlog, response `-1`
- else read the next position of the mark deleted position, and respond with the entry write time.

**Issue**: if the command `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>` and `consumer.acknowledge` are executed at the same time, the step 2 in above flow will get a position which is larger than the last confirmed position, lead a read entry error.

| time | `pulsar-admin topics stats --get-earliest-time-in-backlog <topic name>` | `consumer.acknowledge` |
| --- | --- | --- |
| 1 | mark deleted position is `3:1` and LAC is `3:2` now |
| 2 | the check `whether has backlog` is passed |
| 3 | | acknowledged `3:2`, mark deleted position is `3:2` now |
| 4 | calculate next position: `3:3` |
| 5 | Read `3:3` and get an error: `read entry failed` |

Note: the test in PR is not intended to reproduce the issue.

Respond `-1` if the next position of the mark deleted position is larger than the LAC

(cherry picked from commit 7c96a36)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants