Skip to content

Conversation

@poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Jun 27, 2022

Motivation

pending-ack-cursor will reload by MLPendingAckStoreProvider when created persistent subscription, it will check the cache firstly, and create a new one if the cache does not exist.

ManagedCursor cachedCursor = cursors.get(cursorName);
if (cachedCursor != null) {
if (log.isDebugEnabled()) {
log.debug("[{}] Cursor was already created {}", name, cachedCursor);
}
callback.openCursorComplete(cachedCursor, ctx);
return;
}

The managed ledger and cursor which already exist may be like this:

cursor.markDeletePosition: 60:1
cursor.readPosition: 60:6
managedLedger.getLastConfirmedEntry: 60:5

So MLPendingAckStore will initialize like this:

currentLoadPosition: 60:1
lastConfirmedEntry: 60:5

At this point, we expected four logs to be read, but none were. And then it goes into an infinite loop.

while (lastConfirmedEntry.compareTo(currentLoadPosition) > 0 && fillEntryQueueCallback.fillQueue()) {
Entry entry = entryQueue.poll();
if (entry != null) {
ByteBuf buffer = entry.getDataBuffer();
currentLoadPosition = PositionImpl.get(entry.getLedgerId(), entry.getEntryId());
PendingAckMetadataEntry pendingAckMetadataEntry = new PendingAckMetadataEntry();
pendingAckMetadataEntry.parseFrom(buffer, buffer.readableBytes());
currentIndexLag.incrementAndGet();
handleMetadataEntry(new PositionImpl(entry.getLedgerId(), entry.getEntryId()),
pendingAckMetadataEntry);
pendingAckReplyCallBack.handleMetadataEntry(pendingAckMetadataEntry);
entry.release();
clearUselessLogData();
} else {
try {
Thread.sleep(1);
} catch (InterruptedException e) {
if (Thread.interrupted()) {
log.error("[{}]Transaction pending "
+ "replay thread interrupt!", managedLedger.getName(), e);
}
}
}
}

And we should not lose these four logs while executing A, because they may be in a transaction that has not yet handle, so we should reset the read position of the pending-ack-log cursor.

Modifications

Consistently execute 'cursor.rewind' after pending-ack-log cursor initialize.

Documentation

  • doc-required

  • doc-not-needed

  • doc

  • doc-complete

@poorbarcode poorbarcode force-pushed the fix/pengding_ack_looping branch from 17887d8 to 3a0a7f6 Compare June 27, 2022 10:32
@poorbarcode
Copy link
Contributor Author

@congbobo184 @liangyepianzhou Could you take a look, thanks. ^_^

@github-actions
Copy link

@poorbarcode Please provide a correct documentation label for your PR.
Instructions see Pulsar Documentation Label Guide.

@poorbarcode
Copy link
Contributor Author

/pulsarbot run-failure-checks

@github-actions
Copy link

@poorbarcode Please provide a correct documentation label for your PR.
Instructions see Pulsar Documentation Label Guide.

@poorbarcode
Copy link
Contributor Author

/pulsarbot run-failure-checks

@github-actions
Copy link

@poorbarcode Please provide a correct documentation label for your PR.
Instructions see Pulsar Documentation Label Guide.

@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing labels Jun 27, 2022
@liangyepianzhou
Copy link
Contributor

The cursor of the pending ack log is only used when it is created and recovered, so it will not go to the cache.
Are there any scenarios I haven't considered?

Copy link
Contributor

@liangyepianzhou liangyepianzhou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to delete the cursor after recovery is complete?

public void openCursorComplete(ManagedCursor cursor, Object ctx) {
// Why "rewind" ?
// see: https://github.com/apache/pulsar/pull/16240
cursor.rewind();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to delete the cursor after recovery is complete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to delete the cursor after recovery is complete?

No, that won't solve this problem. #16247 can solve this problem, I am thinking of closing this PR

@poorbarcode
Copy link
Contributor Author

The cursor of the pending ack log is only used when it is created and recovered, so it will not go to the cache. Are there any scenarios I haven't considered?

Yes, It is #16247

@poorbarcode
Copy link
Contributor Author

#16247 can solve this problem, close this PR

@poorbarcode poorbarcode deleted the fix/pengding_ack_looping branch June 30, 2022 06:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/transaction doc-not-needed Your PR changes do not impact docs release/2.9.4 release/2.10.2 type/bug The PR fixed a bug or issue reported a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants