@gaozhangmin
Contributor

Reader ACKs could get stuck:
[image]
This was caused by the following:
flushAsync is executed by a scheduled task, and any exception or error thrown by the task that reaches the executor suppresses all subsequent executions of that task.

The exception is shown below:

[image]
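
For context, the halting behavior described above is standard `ScheduledExecutorService` behavior for periodic tasks and can be reproduced without Pulsar. The following is a minimal sketch that assumes only the JDK; the class name, method names, and the simulated exception are hypothetical and not taken from the Pulsar code base. It compares a periodic task that lets an exception escape with one that catches it inside the task body.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; it only uses the JDK, not Pulsar.
final class ScheduledTaskHaltDemo {

    /** Periodic task that throws on its third run. Returns the total number of runs. */
    static int runsOfUnguardedTask() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        try {
            ScheduledFuture<?> future = executor.scheduleAtFixedRate(() -> {
                if (runs.incrementAndGet() == 3) {
                    throw new NullPointerException("simulated NPE inside the flush task");
                }
            }, 0, 5, TimeUnit.MILLISECONDS);
            try {
                future.get(5, TimeUnit.SECONDS);
            } catch (ExecutionException expected) {
                // The task threw, so the executor suppresses all subsequent executions.
            }
            Thread.sleep(100); // leave time for further runs, which should not happen
            return runs.get();
        } catch (InterruptedException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    /** Same task, but the exception is caught inside the task, so scheduling continues. */
    static boolean guardedTaskKeepsRunning() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch sixRuns = new CountDownLatch(6);
        try {
            executor.scheduleAtFixedRate(() -> {
                try {
                    sixRuns.countDown();
                    if (runs.incrementAndGet() == 3) {
                        throw new NullPointerException("simulated NPE inside the flush task");
                    }
                } catch (RuntimeException e) {
                    // Swallowed here (real code would log) so the next run is still scheduled.
                }
            }, 0, 5, TimeUnit.MILLISECONDS);
            return sixRuns.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("unguarded runs: " + runsOfUnguardedTask());
        System.out.println("guarded task keeps running: " + guardedTaskKeepsRunning());
    }
}
```

Catching the exception keeps the task alive, but it does not explain why the exception occurs in the first place, which is what the review discussion below focuses on.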

@gaozhangmin
Contributor Author

/pulsarbot run-failure-checks

@gaozhangmin gaozhangmin force-pushed the fix-npe-lastCumulativeAck-messageId branch from a963849 to e65e64d Compare October 14, 2021 09:35
@gaozhangmin gaozhangmin force-pushed the fix-npe-lastCumulativeAck-messageId branch 3 times, most recently from aa42602 to b13d274 Compare October 21, 2021 06:52
@315157973
Contributor

@eolivelli PTAL again

@gaozhangmin
Contributor Author

@eolivelli PTAL

@gaozhangmin gaozhangmin force-pushed the fix-npe-lastCumulativeAck-messageId branch from b13d274 to 06d41bb Compare November 19, 2021 02:21
@gaozhangmin
Contributor Author

/pulsarbot run-failure-checks

@codelipenghui
Contributor

@eolivelli Could you please help review this PR again?

@codelipenghui codelipenghui added this to the 2.10.0 milestone Nov 21, 2021
@github-actions

@gaozhangmin: Thanks for your contribution. For this PR, do we need to update docs?
(The PR template contains info about doc, which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)

@michaeljmarshall
Member

Regarding the halted executor, I believe @lhotari is adding support to ensure that these exceptions would not stop the scheduled task.

Member

@michaeljmarshall michaeljmarshall left a comment

I'm wondering how the client gets into a state where we have cumulativeAckFlushRequired set to true, but there is no "last ack". Is there a chance that we're updating the state incorrectly?

this.consumer.unAckedChunkedMessageIdSequenceMap.remove(lastCumulativeAck.messageId);
shouldFlush = true;
cumulativeAckFlushRequired = false;
final MessageIdImpl messageIdOfLastAck = lastCumulativeAck.messageId;
Member

Can you explain why we enter this code block when lastCumulativeAck.messageId is null? It seems like cumulativeAckFlushRequired should be false. I'm fine with adding a null check, but it seems like there could be another issue.

Contributor Author

@gaozhangmin gaozhangmin Nov 23, 2021

In the previous logic, the state was updated only when this.consumer.unAckedChunkedMessageIdSequenceMap.remove succeeded.

Member

@gaozhangmin When reading the code, it seems that messageId should never be null. As @michaeljmarshall pointed out, there could be another issue hiding here.
It might be a thread safety issue which could cause other problems. @gaozhangmin Did you check how messageId could become null?

Contributor Author

@lhotari No, I didn't check the reason. It looks much the same as this issue: #11607

Member

PR #11607 fixes another case where lastCumulativeAck.messageId was null. @BewareMyPower What are your thoughts about the reason why the field is null in the first place? Could there be a thread safety issue that won't go away by adding null checks?

Member

@gaozhangmin do you have a chance to analyze the reason? My concern is that adding a null check might just silently ignore a potential thread safety issue, which might lead to inconsistent behavior.

Contributor

@lhotari As I've explained in my PR, I think there is a possible thread safety problem like this:

 if (lastCumulativeAck.messageId == null) { // 1. messageId is not null
     return false; 
 } 
 // 2. messageId was modified to null in another thread
 if (messageId.compareTo(lastCumulativeAck.messageId) <= 0) { // 3. messageId is null now

I think the root cause is the design of LastCumulativeAck class. It's not well encapsulated. Though its fields are all private, the class is an inner class so the outer class can access the members directly. And we can see the direct access in many places, which makes it hard to analyze whether all the accesses are thread safe.
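
The check-then-act race described in this comment can be illustrated in isolation. The following is a minimal sketch, assuming a hypothetical `Holder` class with a nullable `messageId` field; a `Long` stands in for the real message ID type, and none of these names come from the Pulsar code. The racy variant reads the shared field twice, while the safer variant reads it once into a local variable and uses only that snapshot.

```java
// Hypothetical stand-ins; these are not the Pulsar classes.
final class CheckThenActSketch {

    /** Shared holder whose field may be reset to null by another thread. */
    static final class Holder {
        volatile Long messageId;
    }

    /** Racy: the field is read twice, so the second read can observe null. */
    static boolean isDuplicateRacy(Holder holder, long id) {
        if (holder.messageId == null) {   // first read
            return false;
        }
        return id <= holder.messageId;    // second read: unboxing null would throw an NPE
    }

    /** Safer: read the field once, then use only the local snapshot. */
    static boolean isDuplicateSafe(Holder holder, long id) {
        Long snapshot = holder.messageId; // single read
        if (snapshot == null) {
            return false;
        }
        return id <= snapshot;
    }

    public static void main(String[] args) {
        Holder holder = new Holder();
        System.out.println(isDuplicateSafe(holder, 1L)); // false: nothing recorded yet
        holder.messageId = 5L;
        System.out.println(isDuplicateSafe(holder, 3L)); // true: 3 <= 5
        System.out.println(isDuplicateSafe(holder, 7L)); // false: 7 > 5
    }
}
```

A local snapshot removes this particular NPE, but it does not make updates that span several fields consistent, which is why the discussion turns to the design of the class itself.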

Member

@lhotari lhotari Nov 23, 2021

> I think the root cause is the design of LastCumulativeAck class

@BewareMyPower I also think that LastCumulativeAck class is the source of problems. Using the Netty Recycler introduces the thread safety issue. Using an immutable class design for LastCumulativeAck and removing the use of Netty recycler would be something that I'd recommend for fixing the issue.
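
One way the immutable design suggested here could look is sketched below. This is only an illustration under stated assumptions: a `long` stands in for the message ID, a `long[]` stands in for the acknowledgment bit set, and the class and method names are hypothetical. Both values are published together through a single `AtomicReference`, so a reader never sees a message ID paired with the wrong bit set.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a long stands in for the message ID, a long[] for the ack bit set.
final class ImmutableAckSketch {

    /** Immutable snapshot: both fields are set once and published together. */
    static final class AckState {
        final long messageId;
        private final long[] ackSetWords;

        AckState(long messageId, long[] ackSetWords) {
            this.messageId = messageId;
            // Defensive copy so later changes by the caller cannot leak in.
            this.ackSetWords = ackSetWords == null ? null : ackSetWords.clone();
        }

        long[] ackSetWords() {
            return ackSetWords == null ? null : ackSetWords.clone();
        }
    }

    private final AtomicReference<AckState> last = new AtomicReference<>();

    /** Replaces the state only if messageId is newer; returns whether it did. */
    boolean advanceTo(long messageId, long[] ackSetWords) {
        AckState next = new AckState(messageId, ackSetWords);
        while (true) {
            AckState current = last.get();
            if (current != null && messageId <= current.messageId) {
                return false; // an equal or newer ack is already recorded
            }
            if (last.compareAndSet(current, next)) {
                return true;
            }
            // Lost a race with another writer: re-read and retry.
        }
    }

    AckState current() {
        return last.get();
    }
}
```

The trade-off is that every update allocates a new object, which is the allocation cost an object pool such as the Netty Recycler is meant to avoid.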

Contributor Author

Sure, I will implement an immutable class design.
What about this PR? Should we close it? @lhotari

Contributor

@gaozhangmin @lhotari I picked up this issue today. After looking deeper into this issue, I found another bug and it would be better to synchronize the update operations of LastCumulativeAck. See #16072, PTAL when you have time.

@lhotari
Member

lhotari commented Nov 22, 2021

> Regarding the halted executor, I believe @lhotari is adding support to ensure that these exceptions would not stop the scheduled task.

@michaeljmarshall The PR #12853 has already been merged to master.

@gaozhangmin
Contributor Author

/pulsarbot run-failure-checks

@Anonymitaet
Member

@gaozhangmin is this a bug fix (no need to update docs)?

@lhotari lhotari added the doc-not-needed Your PR changes do not impact docs label Nov 24, 2021
@github-actions

@gaozhangmin: Thanks for providing doc info!

@gaozhangmin
Contributor Author

> @gaozhangmin is this a bug fix (no need to update docs)?

Yes

@Anonymitaet
Member

@gaozhangmin when submitting a PR, can you help provide a doc label (tick the box) in the PR template which contains info about doc? This helps others know more about the changes. Thanks

Documentation

@gaozhangmin
Contributor Author

/pulsarbot run-failure-checks

@eolivelli
Contributor

Yes, I would close this PR.

@gaozhangmin gaozhangmin closed this Dec 3, 2021
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this pull request Jun 15, 2022
### Motivation

Several issues were caused by the thread safety problem of
`LastCumulativeAck`, see:
- apache#10586
- apache#12343

The root cause is that `LastCumulativeAck` can be accessed by
different threads, especially in the `flushAsync` method, but its fields
are accessed directly, so thread safety cannot be guaranteed.

In addition, the current `LastCumulativeAck` class was added in
apache#8996 to hold two object references, but this modification is wrong.

Before apache#8996, there were two CAS operations in the `doCumulativeAck`
method in case it was called concurrently, though the composite CAS
operation was not atomic.

However, after apache#8996, only one CAS operation is performed, and it
compares against a `LastCumulativeAck` object rather than the two fields
(`messageId` and `bitSetRecyclable`).

### Modifications

To solve the thread safety issue, this PR moves `LastCumulativeAck`
out of `PersistentAcknowledgmentsGroupingTracker` to prevent direct
access to its internal fields. Then, two synchronized methods are added
to guarantee thread safety:
- `update`: Guarantees safe write operations. It also recycles the
  `BitSetRecyclable` object before assigning new values.
- `moveOwnershipTo`: Moves the ownership to another
  `LastCumulativeAck` object, which is then responsible for recycling
  the `BitSetRecyclable` field.

With the methods above, each time `flushAsync` is called, it moves the
ownership of the `lastCumulativeAck` field to another thread-local field,
which is used to send the ACK command and recycle the `BitSetRecyclable`
field.

- `lastCumulativeAck` holds the latest message ID and bit set. The
  update operations can be performed by multiple threads, and
  `lastCumulativeAck` always saves the latest value.
- `threadLocalLastCumulativeAckToFlush` only acts as a temporary cache
  of the latest value in `flushAsync`.
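
The ownership hand-off described in this commit message might be sketched roughly as follows. This is not the code from the commit: `PooledBits` is a hypothetical stand-in for a pooled bit set, `AckHolder` and `release` are illustrative names, and a `Long` stands in for the message ID. The sketch assumes the target of `moveOwnershipTo` is confined to the calling thread, so the nested locking cannot deadlock.

```java
// Hypothetical sketch of the ownership hand-off; none of these types are the Pulsar ones.
final class OwnershipSketch {

    /** Stand-in for a pooled bit set that should be recycled exactly once. */
    static final class PooledBits {
        int recycleCount;

        void recycle() {
            recycleCount++;
        }
    }

    static final class AckHolder {
        private Long messageId;
        private PooledBits bits;

        /** Safe write: recycles the bit set this holder still owns, then stores the new values. */
        synchronized void update(long newMessageId, PooledBits newBits) {
            if (bits != null && bits != newBits) {
                bits.recycle();
            }
            messageId = newMessageId;
            bits = newBits;
        }

        /** Hands both fields to target (assumed thread-confined); this holder then owns nothing. */
        synchronized void moveOwnershipTo(AckHolder target) {
            if (messageId == null) {
                return; // nothing to hand over
            }
            target.update(messageId, bits);
            messageId = null;
            bits = null;
        }

        /** Called by the current owner, for example after the ACK was sent. */
        synchronized void release() {
            if (bits != null) {
                bits.recycle();
                bits = null;
            }
            messageId = null;
        }

        synchronized Long messageId() {
            return messageId;
        }
    }
}
```

Because the shared holder gives up its reference during the hand-off, a later `update` on it cannot recycle a bit set that the flushing thread is still using.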
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this pull request Jun 15, 2022
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this pull request Jun 16, 2022
### Modifications

To solve the thread safety issue, this PR moves `LastCumulativeAck`
out of `PersistentAcknowledgmentsGroupingTracker` to prevent direct
access to its internal fields. Then, the following synchronized
methods are added to guarantee thread safety:
- `update`: Guarantees safe write operations. It also recycles the
  `BitSetRecyclable` object before assigning new values.
- `moveOwnershipTo`: Moves the ownership to another
  `LastCumulativeAck` object. After that, an `update` operation on this
  object won't recycle the `BitSetRecyclable` field.
- `restoreOwnershipIfEmpty`: Restores the ownership from another
  `LastCumulativeAck` object.

With the methods above, each time `flushAsync` is called, it moves the
ownership of the `lastCumulativeAck` field to another thread-local field
to send the ACK command. After that, it restores the ownership to
`lastCumulativeAck` unless it has been updated by other threads.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this pull request Jun 21, 2022
### Motivation

Another issue is that a flag, `cumulativeAckFlushRequired`, marks
whether `lastCumulativeAck` should be flushed. However, if `flushAsync`
is called concurrently, both calls would send ACK commands to the broker.
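
The double-send problem with a plain flag can be avoided by testing and clearing the flag in one atomic step. The sketch below uses `AtomicBoolean.compareAndSet` for that purpose; this is a swapped-in illustration rather than the approach of this PR, which relies on synchronized methods instead. The class, field, and method names are hypothetical, and the counter only stands in for sending an ACK command.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; the names below are illustrative, not the Pulsar fields.
final class FlushFlagSketch {
    private final AtomicBoolean flushRequired = new AtomicBoolean();
    private final AtomicInteger acksSent = new AtomicInteger();

    void markFlushRequired() {
        flushRequired.set(true);
    }

    /** Returns true if this call was the one that sent the ACK. */
    boolean flush() {
        // Atomic test-and-clear: of several concurrent callers, exactly one wins.
        if (!flushRequired.compareAndSet(true, false)) {
            return false;
        }
        acksSent.incrementAndGet(); // stands in for sending the ACK command
        return true;
    }

    int acksSent() {
        return acksSent.get();
    }

    public static void main(String[] args) throws InterruptedException {
        FlushFlagSketch tracker = new FlushFlagSketch();
        tracker.markFlushRequired();
        Thread a = new Thread(tracker::flush);
        Thread b = new Thread(tracker::flush);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(tracker.acksSent()); // prints 1
    }
}
```

With a separate read and write of a plain boolean, both threads could observe `true` before either clears it, and each would then send an ACK.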

### Modifications

To solve the thread safety issue, this PR moves `LastCumulativeAck`
out of `PersistentAcknowledgmentsGroupingTracker` to prevent direct
access to its internal fields. Then, the following synchronized
methods are added to guarantee thread safety:
- `update`: Guarantees safe write operations. It also recycles the
  `BitSetRecyclable` object before assigning new values and marks
  itself as flushable.
- `flush`: If it can be flushed, returns a thread-local
  `LastCumulativeAck` instance that contains the message ID and the bit
  set, then marks itself as no longer needing a flush.

In addition, since the `messageId` field is volatile, the `getMessageId`
method can always retrieve the latest reference.

With the new design, we only need to maintain a single `LastCumulativeAck`
field in `PersistentAcknowledgmentsGroupingTracker` and call the related
methods in `doCumulativeAck` and `flushAsync`. It also fixes the problem
that two concurrent `flushAsync` calls might send the same ACK command
twice.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this pull request Jun 21, 2022
### Modifications

To solve the thread safety issue, this PR moves `LastCumulativeAck`
out of `PersistentAcknowledgmentsGroupingTracker` to prevent direct
access to its internal fields. Then, the following synchronized
methods are added to guarantee thread safety:
- `update`: Guarantees safe write operations. It also recycles the
  `BitSetRecyclable` object before assigning new values and marks
  itself as flushable.
- `flush`: If it can be flushed, returns a thread-local
  `LastCumulativeAck` instance that contains the message ID and the bit
  set. The bit set is deep copied to avoid the original reference being
  recycled in another `update` call.

In addition, since the `messageId` field is volatile, the `getMessageId`
method can always retrieve the latest reference.

`LastCumulativeAckTest` is added to verify the semantics above.

With the new design, we only need to maintain a single `LastCumulativeAck`
field in `PersistentAcknowledgmentsGroupingTracker` and call the related
methods in `doCumulativeAck` and `flushAsync`. It also fixes the problem
that two concurrent `flushAsync` calls might send the same ACK command
twice.

Remove unused field

Don't reset in LastCumulativeAck#flush
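
The final design in this commit message (synchronized `update` and `flush`, a volatile `messageId`, and a deep-copied bit set) can be approximated with the sketch below. It is written under several assumptions and is not the merged code: a `long` stands in for the message ID, `java.util.BitSet` stands in for `BitSetRecyclable` (so recycling is omitted), `flush` returns a freshly allocated `Snapshot` rather than a thread-local instance, and all class names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch: a long stands in for the message ID, java.util.BitSet for BitSetRecyclable.
final class SynchronizedAckSketch {

    /** Copy handed to the flushing thread; it shares no mutable state with the tracker. */
    static final class Snapshot {
        final long messageId;
        final BitSet ackSet; // may be null when no bit set was recorded

        Snapshot(long messageId, BitSet ackSet) {
            this.messageId = messageId;
            this.ackSet = ackSet;
        }
    }

    private volatile Long messageId; // volatile: lock-free readers see the latest reference
    private BitSet ackSet;           // guarded by "this"
    private boolean flushRequired;   // guarded by "this"

    /** Safe write: stores the new values and marks the state as flushable. */
    synchronized void update(long newMessageId, BitSet newAckSet) {
        messageId = newMessageId;
        ackSet = newAckSet;
        flushRequired = true;
    }

    /** Returns a snapshot to send, or null if nothing changed since the last flush. */
    synchronized Snapshot flush() {
        if (!flushRequired) {
            return null;
        }
        flushRequired = false;
        // Deep copy, so a later update cannot modify what is being sent.
        BitSet copy = ackSet == null ? null : (BitSet) ackSet.clone();
        return new Snapshot(messageId, copy);
    }

    Long getMessageId() {
        return messageId;
    }
}
```

Because the flag is checked and cleared while holding the same lock that guards the fields, two concurrent `flush` calls cannot both return a snapshot for the same update.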
BewareMyPower added a commit that referenced this pull request Jun 22, 2022
…6072)

codelipenghui pushed a commit that referenced this pull request Jun 28, 2022
…6072)

(cherry picked from commit 936d6fd)
mattisonchao pushed a commit that referenced this pull request Jul 2, 2022
…6072)

(cherry picked from commit 936d6fd)
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Jul 4, 2022
…ache#16072)

(cherry picked from commit 936d6fd)
(cherry picked from commit 5eefdf1)
BewareMyPower added a commit that referenced this pull request Jul 29, 2022
…6072)

(cherry picked from commit 936d6fd)
Labels

doc-not-needed Your PR changes do not impact docs type/bug The PR fixed a bug or issue reported a bug

9 participants