Conversation

@BewareMyPower BewareMyPower commented Jun 15, 2022

Motivation

There were several issues caused by the thread-safety problems of
`LastCumulativeAck`, see:
- #10586
- #12343

The root cause is that `LastCumulativeAck` can be accessed by
different threads, especially in the `flushAsync` method, but its fields are
accessed directly, so no thread safety is guaranteed.

In addition, the current `LastCumulativeAck` class was added in
#8996 to hold the two object
references, but this modification is incorrect.

Before #8996, there were two CAS operations in the `doCumulativeAck` method
to handle concurrent calls, though the composite of the two CAS operations
was still not atomic.

After #8996, only one CAS operation is performed, and it compares the whole
`LastCumulativeAck` object rather than the two fields (`messageId` and
`bitSetRecyclable`).
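As an illustration (names simplified, not Pulsar's actual code), two independent CAS operations can produce a torn read of the pair: each field is updated atomically on its own, but a reader that loads both fields between the two updates sees the new `messageId` combined with the old bit set.

```java
import java.util.BitSet;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch, not Pulsar's actual code: with two independent CAS
// operations, each field is updated atomically on its own, but a reader that
// loads both fields between the two updates observes a torn pair -- the new
// messageId combined with the old bit set.
public class CompositeCasDemo {
    static final AtomicReference<Long> messageId = new AtomicReference<>(1L);
    static final AtomicReference<BitSet> bitSet = new AtomicReference<>(new BitSet());

    // Results of the simulated "reader between the two CASes".
    static long observedId;
    static boolean observedNewAckBit;

    public static void main(String[] args) {
        BitSet newBits = new BitSet();
        newBits.set(7); // the bit set belonging to the new acknowledgment

        // Writer performs CAS #1 on messageId ...
        messageId.compareAndSet(1L, 2L);

        // ... a reader running at this point sees the new id with the old bits.
        observedId = messageId.get();
        observedNewAckBit = bitSet.get().get(7);

        // Writer performs CAS #2 on the bit set only afterwards.
        bitSet.compareAndSet(bitSet.get(), newBits);

        System.out.println("torn read: id=" + observedId
                + ", newAckBit=" + observedNewAckBit); // id=2, newAckBit=false
    }
}
```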

There is another issue: a `cumulativeAckFlushRequired` flag marks whether
`lastCumulativeAck` should be flushed. However, if `flushAsync` is called
concurrently, both calls would send ACK commands to the broker.
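The flag-based race is a classic check-then-act bug. A rough sketch, with the interleaving simulated in a single thread to keep the outcome deterministic (names mirror the description but the code is illustrative):

```java
// Illustrative sketch of the check-then-act race in the flag-based flush:
// both "threads" read cumulativeAckFlushRequired before either clears it,
// so the same ACK is sent twice. The interleaving is simulated in a single
// thread to keep the outcome deterministic.
public class DoubleFlushDemo {
    static boolean cumulativeAckFlushRequired = true;
    static int acksSent = 0;

    static boolean check() { return cumulativeAckFlushRequired; }

    static void sendAckAndClear() {
        acksSent++;
        cumulativeAckFlushRequired = false;
    }

    public static void main(String[] args) {
        boolean threadAFlush = check(); // thread A: flag is true
        boolean threadBFlush = check(); // thread B checks before A clears: also true
        if (threadAFlush) sendAckAndClear();
        if (threadBFlush) sendAckAndClear(); // duplicate ACK command
        System.out.println("acksSent=" + acksSent); // 2
    }
}
```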

Modifications

To solve the thread-safety issue, this PR moves `LastCumulativeAck`
out of `PersistentAcknowledgmentsGroupingTracker` to prevent direct
access to its internal fields. Then, the following synchronized
methods were added to guarantee thread safety:

  • `update`: Guarantees safe write operations. It also recycles the old
    `BitSetRecyclable` object before assigning the new values and marks
    itself as flushable.
  • `flush`: If there is something to flush, returns a thread-local
    `LastCumulativeAck` instance that contains the message ID and the bit
    set. The bit set is deep-copied to avoid the original reference being
    recycled by a concurrent `update` call.

In addition, since the `messageId` field is volatile, the `getMessageId`
method always retrieves the latest reference.
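The design described above can be sketched roughly as follows. This is illustrative, not the actual patch: `BitSet` stands in for `BitSetRecyclable`, and the recycling and thread-local snapshot details are omitted.

```java
import java.util.BitSet;

// Rough sketch of the synchronized design described above. BitSet stands in
// for BitSetRecyclable; the recycling and thread-local details are omitted.
public class LastCumulativeAckSketch {

    public static final class Snapshot {
        public final long messageId;
        public final BitSet bitSet;
        Snapshot(long messageId, BitSet bitSet) {
            this.messageId = messageId;
            this.bitSet = bitSet;
        }
    }

    private volatile long messageId = -1L; // volatile: getMessageId() sees the latest write
    private BitSet bitSet;
    private boolean flushRequired = false;

    public synchronized void update(long newMessageId, BitSet newBitSet) {
        // (The real code recycles the old BitSetRecyclable here first.)
        this.messageId = newMessageId;
        this.bitSet = newBitSet;
        this.flushRequired = true; // mark as flushable
    }

    // Returns a snapshot to send to the broker, or null if nothing changed
    // since the last flush -- so two concurrent flushes cannot both send.
    public synchronized Snapshot flush() {
        if (!flushRequired) {
            return null;
        }
        flushRequired = false;
        // Deep-copy the bit set so a later update() recycling the original
        // cannot corrupt the snapshot.
        BitSet copy = (bitSet == null) ? null : (BitSet) bitSet.clone();
        return new Snapshot(messageId, copy);
    }

    public long getMessageId() {
        return messageId; // plain volatile read, no lock needed
    }
}
```

A second `flush()` with no intervening `update()` returns null, which is how the duplicate ACK from concurrent `flushAsync` calls is avoided in this sketch.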

`LastCumulativeAckTest` is added to verify the semantics above.

With the new design, we only need to maintain a single `LastCumulativeAck`
field in `PersistentAcknowledgmentsGroupingTracker` and call the related
methods in `doCumulativeAck` and `flushAsync`. This also fixes the problem
that two concurrent `flushAsync` calls might send the same ACK command
twice.

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • doc-required
    (Your PR needs to update docs and you will update later)

  • doc-not-needed
    (Please explain why)

  • doc
    (Your PR contains doc changes)

  • doc-complete
    (Docs have been already added)

@BewareMyPower BewareMyPower added the type/bug (The PR fixed a bug or issue reported a bug), release/2.8.4, release/2.10.2, and release/2.9.4 labels Jun 15, 2022
@BewareMyPower BewareMyPower added this to the 2.11.0 milestone Jun 15, 2022
@BewareMyPower BewareMyPower self-assigned this Jun 15, 2022
@Anonymitaet Anonymitaet added the doc-not-needed (Your PR changes do not impact docs) label Jun 16, 2022
@BewareMyPower BewareMyPower marked this pull request as draft June 16, 2022 02:55
@BewareMyPower BewareMyPower marked this pull request as ready for review June 16, 2022 03:19
@BewareMyPower BewareMyPower force-pushed the bewaremypower/last-cumulative-ack branch from 5c48830 to ba30863 Compare June 16, 2022 10:04
@BewareMyPower

Now all tests passed, PTAL, @lhotari @gaozhangmin @congbobo184 @mattisonchao @Technoboy- @codelipenghui


@Demogorgon314 Demogorgon314 left a comment


LGTM. Just left two minor comments.

@BewareMyPower BewareMyPower force-pushed the bewaremypower/last-cumulative-ack branch from c2b911a to 38ddc7d Compare June 21, 2022 12:04
@BewareMyPower

I've updated this PR with a refactored change and the PR description has also been updated, PTAL again @Demogorgon314 @congbobo184

Remove unused field

Don't reset in LastCumulativeAck#flush
@BewareMyPower BewareMyPower force-pushed the bewaremypower/last-cumulative-ack branch from ac95266 to 89cf62a Compare June 21, 2022 15:35

@congbobo184 congbobo184 left a comment


Nice work!

@BewareMyPower BewareMyPower merged commit 936d6fd into apache:master Jun 22, 2022
@BewareMyPower BewareMyPower deleted the bewaremypower/last-cumulative-ack branch June 22, 2022 15:34
codelipenghui pushed a commit that referenced this pull request Jun 28, 2022
…6072)

(cherry picked from commit 936d6fd)
mattisonchao pushed a commit that referenced this pull request Jul 2, 2022
…6072)

(cherry picked from commit 936d6fd)
@mattisonchao mattisonchao added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Jul 2, 2022
nicoloboschi pushed a commit to datastax/pulsar that referenced this pull request Jul 4, 2022
…ache#16072)

(cherry picked from commit 936d6fd)
(cherry picked from commit 5eefdf1)
BewareMyPower added a commit that referenced this pull request Jul 29, 2022
…6072)

(cherry picked from commit 936d6fd)
@BewareMyPower BewareMyPower added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Jul 29, 2022

Labels

  • cherry-picked/branch-2.8 (Archived: 2.8 is end of life)
  • cherry-picked/branch-2.9 (Archived: 2.9 is end of life)
  • cherry-picked/branch-2.10
  • doc-not-needed (Your PR changes do not impact docs)
  • release/2.8.4
  • release/2.9.4
  • release/2.10.2
  • type/bug (The PR fixed a bug or issue reported a bug)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants