
[#4583] fix(messaging): SplitTask leaves token store inconsistent after segment split #4587

Open
MateuszNaKodach wants to merge 2 commits into axon-5.1.x from fix/4583

@MateuszNaKodach MateuszNaKodach commented May 18, 2026

closes #4583

Bug description

Introduced in: commit 185c4332 "When splitting and merging tokens, update both tokens involved"

Before persistent segment masks were introduced, SplitTask only needed to insert the sibling segment and call releaseClaim(original) to release ownership. When that commit added support for updating the original segment's mask, it appended deleteToken(original) + initializeSegment(original with new mask) after the existing releaseClaim rather than replacing it:

releaseClaim(original)   ← sets owner = NULL
deleteToken(original)    ← DELETE WHERE owner = nodeId → 0 rows → UnableToClaimTokenException
initializeSegment(...)   ← never reached

MergeTask has the same problem in its mergeSegments() method: after initializeSegment(mergedSegment) creates the merged token with owner = NULL, the code called releaseClaim(merged), which again matches zero rows and throws UnableToClaimTokenException, preventing the coordinator from reclaiming the merged segment.

The InMemoryTokenStore (used in all existing unit tests) has a no-op releaseClaim and ignores ownership in deleteToken, so both bugs were invisible to the test suite and only surfaced against a real JDBC or JPA token store.
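The failure mode and the reason the unit tests missed it can be sketched with two toy stores. This is only a model under stated assumptions: `ToyTokenStore`, `OwnerGuardedStore`, `LenientStore` and `buggySplit` are hypothetical names invented for illustration, not the Axon `TokenStore` API, and an `IllegalStateException` stands in for `UnableToClaimTokenException`.

```java
import java.util.HashMap;
import java.util.Map;

public class ClaimOrderDemo {

    /** Hypothetical stand-in for a token store; not the Axon TokenStore API. */
    interface ToyTokenStore {
        void initializeSegment(int segment);
        void claim(int segment, String owner);
        void releaseClaim(int segment, String owner);
        void deleteToken(int segment, String owner);
    }

    /** Models a SQL-backed store whose UPDATE and DELETE are guarded by "WHERE owner = ?". */
    static class OwnerGuardedStore implements ToyTokenStore {
        // segment -> owner; a null owner means the row exists but is unclaimed
        private final Map<Integer, String> owners = new HashMap<>();

        public void initializeSegment(int segment) { owners.put(segment, null); }
        public void claim(int segment, String owner) { owners.put(segment, owner); }
        public void releaseClaim(int segment, String owner) {
            requireOwner(segment, owner);
            owners.put(segment, null);
        }
        public void deleteToken(int segment, String owner) {
            requireOwner(segment, owner);
            owners.remove(segment);
        }
        private void requireOwner(int segment, String owner) {
            if (!owner.equals(owners.get(segment))) {
                throw new IllegalStateException("0 rows matched for segment " + segment);
            }
        }
    }

    /** Models the lenient behaviour described above: no-op release, ownership ignored on delete. */
    static class LenientStore implements ToyTokenStore {
        private final Map<Integer, String> owners = new HashMap<>();

        public void initializeSegment(int segment) { owners.put(segment, null); }
        public void claim(int segment, String owner) { owners.put(segment, owner); }
        public void releaseClaim(int segment, String owner) { /* intentionally does nothing */ }
        public void deleteToken(int segment, String owner) { owners.remove(segment); }
    }

    /** Runs the buggy order (release, then delete). Returns true if every step completed. */
    static boolean buggySplit(ToyTokenStore store) {
        store.initializeSegment(0);
        store.claim(0, "node-1");
        try {
            store.initializeSegment(1);       // insert the sibling
            store.releaseClaim(0, "node-1");  // owner of the original becomes null
            store.deleteToken(0, "node-1");   // guarded delete no longer matches the row
            store.initializeSegment(0);       // re-insert the original with its new mask
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("guarded store completed: " + buggySplit(new OwnerGuardedStore()));
        System.out.println("lenient store completed: " + buggySplit(new LenientStore()));
    }
}
```

In this model the same call sequence fails against the owner-guarded store and completes against the lenient one, which mirrors why only JDBC and JPA deployments were affected.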


What was fixed

SplitTask.splitAndRelease(): releaseClaim() was removed entirely and the operation order corrected:

  1. initializeSegment(sibling) — insert sibling with owner = NULL
  2. deleteToken(original) — DELETE while owner = nodeId still holds
  3. initializeSegment(original with new mask) — re-insert with owner = NULL, ready for reclaiming
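The three steps above can be traced against a small owner-guarded model. This is a sketch, assuming a single node called `node-1` and a store keyed by segment id only; `FixedSplitOrder` and its methods are hypothetical and do not reproduce the real `SplitTask` or `TokenStore` signatures.

```java
import java.util.HashMap;
import java.util.Map;

public class FixedSplitOrder {
    // segment -> owner; a null owner means the row exists but is unclaimed
    private final Map<Integer, String> owners = new HashMap<>();

    void initializeSegment(int segment) {
        if (owners.containsKey(segment)) {
            throw new IllegalStateException("duplicate key " + segment);
        }
        owners.put(segment, null);
    }

    void claim(int segment, String owner) {
        owners.put(segment, owner);
    }

    /** Models "DELETE ... WHERE owner = ?": fails when the caller does not own the row. */
    void deleteToken(int segment, String owner) {
        if (!owner.equals(owners.get(segment))) {
            throw new IllegalStateException("0 rows deleted for segment " + segment);
        }
        owners.remove(segment);
    }

    boolean isUnclaimed(int segment) {
        return owners.containsKey(segment) && owners.get(segment) == null;
    }

    /** Splits segment 0 using the corrected order and returns the resulting store. */
    static FixedSplitOrder splitSegmentZero() {
        FixedSplitOrder store = new FixedSplitOrder();
        store.initializeSegment(0);
        store.claim(0, "node-1");        // the splitting node owns the original

        store.initializeSegment(1);      // 1. sibling inserted, unclaimed
        store.deleteToken(0, "node-1");  // 2. delete while node-1 still owns the row
        store.initializeSegment(0);      // 3. re-insert the original, unclaimed
        return store;
    }

    public static void main(String[] args) {
        FixedSplitOrder store = splitSegmentZero();
        System.out.println("segment 0 unclaimed: " + store.isUnclaimed(0));
        System.out.println("segment 1 unclaimed: " + store.isUnclaimed(1));
    }
}
```

Both rows end up present and unclaimed, so no separate release step is needed in this model.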

MergeTask.mergeSegments(): releaseClaim() was removed. After the two source tokens are deleted and the merged token is initialized with owner = NULL, no claim release is needed, because the coordinator picks it up on the next cycle.

JpaTokenStore.deleteToken() (found thanks to the new integration tests) — A secondary issue specific to JPA: after a JPQL bulk DELETE the Hibernate identity map still holds the deleted TokenEntry, causing NonUniqueObjectException when initializeSegment calls em.persist() with the same PK in the same persistence context. Fixed by calling em.flush() + em.clear() after executeUpdate() to evict the stale entry.
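The stale-entry mechanism can be illustrated without JPA by a toy persistence context. This is a deliberately simplified model, not Hibernate: `StaleContextDemo` and its methods are hypothetical, the toy writes through to its "database" immediately (so there is no flush step to model), and an `IllegalStateException` stands in for `NonUniqueObjectException`.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleContextDemo {
    private final Map<String, String> database = new HashMap<>(); // rows by primary key
    private final Map<String, String> managed = new HashMap<>();  // toy first-level cache

    /** Registers the entity in the context and writes it; rejects a key that is already managed. */
    void persist(String pk, String value) {
        if (managed.containsKey(pk)) {
            throw new IllegalStateException("an entity with key " + pk + " is already managed");
        }
        managed.put(pk, value);
        database.put(pk, value);
    }

    /** Models a bulk delete: it removes the row but never touches the managed map. */
    int bulkDelete(String pk) {
        return database.remove(pk) != null ? 1 : 0;
    }

    /** Models clearing the persistence context. */
    void clear() {
        managed.clear();
    }

    /** Deletes and re-inserts the same key; returns true if the re-insert succeeded. */
    static boolean deleteThenReinsert(boolean clearAfterDelete) {
        StaleContextDemo context = new StaleContextDemo();
        context.persist("processor/0", "mask=0");
        context.bulkDelete("processor/0");
        if (clearAfterDelete) {
            context.clear(); // evict the stale managed entry
        }
        try {
            context.persist("processor/0", "mask=1");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without clear: " + deleteThenReinsert(false));
        System.out.println("with clear:    " + deleteThenReinsert(true));
    }
}
```

In the toy, the re-insert only succeeds once the context has been cleared after the bulk delete, which is the same shape as the fix described above.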


Tests introduced

Unit — SplitTaskTest (existing class updated):

  • Removed releaseClaim mock setup and verification from existing tests; added verify(never()).releaseClaim()
  • Added a new test asserting the exact operation order: initializeSegment(sibling) → deleteToken → initializeSegment(original)
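An ordering assertion of this kind can be sketched with a hand-written recording fake. This swaps in a dependency-free technique rather than reproducing the PR's mock-based tests; `CallOrderCheck`, `RecordingStore` and `split` are hypothetical names, and the string arguments are placeholders for real segment objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CallOrderCheck {

    /** Fake store that only records which operation was called, in order. */
    static class RecordingStore {
        final List<String> calls = new ArrayList<>();

        void initializeSegment(String which) { calls.add("initializeSegment(" + which + ")"); }
        void deleteToken(String which) { calls.add("deleteToken(" + which + ")"); }
        void releaseClaim(String which) { calls.add("releaseClaim(" + which + ")"); }
    }

    /** Hypothetical split routine that follows the corrected order. */
    static void split(RecordingStore store) {
        store.initializeSegment("sibling");
        store.deleteToken("original");
        store.initializeSegment("original");
    }

    /** Runs the split against a fresh fake and returns the recorded calls. */
    static List<String> recordedCalls() {
        RecordingStore store = new RecordingStore();
        split(store);
        return store.calls;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList(
                "initializeSegment(sibling)",
                "deleteToken(original)",
                "initializeSegment(original)");
        List<String> actual = recordedCalls();
        System.out.println("order matches: " + expected.equals(actual));
        System.out.println("releaseClaim never called: "
                + actual.stream().noneMatch(call -> call.startsWith("releaseClaim")));
    }
}
```

Comparing the whole recorded list checks both the order and the absence of any extra call such as `releaseClaim` in a single assertion.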

Unit — MergeTaskTest (existing class updated):

  • Removed releaseClaim mock setup and verification from existing tests; added verify(never()).releaseClaim()

Integration — new Spring Boot IT suite against real token stores:

This suite lays the groundwork for broader end-to-end test coverage of PooledStreamingEventProcessor.

PooledStreamingEventProcessorTestSuite (added to messaging test-jar, no Spring dependency) holds all test logic. It requires subclasses to implement only a buildProcessor() factory method, making it reusable outside Spring contexts. Each test creates its own processor with a UUID-based name for natural token store row isolation — no schema teardown or context reset needed.
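The shape of such a suite can be sketched as an abstract base with a single factory hook. This is an illustration only: `ToyProcessor`, `ProcessorSuite`, `PlainSuite` and `newIsolatedProcessor` are hypothetical, and passing the generated name into `buildProcessor` is an assumption about the signature, which the description does not spell out.

```java
import java.util.UUID;

public class SuitePatternSketch {

    /** Hypothetical stand-in for the processor under test. */
    interface ToyProcessor {
        String name();
    }

    /** Holds the shared test logic; subclasses only decide how a processor is built. */
    static abstract class ProcessorSuite {
        protected abstract ToyProcessor buildProcessor(String name);

        /** Each test gets a uniquely named processor, so its token rows cannot collide with another test's. */
        ToyProcessor newIsolatedProcessor() {
            return buildProcessor("processor-" + UUID.randomUUID());
        }
    }

    /** Framework-free subclass; a Spring variant would build from injected components instead. */
    static class PlainSuite extends ProcessorSuite {
        @Override
        protected ToyProcessor buildProcessor(String name) {
            return () -> name;
        }
    }

    public static void main(String[] args) {
        ProcessorSuite suite = new PlainSuite();
        ToyProcessor first = suite.newIsolatedProcessor();
        ToyProcessor second = suite.newIsolatedProcessor();
        System.out.println("names differ: " + !first.name().equals(second.name()));
    }
}
```

Because isolation comes from the name rather than from resetting shared state, the same base class works whether or not an application context is reused between tests.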

PooledStreamingEventProcessorSpringTestSuite (in spring-boot-autoconfigure) is a thin Spring subclass that builds the processor from components obtained via @Autowired AxonConfiguration.

Two concrete IT classes run the full suite against HSQLDB in-memory:

  • PooledStreamingJdbcTokenStoreIT — overrides the JPA token store with a JdbcTokenStore bean
  • PooledStreamingJpaTokenStoreIT — uses full JPA autoconfiguration with ddl-auto=create-drop

The suite covers two scenarios against both token stores:

  • WhenSplittingSegments — calls processor().splitSegment(0) end-to-end and asserts both resulting segments become active, directly reproducing the original bug
  • WhenMergingSegments — splits first, then calls processor().mergeSegment(0) and asserts the merged segment becomes active, reproducing the symmetric MergeTask bug

@MateuszNaKodach MateuszNaKodach changed the title from "Fix/4583" to "[#4583] fix(messaging): SplitTask leaves token store inconsistent after segment split" May 18, 2026
…er segment split

SplitTask.splitAndRelease() called releaseClaim() before deleteToken(),
setting owner=NULL so the DELETE WHERE owner=nodeId found 0 rows and threw
UnableToClaimTokenException. Fix: remove releaseClaim() entirely and move
deleteToken() before initializeSegment(original) so the DELETE runs while
the token is still owned. The re-insert already creates the row with
owner=NULL, making releaseClaim redundant.

JpaTokenStore.deleteToken() now calls em.flush()+em.clear() after the bulk
JPQL DELETE so the deleted entity is evicted from the persistence context,
preventing NonUniqueObjectException when initializeSegment() persists a new
entity with the same PK in the same unit of work.

Add Spring Boot integration tests against JDBC and JPA token stores to
reproduce and verify the fix end-to-end.
@MateuszNaKodach MateuszNaKodach self-assigned this May 18, 2026
@MateuszNaKodach MateuszNaKodach added the labels "Type: Bug" (use to signal issues that describe a bug within the system) and "Priority 1: Must" (highest priority; a release cannot be made if this issue isn't resolved) May 18, 2026
@MateuszNaKodach MateuszNaKodach added this to the Release 5.1.1 milestone May 18, 2026
@MateuszNaKodach MateuszNaKodach marked this pull request as ready for review May 18, 2026 14:07
@MateuszNaKodach MateuszNaKodach requested a review from a team as a code owner May 18, 2026 14:07
@MateuszNaKodach MateuszNaKodach requested review from hjohn, jangalinski and laura-devriendt-lemon and removed request for a team May 18, 2026 14:07
… segment merge

MergeTask called releaseClaim after initializeSegment, but initializeSegment
always creates tokens with owner=NULL, making the subsequent releaseClaim
(UPDATE SET owner=NULL WHERE owner=nodeId) match zero rows and throw
UnableToClaimTokenException. Remove the call to fix post-merge reclaim.

Integration tests redesigned with UUID-per-test processor names and a
buildProcessor() factory, eliminating @DirtiesContext and Spring property
wiring while providing natural token store isolation per test.
@sonarqubecloud
Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)
