Fixed race condition in schema initialization in partitioned topics #2959
Motivation
There is a race condition when producers and consumers connect to a new partitioned topic concurrently and each tries to initialize the schema.
As a result, consumers get a subscribe error (they succeed when the application retries).
The exception is like:
The main issue is that getOrCreateSchemaLocator() creates the z-node with a dummy marker (ledgerId=-1), then creates a new ledger, and finally updates the z-node with the real ledger id. Because of that, consumers might see the z-node pointing to ledger -1, hence the error.
Modifications
Stopped using getOrCreateSchemaLocator(). Instead, we do a get(), then create the ledger, and then try to create the z-node with the real ledger id. This way, no incomplete state is ever visible.
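The ordering described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual patch: the class SchemaLocatorSketch, its methods, and the in-memory map and counter standing in for ZooKeeper and BookKeeper are all hypothetical. What happens when the create loses a race is also an assumption here (the loser adopts the id written by the winner).

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: none of these names come from the Pulsar code base.
class SchemaLocatorSketch {
    // Stand-in for the metadata store: z-node path -> ledger id.
    private final ConcurrentHashMap<String, Long> znodes = new ConcurrentHashMap<>();
    // Stand-in for ledger creation: hands out increasing, non-negative ids.
    private final AtomicLong nextLedgerId = new AtomicLong(0);

    // Step 1: plain read, never creates anything.
    Optional<Long> get(String path) {
        return Optional.ofNullable(znodes.get(path));
    }

    // Step 2: create the ledger before any z-node exists.
    long createLedger() {
        return nextLedgerId.getAndIncrement();
    }

    // Step 3: create the z-node only once the real ledger id is known,
    // so readers either see no z-node or a z-node with a valid id.
    long initSchemaLedger(String path) {
        Optional<Long> existing = get(path);
        if (existing.isPresent()) {
            return existing.get();
        }
        long ledgerId = createLedger();
        // putIfAbsent models an atomic "create if the node does not exist".
        Long winner = znodes.putIfAbsent(path, ledgerId);
        // Assumption: if another client created the node first, use its id.
        // The ledger created here would then be unused and need cleanup.
        return winner == null ? ledgerId : winner;
    }
}
```

In this sketch a concurrent reader can never observe the -1 placeholder, because the placeholder is never written in the first place.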