Skip to content

KAFKA-12195 Fix synchronization issue happening in KafkaStreams#9887

Merged
chia7712 merged 8 commits intoapache:trunkfrom
chia7712:KAFKA-12195
Jan 19, 2021
Merged

KAFKA-12195 Fix synchronization issue happening in KafkaStreams#9887
chia7712 merged 8 commits intoapache:trunkfrom
chia7712:KAFKA-12195

Conversation

@chia7712
Copy link
Copy Markdown
Member

@chia7712 chia7712 commented Jan 14, 2021

issue: https://issues.apache.org/jira/browse/KAFKA-12195
related to aedb53a

Root Cause

It seems to me there are two issues on AdjustStreamThreadCountTest.

1) synchronization issue (the number of threads is inconsistent)

threads = Collections.synchronizedList(new LinkedList<>());

The synchronization list requires us to manually synchronize the iterator. non-synchronizing the list results in inconsistent results and consequently unstabilize the AdjustStreamThreadCountTest.

2) duplicate thread index when adding new stream thread

createAndAddStreamThread is not called with holding changeThreadCount so it is possible that we create two threads with same thread index. It makes different threads have same metadata and then localThreadsMetadata returns incorrect number of metadata.

3) assert intermediate state (it gets failed if it run too fast)

This PR already reverts the fix and it is traced by #9888

Test Result from PR

looped the test 100 times on my local. all pass

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

resizeThreadCache(cacheSizePerThread);
return Optional.of(streamThread.getName());
}
threads.remove(streamThread);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing element from collection when looping it is error-prone so I rewrite it by iterator.

@@ -148,7 +146,6 @@ public void shouldAddAndRemoveThreads() throws InterruptedException {
one.start();
latch.await(30, TimeUnit.SECONDS);
assertThat(kafkaStreams.localThreadsMetadata().size(), equalTo(oldThreadCount));
Copy link
Copy Markdown
Member Author

@chia7712 chia7712 Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert gets failed due to synchronization issue.

java.lang.AssertionError: 
Expected: <2>
     but: was <1>
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
	at org.apache.kafka.streams.integration.AdjustStreamThreadCountTest.shouldAddAndRemoveThreads(AdjustStreamThreadCountTest.java:149)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
	at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.lang.Thread.run(Thread.java:834)

@chia7712 chia7712 changed the title KAFKA-12195 Fix synchronization issue happening in KafkaStreams (rela… KAFKA-12195 Fix synchronization issue happening in KafkaStreams Jan 14, 2021
@chia7712 chia7712 marked this pull request as draft January 14, 2021 14:28
}

@Test
public void testConcurrentlyAccessThreads() throws InterruptedException {
Copy link
Copy Markdown
Member Author

@chia7712 chia7712 Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case can cause following error if we don't apply this patch.

org.apache.kafka.streams.integration.AdjustStreamThreadCountTest > testConcurrentlyAccessThreads FAILED
    java.lang.AssertionError: expected null, but was:<java.util.ConcurrentModificationException>
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotNull(Assert.java:756)
        at org.junit.Assert.assertNull(Assert.java:738)
        at org.junit.Assert.assertNull(Assert.java:748)
        at org.apache.kafka.streams.integration.AdjustStreamThreadCountTest.testConcurrentlyAccessThreads(AdjustStreamThreadCountTest.java:264)

resizeThreadCache(cacheSizePerThread);
// Creating thread should hold the lock in order to avoid duplicate thread index.
// If the duplicate index happen, the metadata of thread may be duplicate too.
streamThread = createAndAddStreamThread(cacheSizePerThread, threadIdx);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix for case 2)

@chia7712
Copy link
Copy Markdown
Member Author

@cadonna Could you take a look at those concurrent issues (see description) ? I'm digging in it to observe whether there are more issues. Thanks!

@ableegoldman
Copy link
Copy Markdown
Member

Hey @wcarlson5 can you also take a look at this?

Copy link
Copy Markdown
Contributor

@wcarlson5 wcarlson5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seems to change the actual logic so that shouldn't be a problem. The copy of the thread list to avoid an extra lock was a good idea. With #9888 this should fix the issues we are seeing and I don't see any other problems with it. @ableegoldman the synchronization of the thread list iterators was a possible issue we talked about. I didn't think that it would be a problem if we were not altering the list but it appears that I was wrong. I only have one concern otherwise LGTM. @chia7712 thanks for cleaning this up!

streamThread.shutdown();
if (!streamThread.getName().equals(Thread.currentThread().getName())) {
streamThread.waitOnThreadState(StreamThread.State.DEAD);
synchronized (threads) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using the threads object as a lock here? is that was the Collections.synchronizedList(new LinkedList<>()); uses internally?

I am a little concerned that we are holding changeThreadCount while waiting for threads

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. Let me use 'copy' to handle lock issue.

@chia7712 chia7712 marked this pull request as ready for review January 15, 2021 06:36
@guozhangwang
Copy link
Copy Markdown
Contributor

I made a pass and it looked good. @chia7712 leaving it to your call on the merging order between this one and #9888.

@guozhangwang
Copy link
Copy Markdown
Contributor

The unit tests may not be stable until these PRs are merged.

@chia7712
Copy link
Copy Markdown
Member Author

merge #9888 to run QA again.

@chia7712 chia7712 merged commit 6752f28 into apache:trunk Jan 19, 2021
@mjsax mjsax added the streams label Jan 21, 2021
@chia7712 chia7712 deleted the KAFKA-12195 branch March 25, 2024 15:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants