KAFKA-7670: Admin client testUnreachableBootstrapServer() flaky test #5942
hachikuji merged 1 commit into apache:trunk from stanislavkozlovski:KAFKA-7670-admin-client-flaky-test
|
You mean that we add a method to the collection while it's being iterated and clear the whole collection even though we never processed some of the elements? |
|
@stanislavkozlovski There is a timing window in the way metadata response is prepared because |
|
|
|
@ijuma No, we don't ever clear the collection. I only found out that, while iterating through it, in some runs we never find any elements inside. @rajinisivaram I didn't know that, thanks. I can't reproduce this failure when I add more code for debugging purposes; it just doesn't fail. It's unclear to me why that is. I wonder if it is worth changing the code so that we instantiate the AdminClient on demand. Update: @rajinisivaram just responded as I was writing this. I'll go with the approach of instantiating AdminClient |
|
This looks good to me. Would it be less error prone to make the AdminClientUnitTestEnv take a MockClient in the constructor (pre-prepared), instead of relying on the user to create the env, then prepare the MockClient, then initiate? |
|
@lbradstreet I guess you meant take in a You're right that this is a bit more error prone. My thinking is that it should be caught in the first test run and as such may not have as big of an impact. I don't have strong feelings about this, though. |
|
@stanislavkozlovski, you're probably right that it would be significantly more boilerplate. Do all prepare statements have to take place prior to the |
|
@rajinisivaram can you please review these changes? |
@stanislavkozlovski Couldn't we add a new constructor with a boolean parameter to decide whether adminClient should be created here? So all tests except testUnreachableBootstrapServer don't need to remember to invoke initiate(). And testUnreachableBootstrapServer will be explicitly not creating the admin client, so it will be more obvious that it needs to create it separately. Perhaps that would be less error-prone for the future?
Sounds good. I updated the javadoc to describe that as well
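A minimal sketch of the constructor-flag idea discussed above, not the actual AdminClientUnitTestEnv code. `TestEnv`, `FakeAdminClient`, and the method bodies are hypothetical stand-ins; only the `createAdminClient` flag name and the `initiate()` method name come from this conversation.

```java
public class EnvSketch {
    // Hypothetical stand-in for the real admin client.
    static class FakeAdminClient {}

    // Hypothetical stand-in for the test environment.
    static class TestEnv {
        private FakeAdminClient adminClient;

        // Default path: most tests get a client without extra steps.
        TestEnv() {
            this(true);
        }

        // Tests that must prepare mock responses first pass false
        // and call initiate() themselves afterwards.
        TestEnv(boolean createAdminClient) {
            if (createAdminClient) {
                initiate();
            }
        }

        // Creates the client once; repeated calls are no-ops.
        void initiate() {
            if (adminClient == null) {
                adminClient = new FakeAdminClient();
            }
        }

        // Fails loudly if a test forgot to create the client.
        FakeAdminClient adminClient() {
            if (adminClient == null) {
                throw new IllegalStateException("admin client was not created; call initiate()");
            }
            return adminClient;
        }
    }
}
```

With this shape, forgetting to call `initiate()` in the one test that opts out surfaces as an exception on first use rather than as a silent misconfiguration.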
rajinisivaram left a comment
@stanislavkozlovski Thanks for the updates, looks good. Left just a couple of minor comments.
nit: This could just call initiate()?
nit: Perhaps call this method createAdminClient() to be consistent with the flag in the constructor?
|
@stanislavkozlovski I found a simpler explanation for the transient failures. Basically

    List<ClientResponse> copy = new ArrayList<>(responses);
    ClientResponse response;
    while ((response = this.responses.poll()) != null) {
        response.onComplete();
    }
    return copy;

The

    List<ClientResponse> copy = new ArrayList<>();
    ClientResponse response;
    while ((response = this.responses.poll()) != null) {
        response.onComplete();
        copy.add(response);
    }
    return copy;

After this change, I can no longer reproduce the failure. Of course it's debatable whether we should allow concurrent sends, but these tests currently depend on it. |
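To make the difference between the two snippets concrete, here is a small single-threaded sketch. It is not the MockClient code: `DrainSketch`, its method names, and the `betweenSteps` hook are hypothetical, strings stand in for `ClientResponse`, and the hook simulates another thread enqueuing a response after the snapshot is taken (an assumed interleaving).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class DrainSketch {
    // Snapshot first, then drain: anything enqueued after the snapshot is
    // still polled (and would be completed) but is absent from the result.
    static List<String> snapshotThenDrain(Queue<String> responses, Runnable betweenSteps) {
        List<String> copy = new ArrayList<>(responses);
        betweenSteps.run(); // simulated concurrent send
        while (responses.poll() != null) {
            // the real code would complete the response here
        }
        return copy;
    }

    // Drain and collect: the result holds exactly the elements that were polled.
    static List<String> drainAndCollect(Queue<String> responses, Runnable betweenSteps) {
        List<String> copy = new ArrayList<>();
        betweenSteps.run(); // simulated concurrent send
        String response;
        while ((response = responses.poll()) != null) {
            copy.add(response);
        }
        return copy;
    }

    public static void main(String[] args) {
        Queue<String> q1 = new ConcurrentLinkedQueue<>();
        q1.add("r1");
        System.out.println(snapshotThenDrain(q1, () -> q1.add("r2"))); // prints [r1]

        Queue<String> q2 = new ConcurrentLinkedQueue<>();
        q2.add("r1");
        System.out.println(drainAndCollect(q2, () -> q2.add("r2"))); // prints [r1, r2]
    }
}
```

In the first variant "r2" is removed from the queue but never returned to the caller, which is one way a response could appear to go missing.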
It used to preallocate an array of responses and then complete each response from the original collection sequentially. The problem was that the original collection could have been modified (another thread completing the response) while this was happening
|
@hachikuji agreed this seems like a cleaner solution. I also tried and failed to reproduce after using your suggested change. Thanks a lot! |
hachikuji left a comment
LGTM. Thanks for the patch!
* ak/trunk: (45 commits) KAFKA-7487: DumpLogSegments misreports offset mismatches (apache#5756) MINOR: improve JavaDocs about auto-repartitioning in Streams DSL (apache#6269) KAFKA-7935: UNSUPPORTED_COMPRESSION_TYPE if ReplicaManager.getLogConfig returns None (apache#6274) KAFKA-7895: Fix stream-time reckoning for suppress (apache#6278) KAFKA-6569: Move OffsetIndex/TimeIndex logger to companion object (apache#4586) MINOR: add log indicating the suppression time (apache#6260) MINOR: Make info logs for KafkaConsumer a bit more verbose (apache#6279) KAFKA-7758: Reuse KGroupedStream/KGroupedTable with named repartition topics (apache#6265) KAFKA-7884; Docs for message.format.version should display valid values (apache#6209) MINOR: Save failed test output to build output directory MINOR: add test for StreamsSmokeTestDriver (apache#6231) MINOR: Fix bugs identified by compiler warnings (apache#6258) KAFKA-6474: Rewrite tests to use new public TopologyTestDriver [part 4] (apache#5433) MINOR: fix bypasses in ChangeLogging stores (apache#6266) MINOR: Make MockClient#poll() more thread-safe (apache#5942) MINOR: drop dbAccessor reference on close (apache#6254) KAFKA-7811: Avoid unnecessary lock acquire when KafkaConsumer commits offsets (apache#6119) KAFKA-7916: Unify store wrapping code for clarity (apache#6255) MINOR: Add missing Alter Operation to Topic supported operations list in AclCommand KAFKA-7921: log at error level for missing source topic (apache#6262) ...
This test easily fails locally around once every 20-30 runs.
After spending considerable time debugging, I found out that the weakly-consistent iterator of futureResponses does not iterate through the two responses at all in the failing runs.
No elements are ever removed from futureResponses except in the MockClient#reset() method, which I verified is never called. I think the cause is Java's weakly-consistent iterator semantics.
I'm still unsure how to solve this in the cleanest way. I'm opening this PR as a chance to discuss it with other people.
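For readers unfamiliar with weakly-consistent iteration, a small sketch follows. It assumes futureResponses is backed by a java.util.concurrent collection (the description does not name the type); `WeakIteratorSketch` and `countWhileAppending` are hypothetical names. The iterator never throws ConcurrentModificationException and visits every element present when it was created, but whether it sees an element added afterwards is not guaranteed.

```java
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class WeakIteratorSketch {
    // Iterates the queue while appending one extra element mid-iteration
    // and returns how many elements the iterator actually visited.
    static int countWhileAppending(Queue<String> queue, String extra) {
        int seen = 0;
        boolean appended = false;
        Iterator<String> it = queue.iterator();
        while (it.hasNext()) {
            it.next();
            seen++;
            if (!appended) {
                queue.add(extra); // modification during iteration, no exception
                appended = true;
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Queue<String> queue = new ConcurrentLinkedQueue<>();
        queue.add("a");
        queue.add("b");
        int seen = countWhileAppending(queue, "c");
        // The two original elements are always visited; "c" may or may not be.
        System.out.println("visited " + seen + " of " + queue.size());
    }
}
```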