KAFKA-15764: Missing Tests for Transactions #14702
Force-pushed from 7e64548 to 1c7a52b.
props.put(KafkaConfig.AutoLeaderRebalanceEnableProp, false.toString)
props.put(KafkaConfig.GroupInitialRebalanceDelayMsProp, "0")
props.put(KafkaConfig.TransactionsAbortTimedOutTransactionCleanupIntervalMsProp, "200")
props.put(KafkaConfig.NumNetworkThreadsProp, 2.toString)
I need to see if changing these values causes issues for the other tests.
I ran the other tests about 20 times locally and didn't see issues.
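For readers without the `KafkaConfig` constants at hand, here is a minimal sketch of the same overrides written against plain property names. The key strings are an assumption about what those constants resolve to, and `BrokerOverridesSketch` is a hypothetical helper, not part of the PR.

```scala
import java.util.Properties

object BrokerOverridesSketch {
  // Hypothetical helper: builds the overrides from the diff above.
  // The key strings are assumed to match the KafkaConfig constants.
  def overrides(): Properties = {
    val props = new Properties()
    // Keep leadership stable during the test.
    props.put("auto.leader.rebalance.enable", false.toString)
    // Let consumer groups rebalance immediately.
    props.put("group.initial.rebalance.delay.ms", "0")
    // Check for timed-out transactions every 200 ms.
    props.put("transaction.abort.timed.out.transaction.cleanup.interval.ms", "200")
    // Use two network threads.
    props.put("num.network.threads", 2.toString)
    props
  }
}
```

Because these overrides are shared by every test in the class, a change to any of them can affect the other tests, which is the concern raised in the comment above.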
transactionalCompressionProducers += createTransactionalProducer("transactional-compression-producer-" + i.toString, compressionType = "snappy")
}

// KAFKA-15653 is triggered more easily with replication factor 1
This made sense to me before, but I'm trying to remember why. 😅
Is it possible that while we are building and sending the response, we receive the callback request, so it gets handled on a different thread? Whereas with acks=all, we free the thread after sending out the verification, so it is available to receive the callback?
Not sure I follow that. Whether there are any replicas or not, wouldn't the request thread be freed after sending out the verification? One way I could see the case being more likely is if we added additional request threads. I don't think there's any affinity in the callback to the original request thread, is there? So more available request threads probably means a greater chance a separate thread would handle the callback. Does that make sense?
Sorry, I think I got confused about when the response is being sent. I will say that adding more request threads didn't work; I had to lower the count to get it to reproduce frequently.
I can try it the other way around, but iirc, there were a lot of combinations that were not as consistent as it is now.
My take is that this test should probably just test compression of transactional data with the default settings. I don't think it necessarily needs to hit the case from KAFKA-15653. It seems like it doesn't do so reliably anyway. Are there lower level tests where we can validate callback safety?
I didn't see it fail much at all with the default settings. It was like once in 60 times or less which was not super useful. There are lower level tests for the callback using the correct requestLocal.
Yeah, what I'm saying is that this test seems more useful as a general validation of transactional produce with compression. From that perspective, the default settings are the most useful to test. For KAFKA-15653, the lower level tests seem sufficient.
I set back the default settings and it seems to trigger consistently again anyway. Maybe I just tricked myself into thinking I needed the other configs.
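The thread above turns on two points: a callback has no affinity to the request thread that started it, and lower-level tests check that the callback uses the correct requestLocal. A minimal sketch of why that matters follows. It uses plain JDK executors, and `HandlerLocal` is a hypothetical stand-in for per-thread request state. It is not Kafka's `RequestLocal`, and the two single-thread executors only make the "different thread" case deterministic.

```scala
import java.util.concurrent.{Callable, Executors}

object CallbackThreadSketch {
  // Hypothetical per-thread state; remembers which thread created it.
  final class HandlerLocal(val owner: Thread)

  private val local = new ThreadLocal[HandlerLocal] {
    override def initialValue(): HandlerLocal =
      new HandlerLocal(Thread.currentThread())
  }

  // Returns (captured state belongs to the callback thread,
  //          freshly looked-up state belongs to the callback thread).
  def run(): (Boolean, Boolean) = {
    val handlerA = Executors.newSingleThreadExecutor()
    val handlerB = Executors.newSingleThreadExecutor()
    try {
      // The request starts on handler A and captures A's state.
      val captured: HandlerLocal = handlerA.submit(new Callable[HandlerLocal] {
        def call(): HandlerLocal = local.get()
      }).get()

      // The callback later runs on handler B.
      handlerB.submit(new Callable[(Boolean, Boolean)] {
        def call(): (Boolean, Boolean) = {
          val current = Thread.currentThread()
          (captured.owner eq current, local.get().owner eq current)
        }
      }).get()
    } finally {
      handlerA.shutdown()
      handlerB.shutdown()
    }
  }
}
```

Here `CallbackThreadSketch.run()` returns `(false, true)`: the state captured when the request started belongs to another thread, while the state looked up inside the callback belongs to the thread actually running it. A unit test at this level can assert that directly, without depending on broker thread scheduling.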
I've restarted the build every day since the approval and have yet to get a clean build 😵‍💫
I'm convinced this PR is cursed. Still failing builds. :(
Given that this failed on a storage test (storage tests shouldn't be affected here), and that it was able to build on some of the many other runs I tried, I am going to merge this after 12 rebuilds. 😵
This reverts commit ed7ad6d.
@jolshan This PR might indeed be cursed. We have been seeing a lot of failures of
This reverts commit ed7ad6d. We have been seeing a lot of failures of TransactionsWithTieredStoreTest.testTransactionsWithCompression on trunk, and they seem to have started with this PR. I see how this PR can influence the test via the change in TestUtils. The bad part is that it sometimes seems to kill the Gradle Executors completely. So I'd suggest reverting the change before investigating further, to stabilize CI. Reviewers: Bruno Cadonna <cadonna@apache.org>
I ran this test 40 times without the KAFKA-15653 fix, with and without compression enabled. With compression it failed 39/40 times, and without compression it passed 40/40 times. With the KAFKA-15653 fix and compression it passed 40/40 times locally. Reviewers: Jason Gustafson <jason@confluent.io>
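Pass/fail counts over repeated runs, like the 40 runs quoted above, can be collected with a small harness. This is a sketch only: `RepeatRunSketch` and its `repeat` method are hypothetical names, and the PR does not say how the runs were actually performed.

```scala
import scala.util.Try

object RepeatRunSketch {
  // Runs `body` once per iteration (1 to runs) and returns (passed, failed).
  // Any non-fatal exception, including a failed assertion, counts as a failure.
  def repeat(runs: Int)(body: Int => Unit): (Int, Int) = {
    val passed = (1 to runs).count(i => Try(body(i)).isSuccess)
    (passed, runs - passed)
  }
}
```

A call would look like `RepeatRunSketch.repeat(40)(_ => runTestOnce())`, where `runTestOnce` stands for whatever invokes the test under the configuration being compared.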