Skip to content

[SPARK-45529][SS][TESTS] Decrease flakiness by ignore zero offset(wip)#43362

Closed
dengziming wants to merge 1 commit intoapache:masterfrom
dengziming:flaky-kafk
Closed

[SPARK-45529][SS][TESTS] Decrease flakiness by ignore zero offset(wip)#43362
dengziming wants to merge 1 commit intoapache:masterfrom
dengziming:flaky-kafk

Conversation

@dengziming
Copy link
Copy Markdown
Member

@dengziming dengziming commented Oct 13, 2023

What changes were proposed in this pull request?

Improve the infamous KafkaSourceStressSuite, I find there are more than one reason for this problem, the most frequently failure path is: [XXX, Add partition, CheckAnswer], I guess we are not granted to get offsets from empty partitions when fetching data in MicroBatchExecution.

I run the test locally more than 10 times and they are all successful after made this change, I even increased iterations from 50 to 100 and it only failed once in more than 10 times.

Why are the changes needed?

To make test less flaky, at least 90% more reliable the before. however, it's still possible to fail since I find some other reason

Does this PR introduce any user-facing change?

No

How was this patch tested?

Github CI

Was this patch authored or co-authored using generative AI tooling?

No

@dengziming
Copy link
Copy Markdown
Member Author

@dongjoon-hyun We can make it much more stable after we use equalsIgnoreZeroOffset, however, this change will make KafkaSourceOffset incompatible and make equals inconsistent with hashCode, so it's just an direction we can move forward. do you have any other ideas to ignore zero offset and maintain compatibility.

@LuciferYang LuciferYang changed the title [SPARK-45529][Tests] Decrease flakiness by ignore zero offset(wip) [SPARK-45529][TESTS] Decrease flakiness by ignore zero offset(wip) Oct 15, 2023
@LuciferYang LuciferYang changed the title [SPARK-45529][TESTS] Decrease flakiness by ignore zero offset(wip) [SPARK-45529][SS][TESTS] Decrease flakiness by ignore zero offset(wip) Oct 15, 2023
@LuciferYang
Copy link
Copy Markdown
Contributor

KafkaSourceOffsetSuite still failed with this fix.

https://github.com/dengziming/spark/actions/runs/6506157421/job/17671193820

[info] org.apache.spark.sql.kafka010.KafkaSourceOffsetSuite *** ABORTED *** (0 milliseconds)
[info]   Duplicate test name: comparison {"t":{"0":1}} <=> {"t":{"0":2,"1":1}} (OffsetSuite.scala:26)
[info]   org.scalatest.exceptions.DuplicateTestNameException:
[info]   at org.scalatest.SuperEngine.registerTest(Engine.scala:674)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.testImpl(AnyFunSuiteLike.scala:135)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.test(AnyFunSuiteLike.scala:154)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.test$(AnyFunSuiteLike.scala:153)
[info]   at org.apache.spark.SparkFunSuite.test(SparkFunSuite.scala:154)
[info]   at org.apache.spark.sql.kafka010.KafkaSourceOffsetSuite.org$apache$spark$sql$test$SQLTestUtils$$super$test(KafkaSourceOffsetSuite.scala:27)
[info]   at org.apache.spark.sql.test.SQLTestUtils.test(SQLTestUtils.scala:129)
[info]   at org.apache.spark.sql.test.SQLTestUtils.test$(SQLTestUtils.scala:120)
[info]   at org.apache.spark.sql.kafka010.KafkaSourceOffsetSuite.test(KafkaSourceOffsetSuite.scala:27)
[info]   at org.apache.spark.sql.streaming.OffsetSuite.compare(OffsetSuite.scala:26)
[info]   at org.apache.spark.sql.streaming.OffsetSuite.compare$(OffsetSuite.scala:25)
[info]   at org.apache.spark.sql.kafka010.KafkaSourceOffsetSuite.compare(KafkaSourceOffsetSuite.scala:27)
[info]   at org.apache.spark.sql.kafka010.KafkaSourceOffsetSuite.<init>(KafkaSourceOffsetSuite.scala:43)
[info]   at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[info]   at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
[info]   at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[info]   at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
[info]   at java.base/java.lang.reflect.ReflectAccess.newInstance(ReflectAccess.java:128)
[info]   at java.base/jdk.internal.reflect.ReflectionFactory.newInstance(ReflectionFactory.java:347)
[info]   at java.base/java.lang.Class.newInstance(Class.java:645)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:454)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info]   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info]   at java.base/java.lang.Thread.run(Thread.java:833)

@dengziming
Copy link
Copy Markdown
Member Author

We plan to upgrade Kafka to 3.6.1 later, the problem has been located, I closed this one.

@dengziming dengziming closed this Oct 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants