Fix timeout in KafkaSupervisorTest.testCheckpointForInactiveTaskGroup #6207
fjy merged 4 commits into apache:master
Assert.assertNull(serviceEmitter.getStackTrace());
Assert.assertNull(serviceEmitter.getExceptionMessage());
Assert.assertNull(serviceEmitter.getStackTrace(), serviceEmitter.getStackTrace());
Is this a fix, or just a temporary debugging arrangement to see what is in the stack trace so that a fix can be made later?
The test times out because serviceEmitter.getStackTrace() is non-null, so this assertion is going to fail in those cases.
Yeah, this doesn't fix the root cause, but it fixes the infinite loop and prints the stack trace so that we can fix it.
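For readers following along, the idea of failing fast instead of looping forever can be sketched roughly as below. This is only an illustration under assumptions, not the change made in this PR: `BoundedWaitSketch`, `awaitCondition`, and the `AtomicReference` standing in for the emitter's recorded stack trace are all hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

// Hypothetical helper; not part of Druid or of this PR.
class BoundedWaitSketch {
  // Polls until the condition holds. Fails as soon as a background thread
  // has recorded a stack trace, or once the timeout has elapsed.
  static void awaitCondition(
      BooleanSupplier condition,
      AtomicReference<String> recordedStackTrace,
      long timeoutMillis
  ) {
    long start = System.nanoTime();
    while (!condition.getAsBoolean()) {
      String stackTrace = recordedStackTrace.get();
      if (stackTrace != null) {
        // Surface the background failure instead of waiting for a timeout.
        throw new AssertionError(stackTrace);
      }
      if (System.nanoTime() - start > timeoutMillis * 1_000_000L) {
        throw new AssertionError("Timed out after " + timeoutMillis + " ms");
      }
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new AssertionError("Interrupted while waiting");
      }
    }
  }
}
```

With a helper of this shape, a test that waits on a condition reports the background thread's stack trace as the failure message rather than hanging until the CI job is killed.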
So, do you want this PR to be merged, or are you running builds on this PR to see what happens whenever this test fails?
Yeah, the title looks somewhat confusing :(. Do you have a better idea?
I thought it would be great if the CI for this PR showed something interesting, but unfortunately it succeeded. I can restart it until I find something, or check whether other PRs fail with stack traces. I don't have a strong opinion here, but I would say this PR gives us a small benefit by making CI jobs fail faster.
Maybe "Add debug information to expose transient failure cause in KafkaSupervisorTest.testCheckpointForInactiveTaskGroup", if this PR is merged without actually fixing the root cause.
I'm approving it so feel free to merge whenever you want after trying out the builds here. I triggered one.
Nice. It catches the exception. I'll check it.
The exception was
KafkaSupervisorTest.testCheckpointForInactiveTaskGroup:2118 java.lang.AssertionError:
Unexpected method call TaskRunner.getRunningTasks():
at org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:44)
at org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:94)
at com.sun.proxy.$Proxy43.getRunningTasks(Unknown Source)
at io.druid.indexing.kafka.supervisor.KafkaSupervisor$1.getTaskLocation(KafkaSupervisor.java:296)
at io.druid.indexing.kafka.supervisor.KafkaSupervisor.checkpointTaskGroup(KafkaSupervisor.java:1503)
at io.druid.indexing.kafka.supervisor.KafkaSupervisor.checkTaskDuration(KafkaSupervisor.java:1433)
at io.druid.indexing.kafka.supervisor.KafkaSupervisor.runInternal(KafkaSupervisor.java:879)
at io.druid.indexing.kafka.supervisor.KafkaSupervisor$RunNotice.handle(KafkaSupervisor.java:595)
at io.druid.indexing.kafka.supervisor.KafkaSupervisor$2.run(KafkaSupervisor.java:369)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
So I added expect(taskRunner.getRunningTasks()).andReturn(workItems).anyTimes(); to the test. Hopefully this fixes the issue.
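The failure mode in the stack trace above can be reproduced without EasyMock. The following is a minimal sketch built on `java.lang.reflect.Proxy`, not EasyMock's implementation; `TaskRunnerLike`, `StrictMockSketch`, `strictMock`, and the `stubs` map are hypothetical names used only for illustration. A call with no stub raises an `AssertionError`, much as the unexpected `getRunningTasks()` call did, and registering a stub plays roughly the role of `expect(...).andReturn(...).anyTimes()`.

```java
import java.lang.reflect.Proxy;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real TaskRunner interface.
interface TaskRunnerLike {
  List<String> getRunningTasks();
  List<String> getPendingTasks();
}

class StrictMockSketch {
  // Builds a proxy that answers only stubbed methods (keyed by method name)
  // and throws AssertionError for any other call.
  static TaskRunnerLike strictMock(Map<String, Object> stubs) {
    return (TaskRunnerLike) Proxy.newProxyInstance(
        TaskRunnerLike.class.getClassLoader(),
        new Class<?>[]{TaskRunnerLike.class},
        (proxy, method, methodArgs) -> {
          if (!stubs.containsKey(method.getName())) {
            throw new AssertionError("Unexpected method call " + method.getName() + "()");
          }
          // A stubbed method returns the same value any number of times.
          return stubs.get(method.getName());
        }
    );
  }

  public static void main(String[] args) {
    Map<String, Object> stubs = new HashMap<>();
    stubs.put("getRunningTasks", Collections.singletonList("task1"));
    TaskRunnerLike runner = strictMock(stubs);

    System.out.println(runner.getRunningTasks());
    try {
      runner.getPendingTasks(); // not stubbed, so the proxy rejects it
    } catch (AssertionError e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The sketch also shows why the fix is fragile: any method a background thread happens to call must have been stubbed in advance, or the call fails on that thread rather than in the test body.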
@jihoonson this failure persists. See https://travis-ci.org/apache/incubator-druid/jobs/424467954.
@leventov thanks. I've also noticed that before. The problem is that the supervisor periodically executes runNotice, which can potentially call any method of the supervisor, which in turn requires mocking everything with EasyMock.
I think it's better to refactor the whole KafkaSupervisorTest to run in a more controllable mock environment, by creating test implementations of taskRunner, taskMaster, taskStorage, and taskClient instead of building them with EasyMock.
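As a rough illustration of that direction, a hand-written fake might look like the sketch below. It assumes a heavily simplified, hypothetical interface (`SimpleTaskRunner`); the real TaskRunner has more methods and different types, and `FakeTaskRunner` is not an existing Druid class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified interface; the real TaskRunner differs.
interface SimpleTaskRunner {
  List<String> getRunningTasks();
  void run(String taskId);
}

// A hand-written fake: every method has defined behavior, so a background
// thread can call any of them at any time without tripping an expectation.
class FakeTaskRunner implements SimpleTaskRunner {
  // Thread-safe list, since the supervisor thread and the test thread may both touch it.
  private final List<String> running = new CopyOnWriteArrayList<>();

  @Override
  public List<String> getRunningTasks() {
    // Return a read-only snapshot so callers cannot mutate the fake's state.
    return Collections.unmodifiableList(new ArrayList<>(running));
  }

  @Override
  public void run(String taskId) {
    running.add(taskId);
  }
}
```

A test would construct `new FakeTaskRunner()`, register tasks with `run(...)`, and pass it wherever the mocked runner is used today, with no per-method expectations to keep in sync with the supervisor's periodic work.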
Raised #6296.
Let's see what the error message is.