Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -2061,6 +2061,14 @@ public void testCheckpointForInactiveTaskGroup()
null
);

final TaskLocation location1 = new TaskLocation("testHost", 1234, -1);
final TaskLocation location2 = new TaskLocation("testHost2", 145, -1);
Collection workItems = new ArrayList<>();
workItems.add(new TestTaskRunnerWorkItem(id1, null, location1));
workItems.add(new TestTaskRunnerWorkItem(id2, null, location2));
workItems.add(new TestTaskRunnerWorkItem(id2, null, location2));

expect(taskRunner.getRunningTasks()).andReturn(workItems).anyTimes();
expect(taskMaster.getTaskQueue()).andReturn(Optional.of(taskQueue)).anyTimes();
expect(taskMaster.getTaskRunner()).andReturn(Optional.of(taskRunner)).anyTimes();
expect(taskStorage.getActiveTasks()).andReturn(ImmutableList.of(id1, id2, id3)).anyTimes();
Expand Down Expand Up @@ -2115,12 +2123,8 @@ public void testCheckpointForInactiveTaskGroup()

verifyAll();

while (serviceEmitter.getStackTrace() != null) {
Thread.sleep(100);
}

Assert.assertNull(serviceEmitter.getStackTrace());
Assert.assertNull(serviceEmitter.getExceptionMessage());
Assert.assertNull(serviceEmitter.getStackTrace(), serviceEmitter.getStackTrace());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a fix or just temp debugging arrangement to see what is in the stacktrace so that later a fix can be done ?

test times out because serviceEmitter.getStackTrace() is non-null , so this assertion is gonna fail in those cases.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this doesn't fix the root cause, but fixes the infinite loop and print the stack trace so that we can fix it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR title misled me :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, do you want this PR to be merged or you're trying builds on this PR to see things whenever this test fails ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the title looks somewhat confusing :(. Do you have a better idea?

I thought it would be great if the CI for this PR shows something interesting, but unfortunately it succeeded. I can restart it until I find something, or check other PRs if they fail with some stack traces. I don't have a strong opinion here, but I would say this PR gives us a small benefit by making CI jobs fail faster.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, "Add debug information to expose transient failure cause in KafkaSupervisorTest.testCheckpointForInactiveTaskGroup", if this PR is merged without actually fixing.

I'm approving it so feel free to merge whenever you want after trying out the builds here. I triggered one.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. It catches the exception. I'll check it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception was

  KafkaSupervisorTest.testCheckpointForInactiveTaskGroup:2118 java.lang.AssertionError: 
  Unexpected method call TaskRunner.getRunningTasks():
	at org.easymock.internal.MockInvocationHandler.invoke(MockInvocationHandler.java:44)
	at org.easymock.internal.ObjectMethodsFilter.invoke(ObjectMethodsFilter.java:94)
	at com.sun.proxy.$Proxy43.getRunningTasks(Unknown Source)
	at io.druid.indexing.kafka.supervisor.KafkaSupervisor$1.getTaskLocation(KafkaSupervisor.java:296)
	at io.druid.indexing.kafka.supervisor.KafkaSupervisor.checkpointTaskGroup(KafkaSupervisor.java:1503)
	at io.druid.indexing.kafka.supervisor.KafkaSupervisor.checkTaskDuration(KafkaSupervisor.java:1433)
	at io.druid.indexing.kafka.supervisor.KafkaSupervisor.runInternal(KafkaSupervisor.java:879)
	at io.druid.indexing.kafka.supervisor.KafkaSupervisor$RunNotice.handle(KafkaSupervisor.java:595)
	at io.druid.indexing.kafka.supervisor.KafkaSupervisor$2.run(KafkaSupervisor.java:369)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

So, I added expect(taskRunner.getRunningTasks()).andReturn(workItems).anyTimes();. Hopefully this fixes this issue.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leventov thanks. I've also noticed that before. The problem is that the supervisor periodically executes runNotice and this can potentially call every method of the supervisor which in turn requires to mock everything using EasyMock.

I think it's better to refactor the whole KafkaSupervisorTest to be based on a sort of more controllable mockup environment by creating taskRunner, taskMaster, taskStorage, and taskClient for test purpose instead of making them using EasyMock.

Raised #6296.

Assert.assertNull(serviceEmitter.getExceptionMessage(), serviceEmitter.getExceptionMessage());
Assert.assertNull(serviceEmitter.getExceptionClass());
}

Expand Down