Skip to content

Fix IllegalArgumentException in TaskLockBox.syncFromStorage() when updating from 0.12.x to 0.12.2#6086

Merged
jihoonson merged 4 commits intoapache:masterfrom
jihoonson:fix-task-lockbox-sync
Aug 4, 2018
Merged

Fix IllegalArgumentException in TaskLockBox.syncFromStorage() when updating from 0.12.x to 0.12.2#6086
jihoonson merged 4 commits intoapache:masterfrom
jihoonson:fix-task-lockbox-sync

Conversation

@jihoonson
Copy link
Copy Markdown
Contributor

This bug was introduced in #6050. When updating the druid cluster from 0.12.x to 0.12.2, taskLocks in metastore should have the proper priority while tasks might not have it in taskContext. If priority isn't in taskContext, they return the default priority before #6050. But after #6050, tasks return 0 priority if priority doesn't exist in taskContext which causes this bug.

In this patch, I changed the sanity check to pass if task doesn't have the priority and taskLock priority is same with task's default priority. Also added unit tests.

@jihoonson jihoonson added the Bug label Aug 1, 2018
@jihoonson jihoonson added this to the 0.12.2 milestone Aug 1, 2018
// The priority of task and taskLock can be different when updating cluster if the task doesn't have the
// priority in taskContext while taskLock has. In this case, we simply ignores the task priority and uses the
// taskLock priority.
if (task.getContextValue(Tasks.PRIORITY_KEY) == null && task.getDefaultPriority() == taskLock.getPriority()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this line. If we have a task lock saved from an earlier version of Druid with missing priority (so it becomes 0 when read from the DB) and a task running with no priority specified, then:

  • task.getContextValue(Tasks.PRIORITY_KEY) is null
  • task.getDefaultPriority() is some default priority like 75
  • taskLock.getPriority() is 0

So this expression is overall false, and we will go to the else block and throw an exception. It seems like the problem mentioned in the PR description is not fixed? But if the unit tests pass, how can that be?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into it a bit more and I think the issue is that Task.java has:

  default int getPriority()
  {
    return getContextValue(Tasks.PRIORITY_KEY, Tasks.DEFAULT_TASK_PRIORITY);
  }

And NoopTask does not override this. Since Tasks.DEFAULT_TASK_PRIORITY is 0, NoopTask has 0 priority, and so the tests are too easy to pass.

I think Task should change to:

  default int getPriority()
  {
    return getContextValue(Tasks.PRIORITY_KEY, getDefaultPriority());
  }

That way, the behavior of getPriority() is consistent with getDefaultPriority(). It seems like the only reason that priorities work at all today is that the OverlordResource injects them with this code:

              // Set default priority if needed
              final Integer priority = task.getContextValue(Tasks.PRIORITY_KEY);
              if (priority == null) {
                task.addToContext(Tasks.PRIORITY_KEY, task.getDefaultPriority());
              }

I think this won't be necessary if Task.getPriority() is fixed.

Copy link
Copy Markdown
Contributor Author

@jihoonson jihoonson Aug 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm here is the thing.

Before 0.12, both task and taskLock don't have priority. And before #6050, the task works as you described above while the taskLock returns 0 if priority doesn't exist. So, when taskLockBox.syncFromStorage() is called, they work like below.

priority container value
task no priority
taskLock no priority
task.getPriority() default priority depending on task type
taskLock.getPriority() 0

This causes the priority mismatch between task and taskLock, so #6050 was to fix this issue. In that PR, I've changed to inject the default priority to submitted tasks if they don't have priority. So, now task.getPriority() returns the priority what it has or 0 which is the same with how taskLock.getPriority() works.

However, after this patch, taskLockBox.syncFromStorage() can introduce the below situation.

priority container value
task no priority
taskLock default priority depending on task type
task.getPriority() 0
taskLock.getPriority() default priority depending on task type

This can happen because the priority of taskLock is initialized with task.getPriority() which returns the default priority if priority doesn't exist in taskContext before #6050. So, if the task was created before #6050, the priority of task and its taskLock would be different.

@jihoonson jihoonson changed the title Fix IllegalArgumentException in TaskLockBox.syncFromStorage() when updating from 0.12.x to 0.12.2 Fix IllegalArgumentException in TaskLockBox.syncFromStorage() introduced in 6050 Aug 1, 2018
@jihoonson jihoonson changed the title Fix IllegalArgumentException in TaskLockBox.syncFromStorage() introduced in 6050 Fix IllegalArgumentException in TaskLockBox.syncFromStorage() when updating from 0.12.x to 0.12.2 Aug 1, 2018
@gianm
Copy link
Copy Markdown
Contributor

gianm commented Aug 3, 2018

LGTM after CI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants