Skip to content

Fix NPE while handling CheckpointNotice in KafkaSupervisor#5996

Merged
jon-wei merged 7 commits intoapache:masterfrom
jihoonson:fix-npe-supervisor
Jul 14, 2018
Merged

Fix NPE while handling CheckpointNotice in KafkaSupervisor#5996
jon-wei merged 7 commits intoapache:masterfrom
jihoonson:fix-npe-supervisor

Conversation

@jihoonson
Copy link
Copy Markdown
Contributor

Fixes #5992.

In this PR, the supervisor has been changed to ignore inactive taskGroup (which is not in taskGroups map) while processing a checkpointNotice if it's a known taskGroup (which means it is in either pendingCompletionTaskGroups or partitionGroups maps). See CheckpointNotice. isValidTaskGroup().

Additionally,

  • Removed unnecessary sequenceTaskGroup map from KafkaSupervisor. In KafkaSupervisor, there is a ConcurrentHashMap called taskGroups which maps a taskGroupId to an active taskGroup. This is very similar to sequenceTaskGroup which maps a baseSequenceName to an active taskGroup because baseSequenceName is unique for a taskGroup. Also, sequenceTaskGroup was being changed along with taskGroups. We don't have to maintain two ConcurrentHashMaps which can even cause a race condition.
  • Since sequenceTaskGroup has been removed, the checkpoint request now contains a taskGroupId which is assigned to KafkaIndexTasks via KafkaIOConfig.
  • Fixed wrong Nullable annotation in Supervisor.checkpoint() method.
  • Changed the map.putIfAbsent() & map.get() pattern to map.computeIfAbsent().

@Nullable DataSourceMetadata previousCheckPoint,
@Nullable DataSourceMetadata currentCheckPoint
)
public void checkpoint(int taskGroupId, DataSourceMetadata previousCheckPoint, DataSourceMetadata currentCheckPoint)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need fix code style issues

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asdf2014 thanks for the review! Maybe you saw an old commit. I've resolved all code style issues.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um.. I am sure. It still exists in the latest version code. 😅

public void checkpoint(int taskGroupId, DataSourceMetadata previousCheckPoint, DataSourceMetadata currentCheckPoint)

should be

public void checkpoint(
      int taskGroupId,
      DataSourceMetadata previousCheckPoint,
      DataSourceMetadata currentCheckPoint
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should see the latest version code when i visit https://github.com/apache/incubator-druid/pull/5996/files this page, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're seeing the latest one. That line doesn't exceed 120 columns.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, you're right.

taskGroup
) == null) {
sequenceTaskGroup.put(generateSequenceName(taskGroup), taskGroups.get(taskGroupId));
log.info("Created new task group [%d]", taskGroupId);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this removes the logging event for new task groups, can we preserve that?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I realized this log can be useful to debug supervisor's behavior for a specific taskGroupId.

@fjy
Copy link
Copy Markdown
Contributor

fjy commented Jul 13, 2018

👍

@jon-wei jon-wei merged commit c48aa74 into apache:master Jul 14, 2018
clintropolis pushed a commit to implydata/druid-public that referenced this pull request Jul 18, 2018
* Fix NPE while handling CheckpointNotice

* fix code style

* Fix test

* fix test

* add a log for creating a new taskGroup

* fix backward compatibility in KafkaIOConfig
clintropolis added a commit to implydata/druid-public that referenced this pull request Jul 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants