Skip to content

bugfix: when building materialized-view, if taskCount>1, may cause concurrentModificationException#6690

Merged
jihoonson merged 5 commits intoapache:masterfrom
pzhdfy:mateview
Feb 19, 2019
Merged

bugfix: when building materialized-view, if taskCount>1, may cause concurrentModificationException#6690
jihoonson merged 5 commits intoapache:masterfrom
pzhdfy:mateview

Conversation

@pzhdfy
Copy link
Copy Markdown
Contributor

@pzhdfy pzhdfy commented Nov 30, 2018

In checkSegmentsAndSubmitTasks:

for (Map.Entry<Interval, HadoopIndexTask> entry : runningTasks.entrySet()) {
        Optional<TaskStatus> taskStatus = taskStorage.getStatus(entry.getValue().getId());
        if (!taskStatus.isPresent() || !taskStatus.get().isRunnable()) {
          runningTasks.remove(entry.getKey());
          runningVersion.remove(entry.getKey());
        }
      }

if taskCount>1, while iterating runningTasks and removing entry from runningTasks will happen concurrent, this will cause concurrentModificationException.

use ConcurrentHashMap instead of HashMap for runningTasks, will fix

@fjy fjy added the Bug label Dec 3, 2018
@fjy fjy added this to the 0.13.1 milestone Dec 3, 2018
@jihoonson
Copy link
Copy Markdown
Contributor

@pzhdfy thanks for the patch and sorry for late review. Do you know what threads can read and remove at the same time? It's currently guarded by taskLock.

    synchronized (taskLock) {
      for (Map.Entry<Interval, HadoopIndexTask> entry : runningTasks.entrySet()) {
        Optional<TaskStatus> taskStatus = taskStorage.getStatus(entry.getValue().getId());
        if (!taskStatus.isPresent() || !taskStatus.get().isRunnable()) {
          runningTasks.remove(entry.getKey());
          runningVersion.remove(entry.getKey());
        }
      }
      if (runningTasks.size() == maxTaskCount) {
        //if the number of running tasks reach the max task count, supervisor won't submit new tasks.
        return;
      }
      Pair<SortedMap<Interval, String>, Map<Interval, List<DataSegment>>> toBuildIntervalAndBaseSegments = checkSegments();
      SortedMap<Interval, String> sortedToBuildVersion = toBuildIntervalAndBaseSegments.lhs;
      Map<Interval, List<DataSegment>> baseSegments = toBuildIntervalAndBaseSegments.rhs;
      missInterval = sortedToBuildVersion.keySet();
      submitTasks(sortedToBuildVersion, baseSegments);
    }

@pzhdfy
Copy link
Copy Markdown
Contributor Author

pzhdfy commented Feb 2, 2019

@jihoonson

Yes, this is guarded by taskLock .

This is just the WRONG USE of hashmap.

if we remove an entry from a hashmap by remove(key), while iterating it, this will throw a concurrentModificationException, even in a single thread.

image

ref:https://stackoverflow.com/questions/602636/concurrentmodificationexception-and-a-hashmap

for (Map.Entry<Interval, HadoopIndexTask> entry : runningTasks.entrySet()) {
        Optional<TaskStatus> taskStatus = taskStorage.getStatus(entry.getValue().getId());
        if (!taskStatus.isPresent() || !taskStatus.get().isRunnable()) {
          runningTasks.remove(entry.getKey());
          runningVersion.remove(entry.getKey());
        }
      }

@jihoonson
Copy link
Copy Markdown
Contributor

Oh I see. You’re right. Thanks.

I think it would be better to first find the entries to be removed, and then remove then all at once after iteration. I think this is better than using ConcurrentMap because there’s no concurrency issue. What do you think?

Also, all bug fix PRs should have unit tests. Would you please add one?

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Feb 5, 2019

Or how about using .remove() on the iterator? I think if these maps are not actually going to be used from multiple threads, then we don't need to make them ConcurrentMaps.

@jihoonson
Copy link
Copy Markdown
Contributor

Hi @pzhdfy, were you able to check this issue again by any chance?

@pzhdfy
Copy link
Copy Markdown
Contributor Author

pzhdfy commented Feb 14, 2019

@jihoonson
I have updated the pr,
removing entry after iteration instead of using ConcurrentMap, and add unit test

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pzhdfy thank you for the update! Please check my comment.

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @pzhdfy thank you for the quick fix!

@jihoonson jihoonson merged commit 7d1e8f3 into apache:master Feb 19, 2019
jon-wei pushed a commit to jon-wei/druid that referenced this pull request Feb 28, 2019
…ncurrentModificationException (apache#6690)

* bugfix: when building materialized-view, if taskCount >1, may cause ConcurrentModificationException

* remove entry after iteration instead of using ConcurrentMap, and add unit test

* small change

* modify unit test for coverage

* remove unused method
clintropolis pushed a commit that referenced this pull request Mar 1, 2019
…ncurrentModificationException (#6690) (#7165)

* bugfix: when building materialized-view, if taskCount >1, may cause ConcurrentModificationException

* remove entry after iteration instead of using ConcurrentMap, and add unit test

* small change

* modify unit test for coverage

* remove unused method
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants