Make realtimes available for loading segments #4148
Conversation
{
  return "ImmutableDruidDataSource{"
         + "name='" + name
         + "', segments='" + segmentsHolder
Druid's usual toString() pattern doesn't include the ' quotes.
Hmm, actually I followed some other toString() implementations like DataSegment or SegmentDescriptor. It seems we first need to settle on a standard for toString().
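For illustration, a minimal, hypothetical sketch of the single-quoted toString() style being followed here (the class and field names below are made up, not from the PR):

```java
// Hypothetical class illustrating the quoted-field toString() style
// used by e.g. DataSegment; the names here are illustrative only.
public class ToStringSketch
{
  private final String name = "wiki";
  private final long size = 100;

  @Override
  public String toString()
  {
    return "ToStringSketch{" +
           "name='" + name + '\'' +
           ", size=" + size +
           '}';
  }
}
```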
  + "name='" + name
  + "', segments='" + segmentsHolder
  + "', properties='" + properties
  + "'}";
Doesn't include partitionNames on purpose?
Yes, partitionNames usually includes a lot of partitions.
public String toString()
{
  return "ImmutableDruidServer{"
         + "meta='" + metadata
         + "', size='" + currSize
         + "', sources='" + dataSources
         + "'}";
Doesn't include segments on purpose?
Yes, a server usually holds a lot of segments.
 */
private final Map<String, Map<Integer, FireChief>> chiefs;

private final SegmentManager segmentManager;
This field is created but not used
Ah, I added this field for future use, but it would be fine to add it later. I removed it for now.
private final Object lock = new Object();
private final SegmentLoader segmentLoader;
private final Map<String, VersionedIntervalTimeline<String, ReferenceCountingSegment>> dataSources = new HashMap<>();
There could be one ConcurrentHashMap<String, DataSourceState>, where DataSourceState includes VersionedIntervalTimeline<String, ReferenceCountingSegment>, size and count. Then all synchronization could be delegated to ConcurrentHashMap methods, no explicit synchronized and locks are needed in SegmentManager. Also ConcurrentHashMap's concurrency is better than synchronization on a single object.
SegmentManager is simply separated from ServerManager. Your comments around SegmentManager and ServerManager look good, but I think they are not a part of this PR. Maybe it's better to raise a new issue after this PR.
Ok, but it would be nice if you could do this as part of this PR; it's 15 minutes of work, while another issue and a separate PR will eat more of everybody's attention.
For issues which are less related to the original issue, I think it's fine to fix them in a single PR if the changes are truly small and intuitive; otherwise, it's better to split them into several PRs, even if some of those PRs are quite small. This is because
- Authors and reviewers can focus on the original issue of the PR. This will increase review speed.
- As you know, most changes require adding tests even when they are simple. This can be a burden for authors and slows development down.
- I think getting other people's attention is good because they can review PRs from other points of view.
Besides, even in the approach you mentioned, a lock is still needed to synchronize access to the ConcurrentHashMap and to the DataSourceState taken from it (please see loadSegment()). For example,

private final ConcurrentHashMap<String, DataSourceState> dataSources;
...
public boolean loadSegment(final DataSegment segment) throws SegmentLoadingException
{
  synchronized (lock) {
    final DataSourceState dataSourceState = dataSources.get(dataSource);
    final PartitionHolder<ReferenceCountingSegment> entry = dataSourceState.findEntry(segment.getInterval(), segment.getVersion());
    ...
    dataSourceState.add(
        segment.getInterval(),
        segment.getVersion(),
        segment.getShardSpec().createChunk(new ReferenceCountingSegment(adapter))
    );
    ...
  }
}

For this issue, I prefer to investigate first what the exact requirements are.
Ok for not including refactoring into this PR.
The synchronization that you mentioned is still needed, but it could be managed by ConcurrentHashMap:

dataSources.compute(dataSource, (ds, dataSourceState) -> {
  if (dataSourceState == null) ...
  else {
    dataSourceState.add(...);
    return dataSourceState;
  }
});

ConcurrentHashMap guarantees that executions of the lambdas provided to compute(), computeIfAbsent(), merge() etc. are linearizable. Internally this is implemented via the same intrinsic locks that you use explicitly, but striped over the ConcurrentHashMap entries.
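To make the point concrete, here is a self-contained, hypothetical sketch of this pattern. DataSourceState is simplified here to a segment list plus a size counter; the real class would hold the VersionedIntervalTimeline as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-data-source state kept in a ConcurrentHashMap.
// compute() runs its lambda atomically per key, so no external lock is needed.
public class ComputeSketch
{
  public static class DataSourceState
  {
    final List<String> segmentIds = new ArrayList<>();
    long totalSize;
  }

  private final ConcurrentHashMap<String, DataSourceState> dataSources = new ConcurrentHashMap<>();

  public void addSegment(String dataSource, String segmentId, long size)
  {
    // The remapping function is a BiFunction of (key, currentValue);
    // a null currentValue means the data source is not yet known.
    dataSources.compute(dataSource, (key, state) -> {
      if (state == null) {
        state = new DataSourceState();
      }
      state.segmentIds.add(segmentId);
      state.totalSize += size;
      return state;
    });
  }

  public long totalSize(String dataSource)
  {
    DataSourceState state = dataSources.get(dataSource);
    return state == null ? 0 : state.totalSize;
  }
}
```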
Ah, right. That way should work.
Thanks for your understanding. I'll raise a new PR after this PR.
String dataSourceName = getDataSourceName(dataSource);

final VersionedIntervalTimeline<String, ReferenceCountingSegment> timeline = dataSources.get(dataSourceName);
final VersionedIntervalTimeline<String, ReferenceCountingSegment> timeline = segmentManager.getDataSources()
getDataSources() creates a copy; please add a method to SegmentManager to extract the VersionedIntervalTimeline by dataSource.
Thanks. I added a getTimeline() method.
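A hedged sketch of what such an accessor could look like. Timeline below is a stand-in for VersionedIntervalTimeline<String, ReferenceCountingSegment>, and the class is simplified; this is not the actual Druid code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified sketch of the SegmentManager accessors discussed above.
public class SegmentManagerSketch
{
  public static class Timeline {}

  private final Map<String, Timeline> dataSources = new HashMap<>();

  public void addDataSource(String name)
  {
    dataSources.put(name, new Timeline());
  }

  // Returns a defensive copy; callers that only need one timeline
  // pay for copying the whole map.
  public Map<String, Timeline> getDataSources()
  {
    return new HashMap<>(dataSources);
  }

  // Direct lookup avoids the copy; returns null for an unknown data source.
  public Timeline getTimeline(String dataSourceName)
  {
    return dataSources.get(dataSourceName);
  }
}
```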
final VersionedIntervalTimeline<String, ReferenceCountingSegment> timeline = dataSources.get(
    dataSourceName
);
final VersionedIntervalTimeline<String, ReferenceCountingSegment> timeline = segmentManager.getDataSources()
 *
 * @see io.druid.server.coordinator.rules.LoadRule
 */
boolean segmentReplicatable()
Maybe isSegmentReplicationTarget()?
 *
 * @return true if it is available for broadcast.
 */
boolean segmentBroadcastable()
Maybe isSegmentBroadcastTarget()?
@jihoonson please resolve conflicts. Going to merge this PR tomorrow unless somebody else wants to review.
@leventov thanks. Resolved conflicts.
return cluster.values().stream().flatMap(Collection::stream).collect(Collectors.toList());
return historicals.values().stream()
                  .flatMap(Collection::stream)
                  .collect(() -> realtimes, Set::add, Set::addAll);
Why does getAllServers() add elements to the realtimes collection?
Thanks for catching it!
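A minimal sketch of the problem and one possible fix. The types and server names below are simplified stand-ins, not the actual Druid types: collect(() -> realtimes, Set::add, Set::addAll) hands the live realtimes set to the collector as its container, so calling the method mutates realtimes as a side effect.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch of the getAllServers() issue discussed above.
public class AllServersSketch
{
  public static Set<String> getAllServersBuggy(
      Map<String, Collection<String>> historicals,
      Set<String> realtimes
  )
  {
    // BUG: the supplier returns the live realtimes set, which then
    // accumulates every historical server as a side effect.
    return historicals.values().stream()
                      .flatMap(Collection::stream)
                      .collect(() -> realtimes, Set::add, Set::addAll);
  }

  public static Set<String> getAllServersFixed(
      Map<String, Collection<String>> historicals,
      Set<String> realtimes
  )
  {
    // Fix: concatenate both sources into a fresh set; neither input is mutated.
    return Stream.concat(
        historicals.values().stream().flatMap(Collection::stream),
        realtimes.stream()
    ).collect(Collectors.toSet());
  }
}
```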
Part of #4032.
In this patch, I added SegmentManager which is separated from ServerManager and is responsible for loading and dropping segments for a node. This SegmentManager is added to both historicals and realtimes.
#4077 introduces BroadcastRule for join processing. I'll extend BroadcastRule to apply to realtimes after it is merged.