Coordinator: Allow dropping all segments. #7447

Merged
fjy merged 1 commit into apache:master from gianm:coord-clean-down-to-zero
Apr 11, 2019
Conversation

Contributor

@gianm commented Apr 11, 2019

Removes the coordinator sanity check that prevents it from dropping all
segments. It's useful to get rid of this, since the behavior is
unintuitive for dev/testing clusters where users might regularly want
to drop all their data to get back to a clean slate.

But the sanity check was there for a reason: to prevent a race condition
where the coordinator might drop all segments if it ran before the
first metadata store poll finished. This patch addresses that concern
differently, by allowing methods in MetadataSegmentManager to return
null if a poll has not happened yet, and canceling coordinator runs
in that case.

This patch also makes the "dataSources" reference in
SQLMetadataSegmentManager volatile. I'm not sure why it wasn't volatile
before, but it seems necessary to me: it's not final, and it's dereferenced
from multiple threads without synchronization.
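
A minimal sketch of the poll-before-run guard described above, with hypothetical names rather than the actual Druid classes:

```java
import java.util.Collection;

// Minimal sketch of the "return null before the first poll" pattern described
// above. Names are illustrative, not the actual Druid classes.
class SegmentManagerSketch
{
  // Null until the first metadata store poll completes; volatile because it is
  // written by the polling thread and read by coordinator threads.
  private volatile Collection<String> dataSources = null;

  void poll(Collection<String> polled)
  {
    dataSources = polled;
  }

  // Returns null if no poll has happened yet.
  Collection<String> getDataSources()
  {
    return dataSources;
  }

  // A coordinator run cancels itself when the manager has not polled yet,
  // instead of mistaking "no data yet" for "all segments were dropped".
  boolean runCoordinatorCycle()
  {
    final Collection<String> snapshot = getDataSources();
    if (snapshot == null) {
      // Metadata store not polled yet; cancel this run.
      return false;
    }
    // ... proceed with drop/load logic using `snapshot` ...
    return true;
  }
}
```

The key point is that callers treat null as "not polled yet" and skip the run, rather than interpreting an empty collection as "no segments exist".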

Member

@clintropolis left a comment


I think this seems reasonable 👍

I poked around and it didn't seem like there would be any way to end up with an illegitimate empty list of segments once polling has started...


final Iterable<DataSegment> dataSegments = coordinator.iterateAvailableDataSegments();
if (dataSegments == null) {
  log.info("Metadata store not polled yet, canceling this run.");
Member


I think maybe using 'delay' instead of 'cancel', something like "delaying segment coordination", would read better in logs.

@fjy fjy added this to the 0.15.0 milestone Apr 11, 2019
@fjy fjy merged commit a517f8c into apache:master Apr 11, 2019
@leventov
Member

I think this PR should have had a Design Review tag. Such PRs shouldn't be merged 10 hours after opening.

);

dataSources.remove(dataSource);
Optional.ofNullable(dataSources).ifPresent(m -> m.remove(dataSource));
Member


Why do you use Optional.ofNullable(dataSources).ifPresent() instead of

if (dataSources != null) {
  ...
}

in this PR?

Contributor Author


It was because dataSources can become null after being non-null, if stop() is called. Since stop() could be called at any time, dataSources should only be dereferenced one time per method.
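
As an illustration of the single-dereference rule described here (hypothetical names, not the actual SQLMetadataSegmentManager code):

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the "dereference the field only once per method" rule described
// above. Hypothetical names, not the actual SQLMetadataSegmentManager code.
class SingleReadSketch
{
  // Can transition from non-null back to null if stop() is called.
  private volatile ConcurrentHashMap<String, Object> dataSources = new ConcurrentHashMap<>();

  void stop()
  {
    dataSources = null; // may happen at any time, from another thread
  }

  // WRONG: two reads of the field; stop() may run between the null check and
  // the remove() call, producing an NPE.
  void removeUnsafe(String name)
  {
    if (dataSources != null) {
      dataSources.remove(name);
    }
  }

  // RIGHT: one read into a local; the local cannot become null afterwards.
  boolean removeSafe(String name)
  {
    final ConcurrentHashMap<String, Object> snapshot = dataSources;
    if (snapshot != null) {
      snapshot.remove(name);
      return true;
    }
    return false;
  }
}
```

`Optional.ofNullable(dataSources).ifPresent(...)` achieves the same single read, which is why it was used in the PR instead of a bare `if (dataSources != null)`.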

Contributor Author


I'll add a comment about this - the variable looks at first glance like a lazy-initialization, but it's actually something that can transition back and forth between null and nonnull, so it needs to be handled differently.

Contributor Author


Added in #7452.

.stream()
.map(DruidDataSource::toImmutableDruidDataSource)
.collect(Collectors.toList());
return Optional.ofNullable(dataSources)
Member


Why not just

if (dataSources != null) {
  return dataSources.values().stream().map(...).collect(...);
} else {
 return null;
}

Contributor Author


Just to avoid reading the dataSources reference twice in the same method. (Same reason as https://github.com/apache/incubator-druid/pull/7447/files/39dcd326be350ca6b66e4de884708cf77413c166#r274563782)

ImmutableSet.copyOf(manager.getDataSource("wikipedia").getSegments())
);
Assert.assertEquals(
ImmutableSet.of(segment1, segment2),
Member


Improper formatting

Contributor Author


Oh yeah, I should fix that. Sorry.

Contributor Author


Added in #7452.

// also filled atomically, so if there are any segments at all, we should have all of them.)
//
// Note that if the metadata store has not been polled yet, "getAvailableSegments" would throw an error since
// "availableSegments" is null. But this won't happen, since the earlier helper "DruidCoordinatorSegmentInfoLoader"
Member


IMO it's better to identify symbols in comments in the following ways instead of putting them in double quotes:

  • Adding () to the end of method names
  • Class names start with a capital letter and use CamelCase, so they don't need any extra identification. Same for variable names with camelCase.
  • Only single-word variable names may need to be identified, but IMO it's better to use backticks (`) instead of double quotes.

private final SQLMetadataConnector connector;

private ConcurrentHashMap<String, DruidDataSource> dataSources = new ConcurrentHashMap<>();
// Volatile since this reference is reassigned in "poll" and then read from in other threads.
Member


This comment doesn't explain why the field needs to be volatile. The underlying reason is that the field is effectively a lazily initialized field, and the absence of volatile may lead to NPEs unless the rest of the code always reads the field into a local variable before using it, which is too much of a burden for developers: https://github.com/code-review-checklists/java-concurrency#safe-local-dcl

(Actually, since you translated all uses of dataSources to monadic Optional chains with a single read, it does not need to be volatile, but I would say that those monadic Optional chains are worse than a simple if-else.)

Member


Ok, because of this: #7447 (comment), the previous comment is irrelevant; there is actually no reason why the field should be volatile in the current version of the code.

Contributor Author


Are you saying it is fine to have a field that is written from one thread and read from another, with no synchronization or volatile marker, as long as each reader reads it into a local variable first? My understanding of the JMM is that in this case there's no happens-before relationship established, and all bets are off: readers have no guarantee of ever reading anything nonnull (although in practice they probably will, but that's not something you'd want to depend on).

Member


Practically, as you noted, it doesn't matter (on the x86 platform, which Druid targets). Formally, volatile is still not enough to ensure "ever reading non-null" before Java 9, where it was formalized in this document.

@gianm gianm deleted the coord-clean-down-to-zero branch April 11, 2019 18:09
gianm added a commit to gianm/druid that referenced this pull request Apr 11, 2019
@leventov
Member

leventov commented Apr 11, 2019

The design with SQLMetadataSegmentManager enforcing leadership changes on its callers doesn't feel right to me. It forces all callers to handle this situation, while it doesn't feel to me that they should. For example, REST endpoints in MetadataResource shouldn't be responsible for this, rather, the user is responsible for querying the right Coordinator (the current leader).

I think that upon losing leadership the Coordinator should just stop polling the database, but still offer the last view of the segments.

fjy pushed a commit that referenced this pull request Apr 12, 2019
@leventov
Member

@gianm this issue blocks my progress in #7306; I need to know in which direction to resolve conflicts. Please answer the last question.

@gianm
Contributor Author

gianm commented Apr 16, 2019

@leventov, sorry, what question do you mean? Is it whether or not we should stop polling the DB after losing leadership? (#7447 (comment))

I can only think of one benefit of nullifying the metadata segment cache when losing leadership: it means that next time we gain leadership, we're guaranteed that the segment cache we use is at least as new as the gain of leadership. If we might use an old one, there's a potential for a new leader to use an older view of segments than the old leader. It could be extreme: maybe the new leader, for some reason, hasn't been able to poll for hours or even days, leading to surprising behavior as the cluster 'rolls back' to an earlier state.

This could be mitigated through some code that explicitly makes sure the currently-cached segment metadata is at least as new as the leadership gain, though. I think if you decide it's best to stop nullifying the cache, it'd be good to also add this safety mechanism.
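
A sketch of that safety mechanism, under the assumption that the manager tracks the time of the last completed poll and the time leadership was gained (all names hypothetical):

```java
import java.time.Instant;

// Sketch of the safety mechanism suggested above: only trust the cached
// segment view if it was polled after this node gained leadership.
// Names and structure are hypothetical.
class LeadershipFreshnessCheck
{
  private volatile Instant lastPollTime = null;
  private volatile Instant leadershipGainedTime = null;

  void onLeaderElected(Instant now)
  {
    leadershipGainedTime = now;
  }

  void onPollCompleted(Instant now)
  {
    lastPollTime = now;
  }

  // The cache is usable only if a poll finished after we became leader;
  // otherwise the new leader might act on a stale (possibly days-old) view.
  boolean cacheIsFreshEnough()
  {
    final Instant poll = lastPollTime;
    final Instant leader = leadershipGainedTime;
    return poll != null && leader != null && !poll.isBefore(leader);
  }
}
```

With a check like this in place, keeping the cache across leadership changes becomes safe: a stale view is simply rejected until the next poll completes.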

clintropolis pushed a commit to implydata/druid-public that referenced this pull request May 30, 2019