Coordinator: Allow dropping all segments. #7447
Conversation
Removes the coordinator sanity check that prevents it from dropping all segments. It's useful to get rid of this, since the behavior is unintuitive for dev/testing clusters where users might regularly want to drop all their data to get back to a clean slate. But the sanity check was there for a reason: to prevent a race condition where the coordinator might drop all segments if it ran before the first metadata store poll finished. This patch addresses that concern differently, by allowing methods in MetadataSegmentManager to return null if a poll has not happened yet, and canceling coordinator runs in that case. This patch also makes the "dataSources" reference in SQLMetadataSegmentManager volatile. I'm not sure why it wasn't volatile before, but it seems necessary to me: it's not final, and it's dereferenced from multiple threads without synchronization.
clintropolis
left a comment
I think this seems reasonable 👍
I poked around and it didn't seem like there would be any way to end up with an illegitimate empty list of segments once polling has started...
final Iterable<DataSegment> dataSegments = coordinator.iterateAvailableDataSegments();
if (dataSegments == null) {
  log.info("Metadata store not polled yet, canceling this run.");
I think using 'delay' instead of 'cancel', maybe something like "delaying segment coordination" or something of that sort, would read better in the logs.
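The guard being discussed can be sketched as an early return at the top of the coordinator run. This is a simplified illustration, not the PR's actual code: the nested `Coordinator` interface and `runDuty` method here are invented stand-ins for the real duty structure.

```java
import java.util.Collections;
import java.util.List;

// Simplified sketch of the guard under review: if the metadata store has not
// been polled yet, the run is skipped rather than risking a mass segment drop.
// Types and method names are abbreviations, not Druid's actual API.
public class GuardSketch {
  interface Coordinator {
    List<String> iterateAvailableDataSegments(); // returns null until the first poll completes
  }

  static boolean runDuty(Coordinator coordinator) {
    final List<String> dataSegments = coordinator.iterateAvailableDataSegments();
    if (dataSegments == null) {
      // Wording as suggested in the review: "delaying" reads better than "canceling".
      System.out.println("Metadata store not polled yet, delaying coordination run.");
      return false; // skip this run; the next cycle will try again
    }
    // ... normal coordination over dataSegments would happen here ...
    return true;
  }

  public static void main(String[] args) {
    if (runDuty(() -> null)) {
      throw new AssertionError("run should be skipped before the first poll");
    }
    if (!runDuty(() -> Collections.singletonList("segment1"))) {
      throw new AssertionError("run should proceed once segments are available");
    }
    System.out.println("ok");
  }
}
```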
I think this PR should have had a …
- dataSources.remove(dataSource);
+ Optional.ofNullable(dataSources).ifPresent(m -> m.remove(dataSource));
Why do you use Optional.ofNullable(dataSources).ifPresent() instead of

if (dataSources != null) {
  ...
}

in this PR?
It's because dataSources can become null after being non-null, if stop() is called. Since stop() could be called at any time, dataSources should only be dereferenced once per method.
I'll add a comment about this. The variable looks at first glance like lazy initialization, but it's actually something that can transition back and forth between null and non-null, so it needs to be handled differently.
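The "dereference once per method" rule can be sketched as follows. This is a toy stand-in for SQLMetadataSegmentManager (the field and method names are simplified), showing why a second read of the field within one method would be a bug when stop() can nullify it concurrently:

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch, not the actual SQLMetadataSegmentManager code: the field can flip
// between null and non-null because stop() nullifies it, so each method reads
// it into a local variable exactly once before dereferencing.
public class ReadOnceSketch {
  private volatile ConcurrentHashMap<String, String> dataSources = new ConcurrentHashMap<>();

  public void stop() {
    dataSources = null; // may be called at any time, from another thread
  }

  public boolean removeDataSource(String name) {
    // Single read: a second read of the field could observe null even if this one didn't.
    final ConcurrentHashMap<String, String> snapshot = dataSources;
    if (snapshot == null) {
      return false;
    }
    return snapshot.remove(name) != null;
  }

  public static void main(String[] args) {
    ReadOnceSketch m = new ReadOnceSketch();
    m.dataSources.put("wikipedia", "ds");
    if (!m.removeDataSource("wikipedia")) {
      throw new AssertionError("expected removal to succeed");
    }
    m.stop();
    if (m.removeDataSource("wikipedia")) {
      throw new AssertionError("expected a no-op after stop()");
    }
    System.out.println("ok");
  }
}
```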
.stream()
.map(DruidDataSource::toImmutableDruidDataSource)
.collect(Collectors.toList());
return Optional.ofNullable(dataSources)
Why not just

if (dataSources != null) {
  return dataSources.values().stream().map(...).collect(...);
} else {
  return null;
}
Just to avoid reading the dataSources reference twice in the same method. (Same reason as https://github.com/apache/incubator-druid/pull/7447/files/39dcd326be350ca6b66e4de884708cf77413c166#r274563782)
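The two styles being compared can be put side by side. This is a hedged sketch with invented field and method names: the if/else form is equally correct as long as it also reads the field into a local variable first, which is the point of the exchange above.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Sketch comparing the two styles discussed above. Both are safe because each
// performs exactly one read of the nullable, concurrently reassigned field;
// the Optional chain just makes the single read syntactically explicit.
public class SingleReadStyles {
  private volatile ConcurrentHashMap<String, Integer> dataSources = new ConcurrentHashMap<>();

  // Style 1: monadic Optional chain, as in the PR.
  public List<String> namesViaOptional() {
    return Optional.ofNullable(dataSources)
                   .map(m -> m.keySet().stream().sorted().collect(Collectors.toList()))
                   .orElse(null);
  }

  // Style 2: plain if/else over a local snapshot, equally correct.
  public List<String> namesViaLocal() {
    final ConcurrentHashMap<String, Integer> snapshot = dataSources;
    if (snapshot != null) {
      return snapshot.keySet().stream().sorted().collect(Collectors.toList());
    } else {
      return null;
    }
  }

  public static void main(String[] args) {
    SingleReadStyles s = new SingleReadStyles();
    s.dataSources.put("wikipedia", 1);
    s.dataSources.put("koala", 2);
    if (!s.namesViaOptional().equals(s.namesViaLocal())) {
      throw new AssertionError("both styles should produce the same result");
    }
    s.dataSources = null; // simulate stop()
    if (s.namesViaOptional() != null || s.namesViaLocal() != null) {
      throw new AssertionError("both styles should return null after stop()");
    }
    System.out.println("ok");
  }
}
```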
ImmutableSet.copyOf(manager.getDataSource("wikipedia").getSegments())
);
Assert.assertEquals(
ImmutableSet.of(segment1, segment2),
Oh yeah, I should fix that. Sorry.
// also filled atomically, so if there are any segments at all, we should have all of them.)
//
// Note that if the metadata store has not been polled yet, "getAvailableSegments" would throw an error since
// "availableSegments" is null. But this won't happen, since the earlier helper "DruidCoordinatorSegmentInfoLoader"
IMO it's better to identify symbols in comments in the following ways instead of putting them in double quotes:
- Add () to the end of method names.
- Class names start with a capital and use CamelCase, so they don't need any extra identification. The same goes for variable names in camelCase.
- Only single-word variable names may need to be identified, but IMO it's better to use backticks (`) instead of double quotes.
private final SQLMetadataConnector connector;

private ConcurrentHashMap<String, DruidDataSource> dataSources = new ConcurrentHashMap<>();
// Volatile since this reference is reassigned in "poll" and then read from in other threads.
This comment doesn't explain why the field needs to be volatile. The underlying reason is that the field is effectively a lazily initialized field, and the absence of volatile may lead to an NPE unless the rest of the code always reads the field into a local variable before using it, which is too much of a burden for developers: https://github.com/code-review-checklists/java-concurrency#safe-local-dcl
(Actually, since you translated all the code to monadic Optional use of dataSources with a single read, it does not need to be volatile, but I would say those monadic Optional chains are worse than simple if-else.)
Ok, because of this: #7447 (comment), the previous comment is irrelevant; there is actually no reason why the field should be volatile in the current version of the code.
Are you saying it is fine to have a field that is written from one thread, and read from another, with no synchronization or volatile marker, as long as each reader reads it into a local variable first? My understanding of the JMM is that in this case there's no happens-before relationship established, and all bets are off - readers have no guarantees around ever reading anything nonnull (although in practice they probably will, but that's not something you'd want to depend on).
Practically, as you noted, it doesn't matter (on the x86 platform, which Druid targets). Formally, even volatile is not enough to ensure "ever reading non-null" before Java 9, where it was formalized in this document.
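The publication concern debated above can be sketched in a few lines. This is a toy example with invented names, not Druid code; note that the join() in main itself establishes a happens-before edge, so the demo always passes, while the volatile marker is what would make the field safe for readers that have no such edge:

```java
// Minimal sketch of the publication concern: a non-final field written by one
// thread and read by another needs volatile (or some other synchronization)
// to establish a happens-before edge; without it, a racing reader has no
// formal guarantee of ever observing the write.
public class PublicationSketch {
  private volatile String polled; // without volatile, cross-thread visibility is unspecified

  public static void main(String[] args) {
    PublicationSketch s = new PublicationSketch();
    Thread writer = new Thread(() -> s.polled = "first-poll-done");
    writer.start();
    try {
      // join() creates a happens-before edge, so this particular read is safe
      // regardless of volatile; volatile matters for readers without such an edge.
      writer.join();
    } catch (InterruptedException e) {
      throw new RuntimeException(e);
    }
    if (!"first-poll-done".equals(s.polled)) {
      throw new AssertionError("write should be visible after join()");
    }
    System.out.println("ok");
  }
}
```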
The design with … I think that upon losing leadership, the Coordinator should just stop polling the database, but still offer the last view of the segments.
@leventov, sorry, which question do you mean? Is it whether or not we should stop polling the DB after losing leadership? (#7447 (comment)) I can only think of one benefit of nullifying the metadata segment cache when losing leadership: it means that the next time we gain leadership, we're guaranteed that the segment cache we use is at least as new as the gain of leadership. If we might use an old one, there's a potential for a new leader to use an older view of segments than the old leader. It could be extreme: maybe the new leader, for some reason, hasn't been able to poll for hours or even days, leading to surprising behavior as the cluster 'rolls back' to an earlier state. This could be mitigated through code that explicitly makes sure the currently-cached segment metadata is at least as new as the leadership gain, though. I think if you decide it's best to stop nullifying the cache, it'd be good to also add this safety mechanism.
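The safety mechanism suggested above could look something like the following. This is a hypothetical sketch, all names invented for illustration: keep the cached segment view across leadership changes, but refuse to act on it until a poll has completed after the current leadership term began.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the suggested safety mechanism: the cache survives
// leadership loss, but coordination runs are delayed until the cache is at
// least as new as the moment leadership was gained.
public class FreshnessGuard {
  private final AtomicLong lastPollMillis = new AtomicLong(-1);
  private volatile long leadershipGainedMillis = -1;

  public void onLeadershipGained(long nowMillis) {
    leadershipGainedMillis = nowMillis;
  }

  public void onPollCompleted(long nowMillis) {
    lastPollMillis.set(nowMillis);
  }

  // A coordinator run would be delayed unless at least one poll has finished
  // after the current leadership term started.
  public boolean cacheUsableForCoordination() {
    final long poll = lastPollMillis.get();
    return poll >= 0 && poll >= leadershipGainedMillis;
  }

  public static void main(String[] args) {
    FreshnessGuard g = new FreshnessGuard();
    g.onPollCompleted(1000);      // stale poll, from before this leadership term
    g.onLeadershipGained(2000);
    if (g.cacheUsableForCoordination()) {
      throw new AssertionError("stale cache must not be used");
    }
    g.onPollCompleted(3000);      // fresh poll after gaining leadership
    if (!g.cacheUsableForCoordination()) {
      throw new AssertionError("fresh cache should be usable");
    }
    System.out.println("ok");
  }
}
```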