Logic adjustments to SeekableStreamIndexTaskRunner. #7267
clintropolis merged 3 commits into apache:master from
Conversation
Most of the bug fixes should only affect Kinesis, since they were in code that handled the possibility of inclusive end offsets, which the Kafka codepath doesn't use. I think the only Kafka-related issue fixed by this patch was the removal of the "continue" calls in the main read loop, which previously could have caused Kafka ingestion to get stuck.
A mix of simplifications and bug fixes. They are intermingled because some of the bugs were made difficult to fix, and also more likely to happen in the first place, by how the code was structured. I tried to keep restructuring to a minimum. The changes are:

- Remove "initialOffsetsSnapshot", which was used to determine when to skip start offsets. Replace it with "lastReadOffsets", which I hope is more intuitive. (There is a connection: start offsets must be skipped if and only if they have already been read, either by a previous task or by a previous sequence in the same task, post-restore.)
- Remove "isStartingSequenceOffsetsExclusive", because it should always be the opposite of isEndOffsetExclusive. The reason is that starts are exclusive exactly when the prior ends are inclusive: they must match up in that way for adjacent reads to link up properly.
- Don't call "seekToStartingSequence" after the initial seek. There is no reason to, since we expect to read continuous message streams throughout the task. And calling it makes offset-tracking logic trickier, so better to avoid the need for trickiness. I believe this call was causing a bug in Kinesis ingestion where a message might get double-read.
- Remove the "continue" calls in the main read loop. They are bad because they prevent keeping currOffsets and lastReadOffsets up to date, and prevent us from detecting that we have finished reading.
- Rework "verifyInitialRecordAndSkipExclusivePartition" into "verifyRecordInRange". It no longer has side effects. It does a sanity check on the message offset and also makes sure that the offset is not past the endOffsets.
- Rework "assignPartitions" to replace inline comparisons with "isRecordAlreadyRead" and "isMoreToReadBeforeReadingRecord" calls. I believe this fixes an off-by-one error with Kinesis where the last record would not get read. It also makes the logic easier to read.
- When doing the final publish, only adjust end offsets of the final sequence, rather than potentially adjusting any unpublished sequence. Adjusting sequences other than the last one is a mistake, since it will extend their endOffsets beyond what they actually read. (I'm not sure if this was an issue in practice, since I'm not sure if real-world situations would have more than one unpublished sequence.)
- Rename "isEndSequenceOffsetsExclusive" to "isEndOffsetExclusive". It's shorter and clearer, I think.
- Add equals/hashCode/toString methods to OrderedSequenceNumber.

Kafka test changes:

- Added a Kafka "testRestoreAtEndOffset" test to verify that restores at the very end of the task lifecycle still work properly.

Kinesis test changes:

- Renamed "testRunOnNothing" to "testRunOnSingletonRange". I think that, given Kinesis semantics, the right behavior when the start offset equals the end offset (and there are no exclusive partitions set) is to read that single offset, because both are meant to be treated as inclusive.
- Adjusted "testRestoreAfterPersistingSequences" to expect one more message read. I believe the old test was wrong; it expected the task not to read message number 5.
- Adjusted "testRunContextSequenceAheadOfStartingOffsets" to use a checkpoint starting from 1 rather than 2. I believe the old test was wrong here too; it expected the task to start reading from the checkpointed offset, but it actually should have started reading from one past the checkpointed offset.
- Adjusted "testIncrementalHandOffReadsThroughEndOffsets" to expect 11 messages read instead of 12. It starts at message 0 and reads up to 10, which is 11 messages.
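The start/end exclusivity invariant described above can be sketched as a tiny piece of illustrative Java. This is not the actual Druid code: the class and method names are hypothetical, and in the real runner the skip decision also depends on whether the start offset was actually read by a prior task or sequence (via lastReadOffsets).

```java
// Hypothetical sketch of the invariant: starts are exclusive exactly when
// the prior ends are inclusive, so one isEndOffsetExclusive flag can drive both.
public class OffsetRangeSketch
{
  // Kafka-style semantics: end offsets are exclusive, so starts are inclusive.
  public static final boolean KAFKA_END_EXCLUSIVE = true;
  // Kinesis-style semantics: end sequence numbers are inclusive, so the start
  // of the next adjacent read must be treated as exclusive.
  public static final boolean KINESIS_END_EXCLUSIVE = false;

  /**
   * First offset a new sequence should read, given its start offset, assuming
   * the previous reader stopped exactly at its own end offset.
   */
  public static long firstOffsetToRead(long startOffset, boolean isEndOffsetExclusive)
  {
    // If ends are inclusive, the previous reader already consumed startOffset,
    // so the start is exclusive and we skip one position.
    return isEndOffsetExclusive ? startOffset : startOffset + 1;
  }
}
```

With Kafka-style exclusive ends, a task handed start=5 reads offset 5 first; with Kinesis-style inclusive ends, an adjacent follow-on read handed start=5 begins at 6, so the two reads link up with no gap and no overlap.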
if (!restoredNextPartitions.getStream().equals(ioConfig.getStartPartitions().getStream())) {
  throw new ISE(
-     "WTF?! Restored stream[%s] but expected stream[%s]",
+     "WTF?! Restored topic[%s] but expected topic[%s]",
IIRC, the term stream was used intentionally in #6431 because the author thought it's a more generic term to represent both Kafka topic and Kinesis stream. This stream is used in other places in Druid too.
OK, I'll revert these changes, but I do think it's better w/ Kafkaesque terminology (I agree w/ #7267 (comment)). Especially because "sequence" already means something else in the context of seekable stream tasks (SequenceMetadata, sequenceName, etc) and so it is best to avoid. But this can be driven separately and doesn't need to be looped into this logic adjustment PR.
  private final Map<PartitionIdType, SequenceOffsetType> endOffsets;

+ // lastReadOffsets are the last offsets that were read and processed.
+ private final ConcurrentMap<PartitionIdType, SequenceOffsetType> lastReadOffsets = new ConcurrentHashMap<>();
Why is this a ConcurrentHashMap?
Good question. There is no reason. I changed it to a regular HashMap.
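A minimal sketch of the idea behind lastReadOffsets, as discussed above: a record is a duplicate exactly when its offset is at or below the last offset already read for that partition. The class and method names here are illustrative, not the actual SeekableStreamIndexTaskRunner API, and a plain HashMap suffices under the (stated above) assumption that only the single read thread touches it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: lastReadOffsets replaces the old initialOffsetsSnapshot
// as the way to decide whether an incoming record was already read.
public class LastReadOffsetsSketch
{
  // Only the read thread touches this, so no concurrent map is needed.
  private final Map<Integer, Long> lastReadOffsets = new HashMap<>();

  /** True if this record was already read, by this task or a predecessor. */
  public boolean isRecordAlreadyRead(int partition, long recordOffset)
  {
    final Long lastRead = lastReadOffsets.get(partition);
    return lastRead != null && recordOffset <= lastRead;
  }

  /** Record that we have read and processed up to recordOffset. */
  public void markRead(int partition, long recordOffset)
  {
    lastReadOffsets.put(partition, recordOffset);
  }
}
```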
  private final List<ListenableFuture<SegmentsAndMetadata>> publishWaitList = new ArrayList<>();
  private final List<ListenableFuture<SegmentsAndMetadata>> handOffWaitList = new ArrayList<>();
- private final Set<PartitionIdType> initialOffsetsSnapshot = new HashSet<>();
  private final Set<PartitionIdType> exclusiveStartingPartitions = new HashSet<>();
Would you please remove this? It's not used anymore.
    final SequenceOffsetType recordOffset
)
{
  // Check only for the first record among the record batch.
It looks like this isn't true anymore.
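For context, the side-effect-free "verifyRecordInRange" described in the PR summary could look roughly like the following. This is a hypothetical sketch, not the actual method: names, the exception type, and the message format are illustrative.

```java
// Hypothetical sketch: verifyRecordInRange only validates, with no side
// effects. It checks that a record's offset is not past the end offset for
// its partition, honoring inclusive vs exclusive end-offset semantics.
public class VerifyRecordSketch
{
  public static void verifyRecordInRange(
      long recordOffset,
      long endOffset,
      boolean isEndOffsetExclusive
  )
  {
    // With inclusive ends (Kinesis-style), offset == endOffset is still valid.
    final boolean pastEnd = isEndOffsetExclusive
                            ? recordOffset >= endOffset
                            : recordOffset > endOffset;
    if (pastEnd) {
      throw new IllegalStateException(
          "Record offset[" + recordOffset + "] is past end offset[" + endOffset + "]"
      );
    }
  }
}
```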
log.trace(
-     "Got stream[%s] partition[%s] sequence[%s].",
+     "Got topic[%s] partition[%s] offset[%s], shouldProcess[%s].",
Same here. stream and sequence were used intentionally.
I personally prefer offset over sequence because the former is more obviously a position to me, but am indifferent about whether stream or topic.
sequenceMetadata.setEndOffsets(currOffsets);
sequenceMetadata.updateAssignments(this, currOffsets);
final boolean isLast = i == (sequences.size() - 1);
if (isLast) {
Does it make sense to add a sanity check that the endOffsets are properly set for non-last sequences?
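The final-publish change plus the suggested sanity check could be sketched like this. It is a hypothetical illustration of the logic described above, not the actual SequenceMetadata code: only the last sequence gets its end offsets set to the current offsets, and every earlier sequence is asserted to already have its end offsets fixed.

```java
import java.util.List;

// Hypothetical sketch: at final publish time, adjust only the last sequence's
// end offset; earlier sequences must already be closed, which we sanity-check.
public class FinalPublishSketch
{
  public static class Sequence
  {
    public Long endOffset;  // null means "not yet set"

    public Sequence(Long endOffset)
    {
      this.endOffset = endOffset;
    }
  }

  public static void setFinalEndOffsets(List<Sequence> sequences, long currOffset)
  {
    for (int i = 0; i < sequences.size(); i++) {
      final boolean isLast = i == sequences.size() - 1;
      if (isLast) {
        // Only the open, final sequence is clamped to what was actually read.
        sequences.get(i).endOffset = currOffset;
      } else if (sequences.get(i).endOffset == null) {
        // Sanity check: a non-last sequence should already have an end offset;
        // extending it here would claim data it never read.
        throw new IllegalStateException("Sequence[" + i + "] has no end offset set");
      }
    }
  }
}
```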
  )
);

final ListenableFuture<TaskStatus> future2 = runTask(task2);
Would you please add a comment about why task2 reads nothing?
The actual bug here was that if a task was given a 'bad' end offset that was a Kafka transactional topic control offset instead of a record, and it was right after the last good offset read, the task would get stuck in an infinite read loop due to the continue statements that were removed in this PR. I think this test should either be removed, since this shouldn't happen in practice, or renamed to something like testDoesntGetStuckWithTransactionOffset and slightly reworked and commented to clear this up.
I think this test should probably just be removed, since it's not testing a real scenario.
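The read-loop hazard described in this thread can be illustrated with a small sketch. This is not the actual runner loop: the method and its parameters are hypothetical, and the real bug involves Kafka transaction control offsets that never show up as records. The point it demonstrates is the one made above: skipping to the next record with "continue" would leave currOffsets stale, so always fall through to the bookkeeping.

```java
// Hypothetical sketch: process a batch without "continue", so the current
// offset advances even past records we decide not to process, and the
// done-reading check (currOffset reaching endOffset) can eventually fire.
public class ReadLoopSketch
{
  public static long processBatch(long[] offsets, long endOffset, long currOffset)
  {
    for (long offset : offsets) {
      final boolean shouldProcess = offset >= currOffset && offset < endOffset;
      // Instead of "if (!shouldProcess) continue;", always reach the
      // bookkeeping below so the current offset stays up to date.
      if (shouldProcess) {
        // ... parse and index the record here ...
      }
      currOffset = offset + 1;  // advance even for records we skipped
    }
    return currOffset;
  }
}
```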
  true
);
// Set end offsets to one past the checkpoint, simulating a replica that needs to catch up.
task.getRunner().setEndOffsets(ImmutableMap.of(shardId1, "10"), true);
FYI, I fixed this test to be more realistic in #7264.
It looks like that PR has enough approvals to commit. I'll do that and merge it into this one.
clintropolis left a comment:
Overall LGTM; it looks a lot clearer, thanks for doing this refactor 👍
I think we need to nail down the terminology a bit though; there's now a bit more of a mix of offset, 'sequence number', and 'sequence offset'. Offset was never totally removed, it appears, and SeekableStreamPartitions seems to support both terminologies, presumably for backwards compatibility; SequenceMetadata is using offset terminology, probably for the same reason. I'm not quite sure what else is using what yet.
My vote is for 'offset' though.
@Override
- protected Long getSequenceNumberToStoreAfterRead(@NotNull Long sequenceNumber)
+ protected Long getNextStartOffset(@NotNull Long sequenceNumber)
:+1: on switching to 'offset', I think it's more intuitive terminology, though maybe change the parameter variable name too?
I decided to revert this for now, but plan to try again later.
I think the best would be to use different terms for Kinesis and Kafka. They define their own terminologies, and this would be especially good for logging.

I'm not sure what you mean... I think that would only be good for logging? I am mostly concerned about what we call things in the shared common structure, to make it easy to follow and to avoid switching what we call things all the time, and there I find offset to be more intuitive. I guess the implementors of

I meant, there are people who prefer

We should ensure that whatever terminology we want to use is correct now, since this is being introduced with 0.14, at least for things like JSON that escape the source code; otherwise we are going to have a bad time later. I agree that it's not worth blocking over renaming variables.

I don't think it's worth blocking a release over variable names.

I agree; I wasn't thinking about variable names. I was talking about making sure we are happy with the things that end up in JSON that will be hard to change later, once this is in the wild. I think it is maybe OK from what I've looked through so far.

If we had to pick one set of terms, I would personally lean towards Kafka-based terminology like "topic" and "offset", since I view Kafka as more "archetypal" than Kinesis. This PR doesn't change spec properties, so I think it's fine in that respect.

I reverted the offset/topic naming changes. However, I also changed
jihoonson left a comment:
LGTM. It looks much easier to read. Thanks!
* Logic adjustments to SeekableStreamIndexTaskRunner.
* Changes from code review.