Fix record validation in SeekableStreamIndexTaskRunner #7246

Merged: fjy merged 2 commits into apache:master from jihoonson:fix-seekable-stream on Mar 13, 2019

Conversation

@jihoonson
Contributor

Fix #7239.

In verifyInitialRecordAndSkipExclusivePartition(), currOffset is now used to verify the record offset. I also fixed a bug where the default compareTo method of SequenceOffsetType was used; OrderedSequenceNumber should be used instead. I also changed SequenceOffsetType to no longer extend Comparable, to prevent this problem in the future.
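
As a minimal, hypothetical illustration of that compareTo pitfall (not code from this patch): a String-typed sequence offset orders lexicographically under its default Comparable, which mis-orders numeric sequence numbers such as Kinesis's, whereas an OrderedSequenceNumber-style wrapper compares the numeric value.

```java
import java.math.BigInteger;

// Hypothetical demo (not Druid code): why relying on SequenceOffsetType's
// default compareTo is unsafe when offsets are numeric strings.
public class SequenceNumberOrderingDemo
{
  public static void main(String[] args)
  {
    final String earlier = "99";
    final String later = "100";

    // Lexicographic String#compareTo: positive, i.e. "99" sorts after "100".
    System.out.println("String compareTo:  " + earlier.compareTo(later));

    // Numeric comparison, as an OrderedSequenceNumber-style wrapper would do:
    // negative, i.e. 99 correctly precedes 100.
    System.out.println("Numeric compareTo: " + new BigInteger(earlier).compareTo(new BigInteger(later)));
  }
}
```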

@clintropolis
Member

LGTM 👍

@jihoonson
Contributor Author

Oops, I noticed that the unit test for Kinesis indexing is missing. I'll add it.

@jihoonson
Contributor Author

Added a test for the Kinesis index task.

@fjy merged commit 32e86ea into apache:master on Mar 13, 2019
```diff
   // check exclusive starting sequence
   if (isStartingSequenceOffsetsExclusive() && exclusiveStartingPartitions.contains(record.getPartitionId())) {
-    log.info("Skipping starting sequenceNumber for partition [%s] marked exclusive", record.getPartitionId());
+    log.warn("Skipping starting sequenceNumber for partition [%s] marked exclusive", record.getPartitionId());
```
Contributor

I don't think this needs to be a warning. It looks like it happens by design in Kinesis for any task after the first one that first reads a particular partition.

Contributor Author

Hmm, I don't remember why I changed this... Will revert.

Contributor

Thanks

```java
      // Check only for the first record among the record batch.
      if (initialOffsetsSnapshot.contains(record.getPartitionId())) {
        final SequenceOffsetType currOffset = currOffsets.get(record.getPartitionId());
        if (currOffset != null) {
```
Contributor

When is currOffset null? It seems to defeat the purpose of this check if we can get a record to check, and then don't check it because we don't know what the current offset is supposed to be.

Contributor Author

Hmm, maybe it's better to throw an error if it's null. Will raise a PR.
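
A minimal sketch of what that follow-up could look like (hypothetical, not the actual follow-up patch; ISE is Druid's IllegalStateException helper):

```java
final SequenceOffsetType currOffset = currOffsets.get(record.getPartitionId());
if (currOffset == null) {
  // Fail fast rather than silently skipping validation: a missing current
  // offset for a partition we are actively reading points to an internal bug.
  throw new ISE("currOffset is null for partition[%s]", record.getPartitionId());
}
```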

Contributor

Thanks

```java
    )
    {
      if (intialSequenceSnapshot.containsKey(record.getPartitionId())) {
        if (record.getSequenceNumber().compareTo(intialSequenceSnapshot.get(record.getPartitionId())) < 0) {
```
Contributor

What was the issue with using intialSequenceSnapshot in the original code? Did it have the wrong offsets for some reason (like, later offsets than we should be reading)?

Contributor Author

I think checking against intialSequenceSnapshot is wrong. Before this PR, intialSequenceSnapshot contained the start offsets of the current sequence. Comparing the offsets of the read record with intialSequenceSnapshot means it would allow rewinding as long as the rewound offsets are still larger than intialSequenceSnapshot, which I don't think should be allowed.

The bug reported in #7239 happens while checkpointing with multiple replicas. During the checkpoint, the supervisor pauses all replica tasks and finds the max offsets of the current sequence, S. It then sets those max offsets as the end offsets for all replicas. Here, if finish = false in setEndOffsets(), intialSequenceSnapshot was updated to the given end offsets, which are the start offsets of the next sequence, S'. However, each replica can still consume some more offsets of sequence S after being resumed, until it reaches the end offsets of S. This caused an exception here, because the offset of the record belongs to sequence S and so is smaller than the start offsets of S'.
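
A concrete walkthrough with made-up Kafka-style offsets (the numbers are illustrative only, not from the issue):

- Sequence S starts at offset 100; this replica has read up to currOffsets = 150 for the partition.
- Checkpoint: the supervisor pauses all replicas, finds the max offset across them (say 160), and calls setEndOffsets(160, finish = false). intialSequenceSnapshot is updated to 160, the start offset of S'.
- The replica resumes and reads the record at offset 151, which still belongs to S.
- Old check: 151 < 160 (intialSequenceSnapshot), so the task throws even though the read is valid.
- New check: 151 >= 150 (currOffsets), so the record is accepted.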

Contributor

> Here, if finish = false in setEndOffsets(), intialSequenceSnapshot was updated to the given end offsets, which are the start offsets of the next sequence, S'. However, each replica can still consume some more offsets of sequence S after being resumed, until it reaches the end offsets of S. This caused an exception here, because the offset of the record belongs to sequence S and so is smaller than the start offsets of S'.

It sounds like this part is the heart of the bug: the code didn't allow for continuing to read a few more messages of a prior sequence S before the messages for a new sequence S' started showing up. And it sounds like the fix is to compare against the currOffsets we think we should be reading right now, rather than the start of the sequence. Thanks for explaining.
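
Sketched as code under the same reading of the thread (a paraphrase, not the verbatim patch; createSequenceNumber() is the runner's factory for OrderedSequenceNumber per the PR description, and ISE is Druid's IllegalStateException helper):

```java
// Validate the first record of a batch against the offset we expect to read
// next (currOffsets), using the stream-specific OrderedSequenceNumber ordering.
final SequenceOffsetType currOffset = currOffsets.get(record.getPartitionId());
final OrderedSequenceNumber<SequenceOffsetType> recordSequenceNumber =
    createSequenceNumber(record.getSequenceNumber());
if (recordSequenceNumber.compareTo(createSequenceNumber(currOffset)) < 0) {
  throw new ISE(
      "sequenceNumber[%s] of record is smaller than current offset[%s] for partition[%s]",
      record.getSequenceNumber(),
      currOffset,
      record.getPartitionId()
  );
}
```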

gianm pushed a commit to implydata/druid-public that referenced this pull request Mar 14, 2019
* Fix record validation in SeekableStreamIndexTaskRunner

* add kinesis test

Development

Successfully merging this pull request may close these issues:

Kafka tasks fail after resuming for incremental handoff
