fix SequenceMetadata deserialization #7256
What don't you love about it?
I haven't read the patch yet (will soon), but, I don't necessarily think that this kind of change is ugly! The new structure might even be better. The SequenceMetadata in master looks at first like a simple state class, but it actually has methods that modify the runner it came from, which isn't intuitive to me. Making it a non-inner class and explicitly passing in the runner could make that clearer.
Is this related to #7252 or a separate thing you fixed opportunistically?
    if (isEndSequenceOffsetsExclusive() &&
        createSequenceNumber(record.getSequenceNumber()).compareTo(
            createSequenceNumber(endOffsets.get(record.getPartitionId()))) >= 0) {
      stillReading = false;
I guess this is the change that fixes the stuck-on-resume bug. It looks better to fix this in the assignPartitions method, though, since that is the method that checks how many offsets remain per partition. It is also called before the read loop starts.
Yes, thanks, will update 👍
    if (isEndSequenceOffsetsExclusive() &&
        createSequenceNumber(record.getSequenceNumber()).compareTo(
            createSequenceNumber(endOffsets.get(record.getPartitionId()))) >= 0) {
      stillReading = false;
stillReading should only be set to false when our assignment is empty (no partitions left to read). Hitting the end offset for one partition doesn't mean we should stop reading all partitions.
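To illustrate the behavior described here, below is a minimal sketch, not the actual runner code. It assumes exclusive end offsets and plain `long` sequence numbers, and `ReadLoopSketch`, `onRecord`, and `getAssignment` are hypothetical names made up for the example. A partition is dropped from the assignment when it reaches its end offset, and `stillReading` only flips to false once the assignment is empty.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; it does not mirror the real SeekableStreamIndexTaskRunner.
class ReadLoopSketch {
  // Partitions that still have records left to read.
  private final Set<String> assignment;
  // Assumed-exclusive end offset for each partition.
  private final Map<String, Long> endOffsets;
  private boolean stillReading = true;

  ReadLoopSketch(Map<String, Long> endOffsets) {
    this.endOffsets = new HashMap<>(endOffsets);
    this.assignment = new HashSet<>(endOffsets.keySet());
  }

  // Processes one record's position and updates the assignment.
  void onRecord(String partitionId, long sequenceNumber) {
    Long end = endOffsets.get(partitionId);
    if (end != null && sequenceNumber >= end) {
      // Only this partition is finished; the others keep being read.
      assignment.remove(partitionId);
    }
    if (assignment.isEmpty()) {
      // Nothing is left to read on any partition, so the loop can stop.
      stillReading = false;
    }
  }

  boolean isStillReading() {
    return stillReading;
  }

  Set<String> getAssignment() {
    return assignment;
  }
}
```

With two partitions in this sketch, reaching the end offset of the first one leaves `stillReading` true; it only becomes false after the second partition reaches its end offset as well.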
I tried removing this line and the tests all still passed. Did you have a test that failed without this line being here?
Oops, yes, I have a test case that replicates this. I ended up modifying the test so that it no longer hit the condition before I had determined the issue; I will add the test back. This was an attempt at an opportunistic fix after choosing a test case that happened to hit the issue.
I'm going to split this out into a separate PR.
Thanks for the review @gianm and @jihoonson, I will open a follow-up PR with the fix for restoring a task that only needs to publish and is already at its end offset.
I don't especially like the abstract method that returns a
Ah. I think the TypeReference based thing you did is pretty reasonable compared to the alternatives. TaskAction does something similar. So that makes me feel better :)
* wip
* fix tests, stop reading if we are at end offset
* fix build
* remove restore at end offsets fix in favor of a separate PR
* use typereference from method for serialization too
Fixes #7252.
I don't love this fix and am totally open to other ideas. This PR adds an abstract method to SeekableStreamIndexTaskRunner so that subclasses can create the correct TypeReference, which allows deserialization of SequenceMetadata during task restore to function. This also means SequenceMetadata has been pulled out of SeekableStreamIndexTaskRunner and given the same generic parameters, PartitionIdType and SequenceOffsetType. This was sort of ugly because SequenceMetadata was calling methods on its parent SeekableStreamIndexTaskRunner, so those methods now take a runner as an argument.

Also fixed is an issue where a resumed task that was at the end offset would not correctly end the task, resulting in what was, afaict, a task stuck in its read loop forever.
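For readers unfamiliar with the pattern, here is a rough sketch of why the abstract method helps. A generic base class cannot name the concrete type arguments itself because of erasure, so each subclass has to supply them. To keep the sketch dependency-free, `TypeToken` is a hypothetical stand-in for Jackson's `TypeReference`, and `RunnerSketch`, `KafkaLikeRunnerSketch`, `SequenceMetadataSketch`, the method name, and the `List` wrapper are all assumptions made for illustration, not the names or signatures used in this PR.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Hypothetical stand-in for Jackson's TypeReference: an anonymous subclass
// records its generic superclass, so the type arguments survive erasure.
abstract class TypeToken<T> {
  private final Type type;

  protected TypeToken() {
    ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
    this.type = superType.getActualTypeArguments()[0];
  }

  Type getType() {
    return type;
  }
}

// Simplified placeholder carrying the two generic parameters named in the description.
class SequenceMetadataSketch<PartitionIdType, SequenceOffsetType> {
}

// The generic base class only knows type variables, so it asks subclasses
// for the fully specified type to deserialize into.
abstract class RunnerSketch<PartitionIdType, SequenceOffsetType> {
  abstract TypeToken<List<SequenceMetadataSketch<PartitionIdType, SequenceOffsetType>>>
      getSequenceMetadataTypeToken();
}

// A concrete runner knows its partition id and offset types.
class KafkaLikeRunnerSketch extends RunnerSketch<Integer, Long> {
  @Override
  TypeToken<List<SequenceMetadataSketch<Integer, Long>>> getSequenceMetadataTypeToken() {
    // The anonymous subclass captures the full parameterized type.
    return new TypeToken<List<SequenceMetadataSketch<Integer, Long>>>() {};
  }
}
```

In this sketch, the token returned by `KafkaLikeRunnerSketch` carries `Integer` and `Long` as the actual type arguments, which is the information a deserializer would need to rebuild the metadata with the right partition id and offset types.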