KAFKA-13431: Expose the original pre-transform topic partition and offset in sink records #14024
This change is binary compatible. Source/API compatibility is a bit trickier here, however. The existing public constructors for SinkRecord have been modified to simply use topic, kafkaPartition, and kafkaOffset for originalTopic, originalKafkaPartition, and originalKafkaOffset respectively (so records created outside the Connect framework that were equal earlier will continue being equal and vice versa). The new constructor, which includes the original topic, partition, and offset, is now being used by the framework. This means that if there are two records with the same post-transform topic, partition, and offset but different pre-transform topic, partition, or offset, they will now be considered unequal.
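The constructor delegation described here can be sketched with a simplified stand-in class. `SketchRecord` below is hypothetical and is not the real `org.apache.kafka.connect.sink.SinkRecord`: it models only the topic, partition, and offset fields (key, value, and headers are omitted), under the assumption that equality compares all six coordinates.

```java
import java.util.Objects;

public class ConstructorDelegationSketch {

    // Hypothetical, simplified stand-in for SinkRecord; not the real Connect class.
    static final class SketchRecord {
        final String topic;
        final int partition;
        final long offset;
        final String originalTopic;
        final int originalPartition;
        final long originalOffset;

        // Models the existing public constructors: the "original" fields
        // default to the current topic, partition, and offset.
        SketchRecord(String topic, int partition, long offset) {
            this(topic, partition, offset, topic, partition, offset);
        }

        // Models the new constructor that carries the pre-transform coordinates.
        SketchRecord(String topic, int partition, long offset,
                     String originalTopic, int originalPartition, long originalOffset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
            this.originalTopic = originalTopic;
            this.originalPartition = originalPartition;
            this.originalOffset = originalOffset;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SketchRecord)) return false;
            SketchRecord that = (SketchRecord) o;
            return partition == that.partition
                    && offset == that.offset
                    && originalPartition == that.originalPartition
                    && originalOffset == that.originalOffset
                    && Objects.equals(topic, that.topic)
                    && Objects.equals(originalTopic, that.originalTopic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(topic, partition, offset,
                    originalTopic, originalPartition, originalOffset);
        }
    }

    public static void main(String[] args) {
        // Records built through the old-style constructor stay equal to each other.
        SketchRecord a = new SketchRecord("orders", 0, 42L);
        SketchRecord b = new SketchRecord("orders", 0, 42L);
        System.out.println(a.equals(b)); // true

        // Same post-transform coordinates, different pre-transform topic: unequal.
        SketchRecord fromA = new SketchRecord("merged", 0, 42L, "topic-a", 0, 42L);
        SketchRecord fromB = new SketchRecord("merged", 0, 42L, "topic-b", 0, 42L);
        System.out.println(fromA.equals(fromB)); // false
    }
}
```

The three-argument constructor is the part that keeps records created outside the framework comparing as they did before.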
Thanks for the detailed analysis!
Anticipating some possible use cases that may be impacted by this, I've come up with:
1. Equality testing in unit tests
2. Storing sink records in data structures (e.g., elements in a HashSet or keys in a HashMap) which contain elements that are unique according to their equals method
With 1, this change is unlikely to be controversial.
With 2, things are a little trickier, but ultimately I think it's best to proceed with this change. There's enough granularity in the existing equals method that I can't anticipate any realistic cases where that method would previously return true but would now return false. For example, if a connector is tracking records that were re-delivered to it via SinkTask::put after throwing an exception in SinkTask::preCommit, this change won't affect that case since the re-delivered records would have the same original TPO.
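The re-delivery case can be illustrated with a small sketch. `TrackedRecord` is a hypothetical stand-in for a sink record (not the Connect class), and `countDistinct` is a made-up helper that models a connector keeping the records it has seen in a `HashSet`. Because a re-delivered record carries the same original topic, partition, and offset as the first delivery, the set still treats it as a duplicate.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class RedeliveryTrackingSketch {

    // Hypothetical stand-in for a sink record; equality includes the original T/P/O.
    static final class TrackedRecord {
        final String topic;
        final int partition;
        final long offset;
        final String originalTopic;
        final int originalPartition;
        final long originalOffset;

        TrackedRecord(String topic, int partition, long offset,
                      String originalTopic, int originalPartition, long originalOffset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
            this.originalTopic = originalTopic;
            this.originalPartition = originalPartition;
            this.originalOffset = originalOffset;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof TrackedRecord)) return false;
            TrackedRecord that = (TrackedRecord) o;
            return partition == that.partition
                    && offset == that.offset
                    && originalPartition == that.originalPartition
                    && originalOffset == that.originalOffset
                    && Objects.equals(topic, that.topic)
                    && Objects.equals(originalTopic, that.originalTopic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(topic, partition, offset,
                    originalTopic, originalPartition, originalOffset);
        }
    }

    // Hypothetical helper: how many distinct records a HashSet-based tracker would hold.
    static int countDistinct(List<TrackedRecord> delivered) {
        Set<TrackedRecord> seen = new HashSet<>(delivered);
        return seen.size();
    }

    public static void main(String[] args) {
        TrackedRecord first = new TrackedRecord("routed", 0, 7L, "source", 0, 7L);
        // The same record delivered again: identical original T/P/O.
        TrackedRecord redelivered = new TrackedRecord("routed", 0, 7L, "source", 0, 7L);
        System.out.println(countDistinct(Arrays.asList(first, redelivered))); // 1
    }
}
```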
Also, FWIW, there's precedent here from when we added headers to Connect in KIP-145; there was no discussion on the PR regarding the impact that changes to these methods may have on compatibility.
The connector can receive two SinkRecords which may pass equality if they have the same T/P/O, and the same contents, headers, etc. Right now this can happen when an SMT causes a T/P/O collision, or when a record is re-delivered to the connector. After this change, only a re-delivery can cause this equality check to fire.
I don't think it would be reasonable for someone to rely on this to detect T/P/O collisions, because it is inherently unreliable due to checking equality of the contents of the record. I think the semantics of only being equal after a re-delivery are much better.
We don't seem to rely on this equals method in the runtime, so this should only affect connector implementations.
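A rough sketch of the collision scenario, under stated assumptions: `Rec` is a hypothetical stand-in for a sink record, `route` models a topic-renaming SMT, and `equalBeforeChange` / `equalAfterChange` are illustrative helpers that approximate the old and new equality checks. None of these names come from the Connect API.

```java
import java.util.Objects;

public class CollisionSketch {

    // Hypothetical stand-in for a sink record with a value and both sets of coordinates.
    static final class Rec {
        final String topic;
        final int partition;
        final long offset;
        final String value;
        final String originalTopic;
        final int originalPartition;
        final long originalOffset;

        Rec(String topic, int partition, long offset, String value,
            String originalTopic, int originalPartition, long originalOffset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
            this.value = value;
            this.originalTopic = originalTopic;
            this.originalPartition = originalPartition;
            this.originalOffset = originalOffset;
        }
    }

    // Models a topic-renaming transform: the topic changes, the originals are preserved.
    static Rec route(Rec r, String newTopic) {
        return new Rec(newTopic, r.partition, r.offset, r.value,
                r.originalTopic, r.originalPartition, r.originalOffset);
    }

    // Approximates the old check: post-transform T/P/O plus contents only.
    static boolean equalBeforeChange(Rec a, Rec b) {
        return a.partition == b.partition
                && a.offset == b.offset
                && Objects.equals(a.topic, b.topic)
                && Objects.equals(a.value, b.value);
    }

    // Approximates the new check: the old check plus the pre-transform T/P/O.
    static boolean equalAfterChange(Rec a, Rec b) {
        return equalBeforeChange(a, b)
                && a.originalPartition == b.originalPartition
                && a.originalOffset == b.originalOffset
                && Objects.equals(a.originalTopic, b.originalTopic);
    }

    public static void main(String[] args) {
        // Two records from different topics that happen to share partition, offset, and value.
        Rec a = route(new Rec("topic-a", 0, 5L, "payload", "topic-a", 0, 5L), "merged");
        Rec b = route(new Rec("topic-b", 0, 5L, "payload", "topic-b", 0, 5L), "merged");

        System.out.println(equalBeforeChange(a, b)); // true: collision looks like a duplicate
        System.out.println(equalAfterChange(a, b));  // false: originals tell them apart

        // Changing only the value hides the collision from the old check as well,
        // which is why it was never a reliable collision detector.
        Rec c = route(new Rec("topic-b", 0, 5L, "other", "topic-b", 0, 5L), "merged");
        System.out.println(equalBeforeChange(a, c)); // false
    }
}
```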
Similar to the above change, this is binary compatible too, although this one could potentially break some oddball downstream use cases, like tests asserting exact hash code values for sink records. While this isn't strictly source/API compatible, it should be fairly unlikely that such use cases exist in the wild. If this is still deemed to be a concern (or more cases crop up), this change can be removed without affecting the rest of this PR / KIP.
gharris1727
left a comment
Thanks Yash!
I had some minor questions.
C0urante
left a comment
Thanks Yash! Lot of nits but overall this is looking great.
yashmayya
left a comment
Thanks for the review folks!
@gharris1727 did you want to give this another pass before we merge? No worries if you don't have the bandwidth.
gharris1727
left a comment
LGTM! Thanks @yashmayya for fixing this gap in the API!
Jenkins is failing because it's merging a commit on trunk that doesn't include #14037 before testing, which causes the I've pushed an empty commit to retrigger CI; hopefully that'll do the trick and pick up the latest changes from trunk.
@yashmayya Mind patching the merge conflicts introduced by #14044? Thanks!
…fset in sink records
Co-authored-by: Chris Egerton <fearthecellos@gmail.com>
…oc to new SinkRecord constructor
…l constructor; add unit test
5d8026c to 4ef3a9e
Thanks Chris, I've rebased this on the latest
Jenkins build is pretty unstable but this appears due to unrelated issues. We have a successful build (minus a few flaky tests) on two out of four nodes in the latest run. Merging...
…ion and offset in sink records (apache#14024) Reviewers: Greg Harris <greg.harris@aiven.io>, Chris Egerton <chrise@aiven.io>