KAFKA-8816: Make offsets immutable to users of RecordCollector.offsets #7223
guozhangwang merged 4 commits into apache:trunk
This should have the "streams" label, but I don't seem to have enough permissions to set it. |
mjsax left a comment
While I agree in general that RecordCollectorImpl#offsets is conceptually a private field, I am not sure I understand why there is a bug. RecordCollectorImpl never reads offsets but only blindly updates the map on write in the send() callback. Hence, this PR does not seem to change the end-to-end behavior? Can you elaborate?
Would it be simpler to hand out a deep copy of the map directly?
That would be totally fine for the current usage pattern, where we have a single query that ends up copying the map anyway. However, if we ever ended up with other queries then those queries would pay the cost of the copy whether they need it or not. It would be a bit surprising that a copy is happening without looking at the implementation. My bias would be towards defensive coding without surprise performance impact.
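The trade-off between the two approaches can be sketched as follows. This is a minimal illustration, not the PR's code: the class `OffsetsViewDemo`, its methods, and the `String` partition keys (standing in for `TopicPartition`) are hypothetical. `Collections.unmodifiableMap` wraps the existing map without copying it, so callers get a live read-only view, while `new HashMap<>(map)` pays for a copy on every call and yields a snapshot.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a collector that tracks acked offsets per partition.
class OffsetsViewDemo {
    private final Map<String, Long> offsets = new HashMap<>();

    // Simulates the producer callback recording an acked offset.
    void ack(String partition, long offset) {
        offsets.put(partition, offset);
    }

    // Read-only view: no copy is made, and later acks are visible through it.
    Map<String, Long> offsetsView() {
        return Collections.unmodifiableMap(offsets);
    }

    // Snapshot: every entry is copied on each call.
    Map<String, Long> offsetsCopy() {
        return new HashMap<>(offsets);
    }

    public static void main(String[] args) {
        OffsetsViewDemo collector = new OffsetsViewDemo();
        collector.ack("topic-0", 1L);
        Map<String, Long> view = collector.offsetsView();
        Map<String, Long> copy = collector.offsetsCopy();
        collector.ack("topic-0", 2L);
        System.out.println("view: " + view.get("topic-0")); // reflects the later ack
        System.out.println("copy: " + copy.get("topic-0")); // value at copy time
    }
}
```

With the view, a caller that needs a mutable map makes its own copy and pays that cost only when it actually needs one.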
Yep, copy-paste error. This can be removed.
nit: use assertThrows instead of try-fail-catch construct.
Thanks! It's been a while since I've coded Java and I forgot about assertThrows.
Will do for both instances.
cpettitt-confluent left a comment
Will follow up with a patch to address your comments in test. If you feel strongly about copying the map in offsets let me know and I will make that change; otherwise I'll leave that as is.
|
Re. why is this a bug, based on my understanding of the code: Per the doc on RecordCollector the contract is that the value returned from offsets is the latest acks from the producer. Prior to this change it was possible to get the offsets map and change it directly, with no involvement of a producer. That seems to be a violation of the documentation and my understanding of the intent of the class. Anybody can get the offsets map, do some manipulation to it for its own purpose and accidentally change the internal state of RecordCollector in a non-obvious way. In fact, that is what is happening from StreamTask during commit prior to this patch. |
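The hazard described here can be illustrated with a small sketch. The nested classes `LeakyCollector` and `GuardedCollector` are hypothetical and use `String` keys in place of `TopicPartition`; they are not the actual `RecordCollectorImpl`, only a picture of a getter that hands out internal state versus one that hands out a read-only view.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class OffsetsLeakDemo {
    // Hypothetical collector that returns its internal map directly.
    static class LeakyCollector {
        private final Map<String, Long> offsets = new HashMap<>();
        void ack(String partition, long offset) { offsets.put(partition, offset); }
        Map<String, Long> offsets() { return offsets; }
    }

    // Hypothetical collector that returns a read-only view instead.
    static class GuardedCollector {
        private final Map<String, Long> offsets = new HashMap<>();
        void ack(String partition, long offset) { offsets.put(partition, offset); }
        Map<String, Long> offsets() { return Collections.unmodifiableMap(offsets); }
    }

    public static void main(String[] args) {
        LeakyCollector leaky = new LeakyCollector();
        leaky.ack("topic-0", 2L);
        // A caller changes the collector's state with no producer ack involved.
        leaky.offsets().put("topic-0", 50L);
        System.out.println("leaky: " + leaky.offsets().get("topic-0"));

        GuardedCollector guarded = new GuardedCollector();
        guarded.ack("topic-0", 2L);
        try {
            guarded.offsets().put("topic-0", 50L);
        } catch (UnsupportedOperationException e) {
            // The write is rejected, so the internal map still holds the acked value.
            System.out.println("guarded: " + guarded.offsets().get("topic-0"));
        }
    }
}
```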
|
BTW, thanks for the review @mjsax and for reminding me about assertThrows. I'll get a new patch up tomorrow. |
|
New commit is up and ready for review. I couldn't tell if you prefer force push or squash at merge and I see both in other PRs. I opted for squash at merge, but if you prefer force push let me know and I will make it so :). |
|
W.r.t. observable behavior change: I reran with and without this patch, and here is a difference in how checkpointing works: Good: Bad: In the bad case the consumed offset is not checkpointed because we added the first value we saw to the record collector and never updated it (putIfAbsent). I have no idea if that manifests in bad behavior, but it doesn't look right and doesn't match the behavior in the non-optimized graph, where the checkpoints increase for the changelog. |
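One way to picture the stale checkpoint is the sketch below. It assumes, following the description above, that the commit path merges consumed offsets into the map obtained from the collector using `putIfAbsent`. The class `CheckpointDemo`, its method names, and the `String` partition keys are hypothetical and do not reproduce the actual `StreamTask` code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class CheckpointDemo {
    // Stand-in for the collector's internal offsets; nothing is acked in this demo.
    private final Map<String, Long> collectorOffsets = new HashMap<>();

    // Pre-patch shape: consumed offsets are merged into the collector's own map.
    Map<String, Long> checkpointSharingMap(Map<String, Long> consumed) {
        Map<String, Long> checkpoint = collectorOffsets; // same object, not a copy
        for (Map.Entry<String, Long> e : consumed.entrySet()) {
            checkpoint.putIfAbsent(e.getKey(), e.getValue());
        }
        return checkpoint;
    }

    // Post-patch shape: consumed offsets are merged into a private copy.
    Map<String, Long> checkpointWithCopy(Map<String, Long> consumed) {
        Map<String, Long> checkpoint = new HashMap<>(collectorOffsets);
        for (Map.Entry<String, Long> e : consumed.entrySet()) {
            checkpoint.putIfAbsent(e.getKey(), e.getValue());
        }
        return checkpoint;
    }

    public static void main(String[] args) {
        CheckpointDemo shared = new CheckpointDemo();
        shared.checkpointSharingMap(Collections.singletonMap("changelog-0", 10L));
        // The first consumed offset now lives in the collector's map, so the
        // second commit's putIfAbsent is a no-op and the checkpoint stays at 10.
        System.out.println(shared.checkpointSharingMap(Collections.singletonMap("changelog-0", 20L)));

        CheckpointDemo copied = new CheckpointDemo();
        copied.checkpointWithCopy(Collections.singletonMap("changelog-0", 10L));
        // The collector's map was never touched, so the checkpoint advances to 20.
        System.out.println(copied.checkpointWithCopy(Collections.singletonMap("changelog-0", 20L)));
    }
}
```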
|
Thanks for the details.
It does not sound like a correctness issue, but a performance issue. If the checkpoint does not advance, on restore KS would re-read/re-play more data than necessary from the changelog. |
mjsax left a comment
LGTM.
Call for second review @guozhangwang @bbejeck @vvcephei @ableegoldman @cadonna @abbccdda
|
Might be worth back-porting this fix to older versions... Maybe back to |
|
Java 11 / 2.12 and 2.13 passed. Retest this please |
Make offsets immutable to users of RecordCollector.offsets. Fix up an existing case where offsets could be modified in this way. Add a simple test to verify offsets cannot be changed externally.
|
Seems like flaky tests. The first test passed this time and we picked up two connect test failures which were not there previously. retest this please |
cadonna left a comment
@cpettitt-confluent, Thank you for the PR.
Here is my feedback.
nit: This last check is not needed, since it verifies functionality of the Map returned by Collections.unmodifiableMap() and not of the code under test.
|
FYI: Opened PR #7253 for a minor refactoring to |
|
Retest this, please |
|
Thanks for the feedback @cadonna! Good call on checking the map a second time. I will requery the offset directly from the collector, which more correctly completes verification. Patch coming shortly. |
|
Ugh. Somehow my origin moved so this is pulling in a lot of irrelevant stuff. Let me see if I can clean it up. |
Force-pushed from 3971396 to b44b9af
|
All better. Retest this please. |
    assertThat(offsets.get(topicPartition), equalTo(2L));
    assertThrows(UnsupportedOperationException.class, () -> offsets.put(new TopicPartition(topic, 0), 50L));
I would argue for keeping this because the change impacts the external behavior of the class. We're making a strong statement here: you will get an exception if you try to modify the contents of the returned map, so you must copy the map if you want to make changes. This also distinguishes it from the alternative approach we could have used, in which we proactively copy the map for the user and the user could change the returned map without impacting the underlying one. Given that this is externally facing and there is doc a couple of levels up, I will fix that up.
Happy to discuss further if you strongly disagree.
    assertThat(offsets.get(topicPartition), equalTo(2L));
    assertThrows(UnsupportedOperationException.class, () -> offsets.put(new TopicPartition(topic, 0), 50L));

    // Verify that collector offsets were not updated
nit: I would remove this comment because the code it comments is clear enough.
Hah, this was probably for me more than anyone to highlight the subtle difference between this and the previous query. I'm fine pulling it out and since I have a doc change coming, I will make it so :).
|
@guozhangwang Should we cherry-pick this to older branches? I would at least cherry-pick to |
#7223) Make offsets immutable to users of RecordCollector.offsets. Fix up an existing case where offsets could be modified in this way. Add a simple test to verify offsets cannot be changed externally. Reviewers: Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
|
Cherry-picked all the way to |
Make offsets immutable to users of RecordCollector.offsets. Fix up an
existing case where offsets could be modified in this way. Add a simple
test to verify offsets cannot be changed externally.