Conversation
85ed1f3 to
5c94c1e
Compare
There was a problem hiding this comment.
Add this as dummy because I reference it in JavaDocs of AbstractSegments -- will be implemented in follow up PR.
There was a problem hiding this comment.
Not a big deal, but I'm not a huge fan of dead code (here or below). IMHO, it would have been better to avoid adding the reference in the javadoc and thereby avoid the obligation to add both of these un-implemented, un-tested, and un-used classes.
No change requested, just sharing my thoughts ;) I know you're queuing up the implementations in another PR right away.
There was a problem hiding this comment.
While I agree, I did it this way, because otherwise, updating the Javadoc would have most likely slipped...
There was a problem hiding this comment.
Introduce generic Segment type so we can use it for both segment types
There was a problem hiding this comment.
This must be implement in derived classed now
There was a problem hiding this comment.
This code is copied from old class Segment
There was a problem hiding this comment.
This replaces old class Segments
There was a problem hiding this comment.
Now an interface -- actual code moved to KeyValueSegment
There was a problem hiding this comment.
Add this as dummy because I reference it in JavaDocs of AbstractSegments -- will be implemented in follow up PR.
|
Call for review @guozhangwang @bbejeck @vvcephei @ableegoldman |
There was a problem hiding this comment.
nit: I'm wondering if this would be easier to read
if(segments.put(segmentId, newSegment)!= null){
throw new IllegalStateException("KeyValueSegment already exists. Possible concurrent access.");
}However, this is highly opinated
5c94c1e to
e336c2d
Compare
|
Updated per comment |
There was a problem hiding this comment.
Thanks to the flat package structure, this change actually exposes these fields to 66 other classes, all of our other internal state-related code. This is actually pretty risky, since segments in particular is mutable.
Your intent is to broaden the scope to only the subclasses, but Java has no visibility modifier to do that. (protected seems to be the one, but its visibility is actually broader than package-protected)
To enforce the desired visibility restriction, we would have to create a new package, org.apache.kafka.streams.state.internals.segments, and locate the abstract class and both implementations in it. Then the fields would be visible only to the implementing classes, as desired.
Personally, I think the mutable segments field justifies this, as future internal state-store code changes might accidentally think it's ok to touch segments, directly, leading to subtle bugs.
There was a problem hiding this comment.
I don't see this danger. Note, that all changes must be approved by a committer and thus, I am sure this would be caught (it's not a project for which somebody will limited knowledge can easily push changes). If you insist on a different package structure I am also fine with it, even if I don't see a bid advantage introducing many small packages.
\cc @guozhangwang
There was a problem hiding this comment.
In general, there may not be a big advantage to many small packages, but in this particular case, it happens to be the only way to safely protect what is intended to be private state.
Personally, I wouldn't lean too heavily on the eternal vigilence of reviewers to catch subtle bugs when there's an easy way to prevent them structurally. I certainly wouldn't trust myself to just remember that this field, although it's made accessible to the whole package, is actually supposed to be used only by subclasses. In two years' time, I think I would just assume it's supposed to be accessible, since that's what the code says.
Also curious what @guozhangwang and @bbejeck think.
There was a problem hiding this comment.
Ah, I see you already added the interface. This is fine, too!
There was a problem hiding this comment.
The current hierarchy looks good to me.
There was a problem hiding this comment.
Not a big deal, but I'm not a huge fan of dead code (here or below). IMHO, it would have been better to avoid adding the reference in the javadoc and thereby avoid the obligation to add both of these un-implemented, un-tested, and un-used classes.
No change requested, just sharing my thoughts ;) I know you're queuing up the implementations in another PR right away.
There was a problem hiding this comment.
It's unusual to hold a reference to an abstract class like this. I believe the intent is to be able to transparently handle either KeyValueSegments or (I'm guessing) KeyValueTimestampSegments.
The full expression would be to have a Segments interface implemented by AbstractSegments, which is then extended by your two implementations. Then this line would reference Segments<S>. It's fine to collapse this into just the abstract class (although questionable in the presence of package-protected fields). But to maintain transparency, I'd name the abstract class Segments instead of AbstractSegments. That way, to an outsider class (like this one), you're still just interacting with the interface (i.e., the public interface of the class), rather than specifically an abstract class.
Adhering to this pattern leaves the door open in the future to extract Segments into a full interface without having to change any outside code (which is what I meant by maintain transparency).
There was a problem hiding this comment.
It's unusual to hold a reference to an abstract class like this. I believe the intent is to be able to transparently handle either
KeyValueSegmentsor (I'm guessing)KeyValueTimestampSegments.
Correct.
I don't see a big improvement adding an interface, but if it makes you happy :)
There was a problem hiding this comment.
I agree, and it might by splitting hairs, but I actually was only suggesting to rename AbstractSegments to Segments. Although some projects require all references to be interfaces, it generally leads to low-value bloat. I think it's fine to reference classes, since all classes already present a public "interface".
I was suggesting to rename the class so that the reference here doesn't declare that its target is an abstract class, which is irrelevant. Renaming it Segments frees us up to extract Segments as an interface in the future if we wish.
There was a problem hiding this comment.
It's an antipattern for a superclass to specify what subclasses it, specifically because it's likely that the next time someone adds a new subclass, it's unlikely they'll remember to update this comment, so then this comment will become misleading.
Plus, the HTML doc generator will automatically create a list of known subclasses, and any IDE also makes it trivially easy to discover what implements it.
Bonus: removing these references frees you from the obligation to add the dead-code classes that they point to.
IMHO, all the javadoc really needs to say is that this class manages {@link Segment}s.
There was a problem hiding this comment.
Ack. Will remove the two dead code classes and remove this comment.
e336c2d to
822209a
Compare
|
Updated this. |
guozhangwang
left a comment
There was a problem hiding this comment.
Just a minor question about the template on key schema, feel free to merge after replied.
There was a problem hiding this comment.
Is Session/WindowKeySchema always templated on KeyValueSegment (seems indicated in the javadoc below at line 157)?
There was a problem hiding this comment.
Yes. Atm, KeyValueSegment is the only class implementing Segment.
There was a problem hiding this comment.
As discussed in person: we can simplify this and remove the generic type from the interface and only use a generic type on method segmentsToSearch.
There was a problem hiding this comment.
The current hierarchy looks good to me.
822209a to
456e0a0
Compare
|
Rebased this to resolve merge conflicts. Merging this after Jenkins passed. |
|
Updated this to simplify generics. |
guozhangwang
left a comment
There was a problem hiding this comment.
LGTM! Please feel free to merge after tests passed.
|
Unrelated tests failed. Retest this please. |
* ak/trunk: MINOR: Update usage of deprecated API (apache#6146) KAFKA-4217: Add KStream.flatTransform (apache#5273) MINOR: Update Gradle to 5.1.1 (apache#6160) KAFKA-3522: Generalize Segments (apache#6170) Added quotes around the class path (apache#4469) KAFKA-7837: Ensure offline partitions are picked up as soon as possible when shrinking ISR (apache#6202) MINOR: In the MetadataResponse schema, ignorable should be a boolean KAFKA-7838: Log leader and follower end offsets when shrinking ISR (apache#6168) KAFKA-5692: Change PreferredReplicaLeaderElectionCommand to use Admin… (apache#3848) MINOR: clarify why suppress can sometimes drop tombstones (apache#6195) MINOR: Upgrade ducktape to 0.7.5 (apache#6197) MINOR: Improve IntegrationTestUtils documentation (apache#5664) MINOR: upgrade to jdk8 8u202 KAFKA-7693; Fix SequenceNumber overflow in producer (apache#5989) KAFKA-7692; Fix ProducerStateManager SequenceNumber overflow (apache#5990) MINOR: update copyright year in the NOTICE file. (apache#6196) KAFKA-7793: Improve the Trogdor command line. (apache#6133)
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Part of KIP-258: In order to support old and new segments (ie, without timestamps and with timestamps), the code is restructured:
Segmentinto interfaceSegmentandKeyValueSegmentclassSegmentsintoAbstractSegmentsandKeyValueSegmentsThis is no functional change. This is also completely internal and we can merge before the KIP is accepted.