Skip to content

KAFKA-3522: Generalize Segments#6170

Merged
mjsax merged 2 commits intoapache:trunkfrom
mjsax:kafka-3522-rocksdb-format-rework-segments
Jan 26, 2019
Merged

KAFKA-3522: Generalize Segments#6170
mjsax merged 2 commits intoapache:trunkfrom
mjsax:kafka-3522-rocksdb-format-rework-segments

Conversation

@mjsax
Copy link
Copy Markdown
Member

@mjsax mjsax commented Jan 18, 2019

Part of KIP-258: In order to support old and new segments (ie, without timestamps and with timestamps), the code is restructured:

  • splitting Segment into interface Segment and KeyValueSegment class
  • splitting Segments into AbstractSegments and KeyValueSegments

This is no functional change. This is also completely internal and we can merge before the KIP is accepted.

@mjsax mjsax added the streams label Jan 18, 2019
@mjsax mjsax force-pushed the kafka-3522-rocksdb-format-rework-segments branch from 85ed1f3 to 5c94c1e Compare January 18, 2019 21:34
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this as dummy because I reference it in JavaDocs of AbstractSegments -- will be implemented in follow up PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but I'm not a huge fan of dead code (here or below). IMHO, it would have been better to avoid adding the reference in the javadoc and thereby avoid the obligation to add both of these un-implemented, un-tested, and un-used classes.

No change requested, just sharing my thoughts ;) I know you're queuing up the implementations in another PR right away.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree, I did it this way, because otherwise, updating the Javadoc would have most likely slipped...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce generic Segment type so we can use it for both segment types

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be implement in derived classed now

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is copied from old class Segment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replaces old class Segments

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now an interface -- actual code moved to KeyValueSegment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add this as dummy because I reference it in JavaDocs of AbstractSegments -- will be implemented in follow up PR.

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Jan 18, 2019

Call for review @guozhangwang @bbejeck @vvcephei @ableegoldman

Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @mjsax, this LGTM just one minor comment.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm wondering if this would be easier to read

if(segments.put(segmentId, newSegment)!= null){
  throw new IllegalStateException("KeyValueSegment already exists. Possible concurrent access.");
}

However, this is highly opinated

@mjsax mjsax force-pushed the kafka-3522-rocksdb-format-rework-segments branch from 5c94c1e to e336c2d Compare January 21, 2019 04:00
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Jan 21, 2019

Updated per comment

Copy link
Copy Markdown
Contributor

@vvcephei vvcephei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, @mjsax . I left a few comments. I'm really just concerned about the access to the previously-private fields and the name of AbstractSegments.

Thanks!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to the flat package structure, this change actually exposes these fields to 66 other classes, all of our other internal state-related code. This is actually pretty risky, since segments in particular is mutable.

Your intent is to broaden the scope to only the subclasses, but Java has no visibility modifier to do that. (protected seems to be the one, but its visibility is actually broader than package-protected)

To enforce the desired visibility restriction, we would have to create a new package, org.apache.kafka.streams.state.internals.segments, and locate the abstract class and both implementations in it. Then the fields would be visible only to the implementing classes, as desired.

Personally, I think the mutable segments field justifies this, as future internal state-store code changes might accidentally think it's ok to touch segments, directly, leading to subtle bugs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this danger. Note, that all changes must be approved by a committer and thus, I am sure this would be caught (it's not a project for which somebody will limited knowledge can easily push changes). If you insist on a different package structure I am also fine with it, even if I don't see a bid advantage introducing many small packages.

\cc @guozhangwang

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, there may not be a big advantage to many small packages, but in this particular case, it happens to be the only way to safely protect what is intended to be private state.

Personally, I wouldn't lean too heavily on the eternal vigilence of reviewers to catch subtle bugs when there's an easy way to prevent them structurally. I certainly wouldn't trust myself to just remember that this field, although it's made accessible to the whole package, is actually supposed to be used only by subclasses. In two years' time, I think I would just assume it's supposed to be accessible, since that's what the code says.

Also curious what @guozhangwang and @bbejeck think.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see you already added the interface. This is fine, too!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current hierarchy looks good to me.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but I'm not a huge fan of dead code (here or below). IMHO, it would have been better to avoid adding the reference in the javadoc and thereby avoid the obligation to add both of these un-implemented, un-tested, and un-used classes.

No change requested, just sharing my thoughts ;) I know you're queuing up the implementations in another PR right away.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unusual to hold a reference to an abstract class like this. I believe the intent is to be able to transparently handle either KeyValueSegments or (I'm guessing) KeyValueTimestampSegments.

The full expression would be to have a Segments interface implemented by AbstractSegments, which is then extended by your two implementations. Then this line would reference Segments<S>. It's fine to collapse this into just the abstract class (although questionable in the presence of package-protected fields). But to maintain transparency, I'd name the abstract class Segments instead of AbstractSegments. That way, to an outsider class (like this one), you're still just interacting with the interface (i.e., the public interface of the class), rather than specifically an abstract class.

Adhering to this pattern leaves the door open in the future to extract Segments into a full interface without having to change any outside code (which is what I meant by maintain transparency).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unusual to hold a reference to an abstract class like this. I believe the intent is to be able to transparently handle either KeyValueSegments or (I'm guessing) KeyValueTimestampSegments.

Correct.

I don't see a big improvement adding an interface, but if it makes you happy :)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and it might by splitting hairs, but I actually was only suggesting to rename AbstractSegments to Segments. Although some projects require all references to be interfaces, it generally leads to low-value bloat. I think it's fine to reference classes, since all classes already present a public "interface".

I was suggesting to rename the class so that the reference here doesn't declare that its target is an abstract class, which is irrelevant. Renaming it Segments frees us up to extract Segments as an interface in the future if we wish.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an antipattern for a superclass to specify what subclasses it, specifically because it's likely that the next time someone adds a new subclass, it's unlikely they'll remember to update this comment, so then this comment will become misleading.

Plus, the HTML doc generator will automatically create a list of known subclasses, and any IDE also makes it trivially easy to discover what implements it.

Bonus: removing these references frees you from the obligation to add the dead-code classes that they point to.

IMHO, all the javadoc really needs to say is that this class manages {@link Segment}s.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Will remove the two dead code classes and remove this comment.

@mjsax mjsax force-pushed the kafka-3522-rocksdb-format-rework-segments branch from e336c2d to 822209a Compare January 22, 2019 00:55
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Jan 22, 2019

Updated this.

Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor question about the template on key schema, feel free to merge after replied.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Session/WindowKeySchema always templated on KeyValueSegment (seems indicated in the javadoc below at line 157)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Atm, KeyValueSegment is the only class implementing Segment.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in person: we can simplify this and remove the generic type from the interface and only use a generic type on method segmentsToSearch.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current hierarchy looks good to me.

@mjsax mjsax force-pushed the kafka-3522-rocksdb-format-rework-segments branch from 822209a to 456e0a0 Compare January 25, 2019 07:39
@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Jan 25, 2019

Rebased this to resolve merge conflicts. Merging this after Jenkins passed.

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Jan 25, 2019

Updated this to simplify generics.

Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Please feel free to merge after tests passed.

@mjsax
Copy link
Copy Markdown
Member Author

mjsax commented Jan 25, 2019

Unrelated tests failed. Retest this please.

@mjsax mjsax merged commit ef62dd3 into apache:trunk Jan 26, 2019
@mjsax mjsax deleted the kafka-3522-rocksdb-format-rework-segments branch January 26, 2019 07:56
jarekr pushed a commit to confluentinc/kafka that referenced this pull request Apr 18, 2019
* ak/trunk:
  MINOR: Update usage of deprecated API (apache#6146)
  KAFKA-4217: Add KStream.flatTransform (apache#5273)
  MINOR: Update Gradle to 5.1.1 (apache#6160)
  KAFKA-3522: Generalize Segments (apache#6170)
  Added quotes around the class path (apache#4469)
  KAFKA-7837: Ensure offline partitions are picked up as soon as possible when shrinking ISR (apache#6202)
  MINOR: In the MetadataResponse schema, ignorable should be a boolean
  KAFKA-7838: Log leader and follower end offsets when shrinking ISR (apache#6168)
  KAFKA-5692: Change PreferredReplicaLeaderElectionCommand to use Admin… (apache#3848)
  MINOR: clarify why suppress can sometimes drop tombstones (apache#6195)
  MINOR: Upgrade ducktape to 0.7.5 (apache#6197)
  MINOR: Improve IntegrationTestUtils documentation (apache#5664)
  MINOR: upgrade to jdk8 8u202
  KAFKA-7693; Fix SequenceNumber overflow in producer (apache#5989)
  KAFKA-7692; Fix ProducerStateManager SequenceNumber overflow (apache#5990)
  MINOR: update copyright year in the NOTICE file. (apache#6196)
  KAFKA-7793: Improve the Trogdor command line. (apache#6133)
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>
@mjsax mjsax added the kip Requires or implements a KIP label Jun 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kip Requires or implements a KIP streams

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants