MINOR: ControllerServer should use the new metadata loader and snapshot generator#12983
MINOR: ControllerServer should use the new metadata loader and snapshot generator#12983cmccabe merged 10 commits intoapache:trunkfrom
Conversation
a6adc9d to
5bb398d
Compare
…ot generator. This PR introduces the new metadata loader and snapshot generator. For the time being, they are only used by the controller, but a PR for the broker will come soon. The new metadata loader supports adding and removing publishers dynamically. (In contrast, the old loader only supported adding a single publisher.) It also passes along more information about each new image that is published. This information can be found in the LogDeltaManifest and SnapshotManifest classes. The new snapshot generator replaces the previous logic for generating snapshots in QuorumController.java and associated classes. The new generator is intended to be shared between the broker and the controller, so it is decoupled from both. There are a few small changes to the old snapshot generator in this PR. Specifically, we move the batch processing time and batch size metrics out of BrokerMetadataListener.scala and into BrokerServerMetrics.scala.
5bb398d to
7faba25
Compare
mumrah
left a comment
There was a problem hiding this comment.
@cmccabe, thanks for the patch! It's really nice to see the snapshot iterators in the ControlManagers go. Keeping the logic consistent between those and MetadataDelta was not easy.
Does the new snapshot generator allow for multiple snapshots to be created concurrently? Or at least allow us to request more than one at a time?
There was a problem hiding this comment.
Is "Publisher" the right name here? It makes it sound a bit like instances of this interface will be publishing metadata, when they are actually being published to. What about MetadataListener or MetadataConsumer or something?
There was a problem hiding this comment.
The idea behind calling them "publishers" it that they publish metadata to the rest of the system. "Listener" is already a very overloaded name -- for example, the MetadataLoader is a RaftClient.Listener. I think it would be confusing to have another listener.
The thing about the publishers is that metadata is NOT published immediately after we get it. We kind of have this kind of nomenclature:
listener: gets the raw update stream from the Raft layer. This includes a lot of junk that we don't care about and it has some messy differences between snapshots and log batches, so most code doesn't want to deal with it. UltimatelyQuorumControllerandMetadataLoaderwill probably be the two pieces of code that do.loader: translates the raw update stream into delta and image objects. Handles metadata transactions and snapshot application.publisher: publishes images to the rest of the system and to external places (like the filesystem).
Loader errors are very bad, clearly, because it means you've failed to correctly translate the updates. Publisher errors might not be that bad. It could just mean that you couldn't reinitialize a thread pool when a config changed, for example. So we have separate metrics for these two things on the broker. To be fair, the publisher error metric refers to metadata-apply-error-count rather than metadata-publish-error-count. But Applier isn't a great class name, and Applier#apply doesn't seem better than Publisher#publish... well, to me, at least.
The idea is that most people are operating at the 3rd layer (publisher) and don't need to deal with messy translation and loading issues. So they can register their own publisher and just handle the incoming update object stream. So a big part of this is breaking up the single big publisher into a bunch of smaller ones to facilitate this modularity. It's not as trivial as just making N callbacks instead of 1 because with multiple publishers you have to deal with stuff like "this publisher never saw any metadata before, now it needs to get a full update." etc.
There was a problem hiding this comment.
Thanks for the rationale. It makes sense 👍
There was a problem hiding this comment.
Are the changes to BrokerMetadataListener and BrokerServerMetrics needed in this PR, or can we save them for the next one (with the BrokerServer changes)?
There was a problem hiding this comment.
I think this should be included here because MetadataLoader needs to publish its metrics. It's a pretty small change in general -- just means BrokerMetadataListener needs to call into BrokerServerMetrics rather than directly bumping a metric.
|
@cmccabe with such a big change to the controller, i think we'll be relying heavily on tests for this PR. Can we do a system test run for this PR? |
mumrah
left a comment
There was a problem hiding this comment.
I like the overall strategy here with the separation of concerns. It cleans up a lot of technical debt we've had in KRaft for a while now.
LGTM pending good tests and a reasonable system test run :)
…ot generator (#12983) This PR introduces the new metadata loader and snapshot generator. For the time being, they are only used by the controller, but a PR for the broker will come soon. The new metadata loader supports adding and removing publishers dynamically. (In contrast, the old loader only supported adding a single publisher.) It also passes along more information about each new image that is published. This information can be found in the LogDeltaManifest and SnapshotManifest classes. The new snapshot generator replaces the previous logic for generating snapshots in QuorumController.java and associated classes. The new generator is intended to be shared between the broker and the controller, so it is decoupled from both. There are a few small changes to the old snapshot generator in this PR. Specifically, we move the batch processing time and batch size metrics out of BrokerMetadataListener.scala and into BrokerServerMetrics.scala. Finally, fix a case where we are using 'is' rather than '==' for a numeric comparison in snapshot_test.py. Reviewers: David Arthur <mumrah@gmail.com>
…ot generator (apache#12983) This PR introduces the new metadata loader and snapshot generator. For the time being, they are only used by the controller, but a PR for the broker will come soon. The new metadata loader supports adding and removing publishers dynamically. (In contrast, the old loader only supported adding a single publisher.) It also passes along more information about each new image that is published. This information can be found in the LogDeltaManifest and SnapshotManifest classes. The new snapshot generator replaces the previous logic for generating snapshots in QuorumController.java and associated classes. The new generator is intended to be shared between the broker and the controller, so it is decoupled from both. There are a few small changes to the old snapshot generator in this PR. Specifically, we move the batch processing time and batch size metrics out of BrokerMetadataListener.scala and into BrokerServerMetrics.scala. Finally, fix a case where we are using 'is' rather than '==' for a numeric comparison in snapshot_test.py. Reviewers: David Arthur <mumrah@gmail.com>
This PR introduces the new metadata loader and snapshot generator. For the time being, they are only used by the controller, but a PR for the broker will come soon.
The new metadata loader supports adding and removing publishers dynamically. (In contrast, the old loader only supported adding a single publisher.) It also passes along more information about each new image that is published. This information can be found in the LogDeltaManifest and SnapshotManifest classes.
The new snapshot generator replaces the previous logic for generating snapshots in QuorumController.java and associated classes. The new generator is intended to be shared between the broker and the controller, so it is decoupled from both.
There are a few small changes to the old snapshot generator in this PR. Specifically, we move the batch processing time and batch size metrics out of BrokerMetadataListener.scala and into BrokerServerMetrics.scala.