KAFKA-12648: Pt. 1 - Add NamedTopology to protocol and state directory structure by ableegoldman · Pull Request #10609 · apache/kafka

ableegoldman · 2021-04-29T07:39:46Z

Pt. 1: #10609
Pt. 2: #10683
Pt. 3: #10788

This PR adds the NamedTopology to the SubscriptionInfo/AssignmentInfo protocol and to the StateDirectory, so that NamedTopology tasks are placed in a hierarchical structure with their task directories under the NamedTopology parent dir.
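
As a rough illustration of the hierarchical layout described above, here is a minimal sketch of resolving a task directory with and without a NamedTopology parent. The class, the method, and the `__name__` naming of the parent dir are assumptions made for illustration, not the actual StateDirectory code in this PR.

```java
import java.io.File;

// Hypothetical helper for illustration only; not the StateDirectory implementation.
final class TaskDirLayoutSketch {
    private TaskDirLayoutSketch() { }

    // Resolves the directory for a task. `namedTopology` is null for an app
    // that does not use named topologies.
    static File taskDir(final File stateDir, final String namedTopology, final String taskId) {
        if (namedTopology == null) {
            // Original flat layout: <stateDir>/<taskId>
            return new File(stateDir, taskId);
        }
        // Hierarchical layout: <stateDir>/<topologyDir>/<taskId>
        // The "__name__" wrapping is an assumed naming scheme.
        final File topologyDir = new File(stateDir, "__" + namedTopology + "__");
        return new File(topologyDir, taskId);
    }
}
```

For example, `taskDir(new File("state"), "topo", "0_1")` would resolve to `state/__topo__/0_1` under these assumptions, while passing `null` for the topology keeps the flat `state/0_1` path.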

ableegoldman · 2021-04-29T07:40:28Z

cc @wcarlson5

wcarlson5

This makes sense. I don't see anything that worries me and won't be cleaned/finished in the follow ups, so I think we are good to merge!

wcarlson5 · 2021-05-13T17:07:24Z

I am assuming that this will be done in a later PR

Yep, that's covered by Pt. 2 -- the PR is available for review, but it's on top of this PR so it's probably going to be difficult to review until this one is merged

wcarlson5 · 2021-05-13T17:20:57Z

I think I saw a comment about updating these with named topologies later. Is there a reason you are waiting?

That comment was more about usages in integration/other unit tests, where we may want to tie in some NamedTopologies once the basic feature is fully implemented. For the StateDirectoryTest everything should already be implemented so we should have the NamedTopology logic in that class covered by tests in this PR (actually I need to add a few more, I think)

guozhangwang

Made a quick pass over the non-testing code, left some clarification questions.

guozhangwang · 2021-05-18T17:49:22Z

+    private List<TaskDirectory> listTaskDirectories(final FileFilter filter) {
+        final List<TaskDirectory> taskDirectories = new ArrayList<>();
+        if (hasPersistentStores && stateDir.exists()) {
+            if (hasNamedTopologies) {


Is it possible for named topology state dirs and unnamed (original) state dirs to co-exist here?

No, that should not be allowed. We have checks to verify this in a few places where it matters, but it's an assumption we can make here. I'm not sure if your question was from an implementation point of view or a semantic one, but I can further clarify or justify why it should not be allowed if you want

I was asking more for a semantic one -- as long as this is not expected then I'm happy for this piece as is :)
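
To illustrate the semantic rule discussed in this thread, here is a minimal sketch of a check that rejects a state directory mixing both layouts. The class name, the directory name patterns, and the choice of `IllegalStateException` are assumptions for illustration, not the checks that actually exist in the PR.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical check for illustration only.
final class MixedLayoutCheckSketch {
    // Assumed naming: task dirs look like "0_1", named topology dirs like "__name__".
    private static final Pattern TASK_DIR = Pattern.compile("\\d+_\\d+");
    private static final Pattern NAMED_TOPOLOGY_DIR = Pattern.compile("__.+__");

    private MixedLayoutCheckSketch() { }

    // Returns true if the state dir uses named topologies, false if it uses the
    // original flat layout, and throws if both kinds of directory are present.
    static boolean usesNamedTopologies(final List<String> dirNames) {
        boolean sawTaskDir = false;
        boolean sawNamedTopologyDir = false;
        for (final String name : dirNames) {
            if (NAMED_TOPOLOGY_DIR.matcher(name).matches()) {
                sawNamedTopologyDir = true;
            } else if (TASK_DIR.matcher(name).matches()) {
                sawTaskDir = true;
            }
        }
        if (sawTaskDir && sawNamedTopologyDir) {
            throw new IllegalStateException(
                "Named and unnamed topology state dirs must not co-exist: " + dirNames);
        }
        return sawNamedTopologyDir;
    }
}
```

With a guard like this run once up front, listing code can branch on a single flag and assume only one layout is present.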

guozhangwang · 2021-05-18T17:59:38Z

                }
            }
        }
+        maybeCleanEmptyNamedTopologyDirs();


Should we move this into the try/catch IOException block as well (ditto below)?

guozhangwang · 2021-05-18T18:03:25Z

+            final SubscriptionInfoData.TaskOffsetSum taskOffsetSum = new SubscriptionInfoData.TaskOffsetSum();
+            final TaskId task = t.getKey();
+            taskOffsetSum.setTopicGroupId(task.topicGroupId);
+            taskOffsetSum.setPartition(task.partition);


Could you remind me why we want to include the partition id in the new version as well?

Ah, yes. I tried to explain this with a comment on the SubscriptionInfoData.json schema but I'll call it out again in the SubscriptionInfo.java class. Previously we encoded the offset sums as a nested "map" of <topicGroupId, <partition, offsetSum>>, where the "map" is really an array and the array struct does not itself allow for struct types. It's just a gap in the API that no one has cared or had time to close, not a fundamental principle. Anyways this meant we had a TopicGroupId and a PartitionToOffsetSum struct, where in turn the PartitionToOffsetSum was composed of the partition and offset sum base types.

I guess this was reasonable enough when there were only 3 base fields, but if we wanted to maintain this nested array structure it would mean adding more and more nested structs each time we added a field. I felt this would get to be too complicated and annoying to deal with so I flattened the OffsetSum struct out to just include each base field directly

Thanks for the explanation!
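
The flattening described in this thread can be sketched as follows: a nested `<topicGroupId, <partition, offsetSum>>` map is turned into a flat list where each entry carries every base field directly. The class and field names below are hypothetical stand-ins, not the generated `SubscriptionInfoData` types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration; the real classes are generated from the JSON schema.
final class OffsetSumFlatteningSketch {
    private OffsetSumFlatteningSketch() { }

    // Flat entry: one row per task with all base fields inline.
    static final class FlatTaskOffsetSum {
        final int topicGroupId;
        final int partition;
        final long offsetSum;

        FlatTaskOffsetSum(final int topicGroupId, final int partition, final long offsetSum) {
            this.topicGroupId = topicGroupId;
            this.partition = partition;
            this.offsetSum = offsetSum;
        }
    }

    // Converts the nested <topicGroupId, <partition, offsetSum>> shape into flat rows.
    static List<FlatTaskOffsetSum> flatten(final Map<Integer, Map<Integer, Long>> nested) {
        final List<FlatTaskOffsetSum> flat = new ArrayList<>();
        for (final Map.Entry<Integer, Map<Integer, Long>> group : nested.entrySet()) {
            for (final Map.Entry<Integer, Long> partition : group.getValue().entrySet()) {
                flat.add(new FlatTaskOffsetSum(group.getKey(), partition.getKey(), partition.getValue()));
            }
        }
        return flat;
    }
}
```

In the flat shape, a new per-task attribute becomes one more field on the entry rather than another level of nesting.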

ableegoldman · 2021-05-21T04:02:55Z

+package org.apache.kafka.streams.integration;
+
+public class NamedTopologyIntegrationTest {
+    //TODO KAFKA-12648


Just wanted to lay out my test plan somewhere, so it doesn't seem like I'm merging all this code with no intention to ever test it. Once the final pieces are in (should be by Pt. 2) these are the things I think are important to touch on with integration tests. Leave a comment if you have any more suggestions or feedback 🙂

ableegoldman · 2021-05-21T04:08:18Z

 import static org.junit.Assert.assertTrue;
 import static org.junit.Assert.fail;

+//TODO KAFKA-12648: add tests for named topology specific stuff


I've added a few tests for the named topology stuff but I definitely want to add more and just haven't yet had time. Since I'm still on-call and therefore unlikely to have time until next week, if you both are able to do a quick pass and don't have any further feedback on the PR as-is, it may make sense to just merge this PR tomorrow (Friday) and do a quick followup PR for the tests next week.

That way I can rebase the next PR (Pt. 2) and you all can actually begin reviewing that. cc @guozhangwang @wcarlson5

ableegoldman · 2021-05-21T04:10:01Z

Rebased after the TaskId changes in KIP-470, and responded to all comments. Not much has changed since the last review, just cleaning up here and there. ~~It's pretty much done except for StateDirectoryTest, which I can always do in a quick followup PR to unblock other downstream work with this~~ edit: tests are done, this PR is fully ready for review and merge

ableegoldman · 2021-05-25T01:47:31Z

    }

    private void cleanRemovedTasksCalledByCleanerThread(final long cleanupDelayMs) {
-        for (final File taskDir : listNonEmptyTaskDirectories()) {


Just want to call this out since it's a change in behavior unrelated to this PR -- actually just something we could/should have cleaned up after removing the lock/file based locking. Previously we couldn't ever delete empty task dirs by cleaner thread (due to that Windows bug), now we can, so we should not exclude empty dirs here
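
A minimal sketch of the behavior change called out above: the cleaner's listing can either skip or include empty task directories. The class and method names are hypothetical and this is not the StateDirectory implementation; it only shows the effect of no longer excluding empty dirs.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical listing helper for illustration only.
final class CleanerListingSketch {
    private CleanerListingSketch() { }

    // Lists the immediate subdirectories of stateDir. With includeEmpty = false,
    // directories without any contents are skipped (the old behavior described above).
    static List<File> listTaskDirs(final File stateDir, final boolean includeEmpty) {
        final List<File> result = new ArrayList<>();
        final File[] children = stateDir.listFiles(File::isDirectory);
        if (children == null) {
            return result; // stateDir missing or not readable
        }
        for (final File dir : children) {
            final String[] contents = dir.list();
            final boolean empty = contents == null || contents.length == 0;
            if (includeEmpty || !empty) {
                result.add(dir);
            }
        }
        return result;
    }
}
```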

wcarlson5

I don't have any problems with the recent changes so I am still +1

guozhangwang · 2021-06-02T21:54:24Z

+    }
+
+    @Test
+    public void shouldCleanupObsoleteTaskDirectoriesInNamedTopologiesAndDeleteTheParentDirectories() throws IOException {


Could we add a test case to verify that, if a named topology dir and a non-named topology dir co-exist, we check for this at some step and throw?

Ack, will add this test in the next PR

guozhangwang

Just another minor comment about covering the not-allowed co-existence of both named and non-named topologies, otherwise LGTM.

ableegoldman · 2021-06-07T22:37:26Z

Thanks @guozhangwang ! I'm going to go ahead and merge this one now since it's been out there for so long, I will follow up on that test case in the Pt. 2 PR

…ogyBuilders of named topologies (#10683) Pt. 1: #10609 Pt. 2: #10683 Pt. 3: #10788 The TopologyMetadata is next up after Pt. 1 #10609. This PR sets up the basic architecture for running an app with multiple NamedTopologies, though the APIs to add/remove them dynamically are not implemented until Pt. 3 Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>

Pt. 1: #10609 Pt. 2: #10683 Pt. 3: #10788 In Pt. 3 we implement the addNamedTopology API. This can be used to update the processing topology of a running Kafka Streams application without resetting the app, or even pausing/restarting the process. It's up to the user to ensure that this API is called on every instance of an application to ensure all clients are able to run the newly added NamedTopology. Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>

ableegoldman force-pushed the 12648-add-NamedTopology branch 2 times, most recently from 8378ef3 to 744df1f Compare April 30, 2021 00:45

ableegoldman mentioned this pull request Apr 30, 2021

KAFKA-12648: basic skeleton API for NamedTopology #10615

Merged

ableegoldman force-pushed the 12648-add-NamedTopology branch 3 times, most recently from be84837 to 0b3f176 Compare May 11, 2021 02:51

ableegoldman added the streams label May 12, 2021

ableegoldman force-pushed the 12648-add-NamedTopology branch 4 times, most recently from d54b17b to f11911b Compare May 12, 2021 21:21

ableegoldman changed the title ~~KAFKA-12648: add NamedTopology [WIP]~~ KAFKA-12648: add NamedTopology to protocol and state directory structure May 12, 2021

ableegoldman requested a review from guozhangwang May 12, 2021 21:32

ableegoldman force-pushed the 12648-add-NamedTopology branch from ead4def to dcd259e Compare May 12, 2021 21:46

ableegoldman mentioned this pull request May 12, 2021

KAFKA-12648: Pt. 2 - Introduce TopologyMetadata to wrap InternalTopologyBuilders of named topologies #10683

Merged

ableegoldman changed the title ~~KAFKA-12648: add NamedTopology to protocol and state directory structure~~ KAFKA-12648: Pt. 1 - Add NamedTopology to protocol and state directory structure May 13, 2021

wcarlson5 approved these changes May 13, 2021

ableegoldman force-pushed the 12648-add-NamedTopology branch from dcd259e to abd71ca Compare May 13, 2021 18:35

guozhangwang reviewed May 18, 2021

ableegoldman commented May 21, 2021

ableegoldman force-pushed the 12648-add-NamedTopology branch 3 times, most recently from 112b768 to c254a79 Compare May 25, 2021 01:36

ableegoldman commented May 25, 2021

ableegoldman mentioned this pull request May 25, 2021

[For Review] Pt. 2 - Introduce TopologyMetadata to wrap InternalTopologyBuilders of named topologies ableegoldman/kafka#5

Closed

wcarlson5 approved these changes May 25, 2021

ableegoldman force-pushed the 12648-add-NamedTopology branch from c31dacc to 4e49061 Compare May 27, 2021 03:47

ableegoldman force-pushed the 12648-add-NamedTopology branch 2 times, most recently from 87e428b to 9c0542c Compare May 27, 2021 03:58

Update StateDirectory and Assignment/SubscriptionInfo

7175e27

ableegoldman force-pushed the 12648-add-NamedTopology branch from 9c0542c to 7175e27 Compare May 27, 2021 03:59

This was referenced May 27, 2021

[For Review] Pt. 2 - Introduce TopologyMetadata to wrap InternalTopologyBuilders of named topologies ableegoldman/kafka#6

Closed

KAFKA-12648: Pt. 3 - addNamedTopology API #10788

Merged

guozhangwang reviewed Jun 2, 2021

guozhangwang approved these changes Jun 2, 2021

ableegoldman merged commit 48379bd into apache:trunk Jun 7, 2021

lkokhreidze mentioned this pull request Jun 8, 2021

KAFKA-6718 / Update SubscriptionInfoData with clientTags #10802

Merged

cadonna mentioned this pull request Jul 1, 2021

Fix verification of version probing #10943

Merged

ableegoldman mentioned this pull request Oct 21, 2021

KAFKA-12648: Pt. 4 - return Add/RemoveNamedTopologyResult so callers can wait on topology changes #11421

Closed
