
KAFKA-12648: Pt. 1 - Add NamedTopology to protocol and state directory structure #10609

Merged
ableegoldman merged 1 commit into apache:trunk from ableegoldman:12648-add-NamedTopology
Jun 7, 2021

@ableegoldman
Member

@ableegoldman ableegoldman commented Apr 29, 2021

Pt. 1: #10609
Pt. 2: #10683
Pt. 3: #10788

This PR adds the NamedTopology to the SubscriptionInfo/AssignmentInfo and to the StateDirectory, so that the StateDirectory can place NamedTopology tasks in a hierarchical structure, with task directories under the NamedTopology's parent dir.
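
To make the directory hierarchy concrete, here is a minimal sketch of how a task directory could be resolved with and without a NamedTopology. The class `StateDirLayoutSketch`, its method names, and the `__name__` convention for the parent directory are assumptions made for illustration, not the actual StateDirectory code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class StateDirLayoutSketch {

    // Hypothetical naming convention for a NamedTopology's parent directory;
    // the real StateDirectory may format this differently.
    static String namedTopologyDirName(final String topologyName) {
        return "__" + topologyName + "__";
    }

    // Resolves the directory for one task. A null topologyName means the app
    // has no NamedTopology, so the task dir sits directly under the app dir.
    static Path taskDir(final Path appStateDir,
                        final String topologyName,
                        final int topicGroupId,
                        final int partition) {
        final String taskDirName = topicGroupId + "_" + partition;
        if (topologyName == null) {
            return appStateDir.resolve(taskDirName);
        }
        return appStateDir.resolve(namedTopologyDirName(topologyName)).resolve(taskDirName);
    }

    public static void main(final String[] args) {
        final Path appStateDir = Paths.get("state-dir", "my-app");
        // Original flat layout: <app dir>/<task dir>
        System.out.println(taskDir(appStateDir, null, 0, 1));
        // Hierarchical layout: <app dir>/<named topology dir>/<task dir>
        System.out.println(taskDir(appStateDir, "topology-1", 0, 1));
    }
}
```

The point of the extra level is that all tasks of one NamedTopology share a parent, so the whole topology's state can be located (or cleaned up) through a single directory.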

@ableegoldman
Member Author

cc @wcarlson5

@ableegoldman ableegoldman force-pushed the 12648-add-NamedTopology branch 2 times, most recently from 8378ef3 to 744df1f Compare April 30, 2021 00:45
@ableegoldman ableegoldman force-pushed the 12648-add-NamedTopology branch 3 times, most recently from be84837 to 0b3f176 Compare May 11, 2021 02:51
@ableegoldman ableegoldman force-pushed the 12648-add-NamedTopology branch 4 times, most recently from d54b17b to f11911b Compare May 12, 2021 21:21
@ableegoldman ableegoldman changed the title KAFKA-12648: add NamedTopology [WIP] KAFKA-12648: add NamedTopology to protocol and state directory structure May 12, 2021
@ableegoldman ableegoldman requested a review from guozhangwang May 12, 2021 21:32
@ableegoldman ableegoldman force-pushed the 12648-add-NamedTopology branch from ead4def to dcd259e Compare May 12, 2021 21:46
@ableegoldman ableegoldman changed the title KAFKA-12648: add NamedTopology to protocol and state directory structure KAFKA-12648: Pt. 1 - Add NamedTopology to protocol and state directory structure May 13, 2021
Contributor

@wcarlson5 wcarlson5 left a comment

This makes sense. I don't see anything that worries me and won't be cleaned/finished in the follow ups, so I think we are good to merge!

Comment thread streams/src/main/java/org/apache/kafka/streams/processor/TaskId.java Outdated
Contributor

I am assuming that this will be done in a later PR

Member Author

Yep, that's covered by Pt. 2 -- the PR is available for review, but it's on top of this PR so it's probably going to be difficult to review until this one is merged

Contributor

I think I saw a comment about updating these with named topologies later. Is there a reason you are waiting?

Member Author

@ableegoldman ableegoldman May 13, 2021

That comment was more about usages in integration and other unit tests, where we may want to tie in some NamedTopologies once the basic feature is fully implemented. For the StateDirectoryTest everything should already be implemented, so we should have the NamedTopology logic in that class covered by tests in this PR (actually I need to add a few more, I think).

@ableegoldman ableegoldman force-pushed the 12648-add-NamedTopology branch from dcd259e to abd71ca Compare May 13, 2021 18:35
Contributor

@guozhangwang guozhangwang left a comment

Made a quick pass over the non-testing code, left some clarification questions.

private List<TaskDirectory> listTaskDirectories(final FileFilter filter) {
    final List<TaskDirectory> taskDirectories = new ArrayList<>();
    if (hasPersistentStores && stateDir.exists()) {
        if (hasNamedTopologies) {
Contributor

Is it possible for named topology state dirs and unnamed (original) state dirs to co-exist here?

Member Author

No, that should not be allowed. We have checks to verify this in a few places where it matters, but it's an assumption we can make here. I'm not sure if your question was from an implementation point of view or a semantic one, but I can further clarify or justify why it should not be allowed if you want

Contributor

I was asking more for a semantic one -- as long as this is not expected then I'm happy for this piece as is :)
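
As an illustration of the invariant discussed in this thread, the following is a rough sketch of a check that rejects a state directory containing both layouts. The class `MixedLayoutCheckSketch`, the `__name__` pattern for named topology dirs, and the choice of `IllegalStateException` are all hypothetical; the PR only states that such checks exist "in a few places where it matters".

```java
import java.util.Arrays;
import java.util.Collection;

final class MixedLayoutCheckSketch {

    // Assumed convention: a named topology parent dir looks like "__name__".
    static boolean isNamedTopologyDir(final String dirName) {
        return dirName.length() > 4 && dirName.startsWith("__") && dirName.endsWith("__");
    }

    // Task dirs are named "<topicGroupId>_<partition>", e.g. "0_1".
    static boolean isTaskDir(final String dirName) {
        return dirName.matches("\\d+_\\d+");
    }

    // Throws if the app's state dir holds both named topology dirs and
    // top-level (unnamed) task dirs. Other entries are ignored.
    static void verifyNoMixedLayout(final Collection<String> childDirNames) {
        boolean sawNamedTopologyDir = false;
        boolean sawUnnamedTaskDir = false;
        for (final String name : childDirNames) {
            if (isNamedTopologyDir(name)) {
                sawNamedTopologyDir = true;
            } else if (isTaskDir(name)) {
                sawUnnamedTaskDir = true;
            }
        }
        if (sawNamedTopologyDir && sawUnnamedTaskDir) {
            throw new IllegalStateException(
                "Found both named topology dirs and unnamed task dirs: " + childDirNames);
        }
    }

    public static void main(final String[] args) {
        verifyNoMixedLayout(Arrays.asList("0_0", "0_1"));              // only unnamed: ok
        verifyNoMixedLayout(Arrays.asList("__topology-1__"));          // only named: ok
        try {
            verifyNoMixedLayout(Arrays.asList("0_0", "__topology-1__"));
        } catch (final IllegalStateException e) {
            System.out.println("rejected mixed layout");
        }
    }
}
```

With a guard like this in place, code such as `listTaskDirectories` can branch once on `hasNamedTopologies` and assume that every directory it sees belongs to the same layout.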

}
}
}
maybeCleanEmptyNamedTopologyDirs();
Contributor

Should we move this into the try/catch IOException block as well (ditto below)?

final SubscriptionInfoData.TaskOffsetSum taskOffsetSum = new SubscriptionInfoData.TaskOffsetSum();
final TaskId task = t.getKey();
taskOffsetSum.setTopicGroupId(task.topicGroupId);
taskOffsetSum.setPartition(task.partition);
Contributor

Could you remind me why we want to include the partition id in the new version as well?

Member Author

Ah, yes. I tried to explain this with a comment on the SubscriptionInfoData.json schema but I'll call it out again in the SubscriptionInfo.java class. Previously we encoded the offset sums as a nested "map" of <topicGroupId, <partition, offsetSum>>, where the "map" is really an array and the array struct does not itself allow for struct types. It's just a gap in the API that no one has cared or had time to close, not a fundamental principle. Anyways this meant we had a TopicGroupId and a PartitionToOffsetSum struct, where in turn the PartitionToOffsetSum was composed of the partition and offset sum base types.

I guess this was reasonable enough when there were only 3 base fields, but if we wanted to maintain this nested array structure it would mean adding more and more nested structs each time we added a field. I felt this would get to be too complicated and annoying to deal with so I flattened the OffsetSum struct out to just include each base field directly
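
The flattening described above can be sketched roughly as follows. `FlatTaskOffsetSum` and `flatten` are hypothetical names used only for this illustration; they mirror the idea of the flattened struct, not the generated `SubscriptionInfoData` classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class OffsetSumFlatteningSketch {

    // Hypothetical flat entry: every base field sits directly on one struct,
    // so adding a field (such as the topology name) needs no extra nesting level.
    static final class FlatTaskOffsetSum {
        final String namedTopology; // null when the app has no NamedTopology
        final int topicGroupId;
        final int partition;
        final long offsetSum;

        FlatTaskOffsetSum(final String namedTopology, final int topicGroupId,
                          final int partition, final long offsetSum) {
            this.namedTopology = namedTopology;
            this.topicGroupId = topicGroupId;
            this.partition = partition;
            this.offsetSum = offsetSum;
        }
    }

    // Converts the old nested shape <topicGroupId, <partition, offsetSum>>
    // into a flat list with one entry per task.
    static List<FlatTaskOffsetSum> flatten(final String namedTopology,
                                           final Map<Integer, Map<Integer, Long>> nested) {
        final List<FlatTaskOffsetSum> flat = new ArrayList<>();
        for (final Map.Entry<Integer, Map<Integer, Long>> group : nested.entrySet()) {
            for (final Map.Entry<Integer, Long> partition : group.getValue().entrySet()) {
                flat.add(new FlatTaskOffsetSum(
                    namedTopology, group.getKey(), partition.getKey(), partition.getValue()));
            }
        }
        return flat;
    }

    public static void main(final String[] args) {
        final Map<Integer, Map<Integer, Long>> nested = new TreeMap<>();
        nested.computeIfAbsent(0, k -> new TreeMap<>()).put(0, 15L);
        nested.computeIfAbsent(0, k -> new TreeMap<>()).put(1, 42L);
        nested.computeIfAbsent(1, k -> new TreeMap<>()).put(0, 7L);
        for (final FlatTaskOffsetSum entry : flatten("topology-1", nested)) {
            System.out.println(entry.namedTopology + " " + entry.topicGroupId
                + "_" + entry.partition + " -> " + entry.offsetSum);
        }
    }
}
```

The trade-off is some repetition on the wire (the topic group id is repeated for each partition) in exchange for a schema where a new field is one more entry on the struct rather than one more layer of nesting.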

Contributor

Thanks for the explanation!

package org.apache.kafka.streams.integration;

public class NamedTopologyIntegrationTest {
//TODO KAFKA-12648
Member Author

Just wanted to lay out my test plan somewhere, so it doesn't seem like I'm merging all this code with no intention to ever test it. Once the final pieces are in (should be by Pt. 2) these are the things I think are important to touch on with integration tests. Leave a comment if you have any more suggestions or feedback 🙂

import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;

//TODO KAFKA-12648: add tests for named topology specific stuff
Member Author

I've added a few tests for the named topology stuff but I definitely want to add more and just haven't yet had time. Since I'm still on-call and therefore unlikely to have time until next week, if you both are able to do a quick pass and don't have any further feedback on the PR as-is, it may make sense to just merge this PR tomorrow (Friday) and do a quick followup PR for the tests next week.

That way I can rebase the next PR (Pt. 2) and you all can actually begin reviewing that. cc @guozhangwang @wcarlson5

@ableegoldman
Member Author

ableegoldman commented May 21, 2021

Rebased after the TaskId changes in KIP-470, and responded to all comments. Not much has changed since the last review, just cleaning up here and there. It's pretty much done except for StateDirectoryTest, which I can always do in a quick followup PR to unblock other downstream work with this.

Edit: tests are done, this PR is fully ready for review and merge.

@ableegoldman ableegoldman force-pushed the 12648-add-NamedTopology branch 3 times, most recently from 112b768 to c254a79 Compare May 25, 2021 01:36
}

private void cleanRemovedTasksCalledByCleanerThread(final long cleanupDelayMs) {
    for (final File taskDir : listNonEmptyTaskDirectories()) {
Member Author

Just want to call this out since it's a change in behavior unrelated to this PR -- actually just something we could/should have cleaned up after removing the lock/file based locking. Previously we couldn't ever delete empty task dirs by cleaner thread (due to that Windows bug), now we can, so we should not exclude empty dirs here
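
A minimal sketch of the behavior change described here, assuming a simplified cleaner that only looks at directory age: the loop visits every task directory, empty or not, and removes those that have been idle longer than the cleanup delay. `TaskDirCleanerSketch` and its methods are hypothetical, and the sketch leaves out the task locking that a real cleaner thread would need.

```java
import java.io.File;

final class TaskDirCleanerSketch {

    // Deletes a file or directory together with everything beneath it.
    static boolean deleteRecursively(final File file) {
        final File[] children = file.listFiles();
        if (children != null) {
            for (final File child : children) {
                deleteRecursively(child);
            }
        }
        return file.delete();
    }

    // Removes every task dir under stateDir that was last modified more than
    // cleanupDelayMs before nowMs, and returns how many were removed.
    static int cleanObsoleteTaskDirs(final File stateDir, final long nowMs, final long cleanupDelayMs) {
        final File[] taskDirs = stateDir.listFiles(File::isDirectory);
        if (taskDirs == null) {
            return 0;
        }
        int removed = 0;
        for (final File taskDir : taskDirs) {
            // Deliberately no "non-empty" filter: empty task dirs are cleaned too.
            if (nowMs - taskDir.lastModified() > cleanupDelayMs && deleteRecursively(taskDir)) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(final String[] args) {
        final File stateDir = new File(System.getProperty("java.io.tmpdir"),
            "cleaner-sketch-" + System.nanoTime());
        new File(stateDir, "0_0").mkdirs();                        // empty task dir
        new File(new File(stateDir, "0_1"), "rocksdb").mkdirs();   // non-empty task dir
        // Pretend two hours have passed, with a one hour cleanup delay.
        final long later = System.currentTimeMillis() + 7_200_000L;
        System.out.println("removed: " + cleanObsoleteTaskDirs(stateDir, later, 3_600_000L));
        stateDir.delete();
    }
}
```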

Contributor

@wcarlson5 wcarlson5 left a comment

I don't have any problems with the recent changes so I am still +1

@ableegoldman ableegoldman force-pushed the 12648-add-NamedTopology branch from c31dacc to 4e49061 Compare May 27, 2021 03:47
@ableegoldman ableegoldman force-pushed the 12648-add-NamedTopology branch 2 times, most recently from 87e428b to 9c0542c Compare May 27, 2021 03:58
}

@Test
public void shouldCleanupObsoleteTaskDirectoriesInNamedTopologiesAndDeleteTheParentDirectories() throws IOException {
Contributor

Could we add a test case to verify that, if both a named topology dir and a non-named topology dir co-exist, we check for this at some step and throw?

Member Author

Ack, will add this test in the next PR

Contributor

@guozhangwang guozhangwang left a comment

Just another minor comment about covering the not-allowed co-existence of both named and non-named topologies, otherwise LGTM.

@ableegoldman
Member Author

Thanks @guozhangwang! I'm going to go ahead and merge this one now since it's been out there for so long. I will follow up on that test case in the Pt. 2 PR.

@ableegoldman ableegoldman merged commit 48379bd into apache:trunk Jun 7, 2021
@cadonna cadonna mentioned this pull request Jul 1, 2021
3 tasks
ableegoldman added a commit that referenced this pull request Jul 28, 2021
…ogyBuilders of named topologies (#10683)

Pt. 1: #10609
Pt. 2: #10683
Pt. 3: #10788

The TopologyMetadata is next up after Pt. 1 #10609. This PR sets up the basic architecture for running an app with multiple NamedTopologies, though the APIs to add/remove them dynamically are not implemented until Pt. 3

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
ableegoldman added a commit that referenced this pull request Aug 6, 2021
Pt. 1: #10609
Pt. 2: #10683
Pt. 3: #10788

In Pt. 3 we implement the addNamedTopology API. This can be used to update the processing topology of a running Kafka Streams application without resetting the app, or even pausing/restarting the process. It's up to the user to ensure that this API is called on every instance of an application to ensure all clients are able to run the newly added NamedTopology. 

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021
xdgrulez pushed a commit to xdgrulez/kafka that referenced this pull request Dec 22, 2021
3 participants