@TakaHiR07
Contributor

@TakaHiR07 TakaHiR07 commented Apr 19, 2022

Motivation

I encountered the same problem described in #11682, and I noticed two PRs for it, #10374 and #11683.

But I think some problems still remain.

Actually, internalUpdatePartitionedTopic consists of three parts:

  1. tryCreatePartitionsAsync(): creates the new partitions of the topic in the metadata store (such as ZooKeeper)
  2. createSubscriptions(): creates subscriptions for the new partitions of an existing partitioned topic
  3. updatePartitionedTopicAsync(): updates the partitioned-topic metadata (sets the partition number to newPartition)

Suppose a partitioned topic has subscriptions and we want to update its partition count.

If parts 1 and 2 succeed but part 3 fails, the managed-ledger znode needs to be cleaned up. But the clean-up throws a ZooKeeper "Directory not empty" exception, because the subscription child nodes have already been created under the znode.

[screenshot of the exception]

And when we retry updatePartition, it throws the error below, because part 2 has already completed:

[screenshot of the error]

So we need to retry updatePartition with "force=true". The "Subscription already exists" error is then caught and the updatePartition operation completes, but part 3 is permanently skipped.
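The failure sequence described above can be reproduced with a small in-memory model. This is only a sketch under stated assumptions: the class `UpdatePartitionsSketch`, its fields, the `failMetadataUpdate` switch, and the string node names are hypothetical stand-ins for the metadata store, not Pulsar broker code. Only the three step names mirror the parts listed in this description.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;

// Hypothetical in-memory model of the three-part update; NOT actual Pulsar broker code.
public class UpdatePartitionsSketch {
    final Set<String> partitionNodes = new TreeSet<>();    // stands in for partition znodes
    final Set<String> subscriptionNodes = new TreeSet<>(); // stands in for subscription child znodes
    int partitionCount;                                    // stands in for the partitioned-topic metadata
    boolean failMetadataUpdate;                            // assumption: switch that makes part 3 fail

    UpdatePartitionsSketch(int initialPartitions) {
        partitionCount = initialPartitions;
        for (int i = 0; i < initialPartitions; i++) {
            partitionNodes.add("partition-" + i);
        }
    }

    static CompletableFuture<Void> failed(String message) {
        CompletableFuture<Void> f = new CompletableFuture<>();
        f.completeExceptionally(new IllegalStateException(message));
        return f;
    }

    // Part 1: create nodes for the new partitions (idempotent in this model).
    CompletableFuture<Void> tryCreatePartitionsAsync(int newPartition) {
        for (int i = partitionCount; i < newPartition; i++) {
            partitionNodes.add("partition-" + i);
        }
        return CompletableFuture.completedFuture(null);
    }

    // Part 2: create the subscription on each new partition; fails if it already exists.
    CompletableFuture<Void> createSubscriptions(int newPartition, String subscription) {
        for (int i = partitionCount; i < newPartition; i++) {
            if (!subscriptionNodes.add("partition-" + i + "/" + subscription)) {
                return failed("Subscription already exists");
            }
        }
        return CompletableFuture.completedFuture(null);
    }

    // Part 3: record the new partition count in the metadata.
    CompletableFuture<Void> updatePartitionedTopicAsync(int newPartition) {
        if (failMetadataUpdate) {
            return failed("metadata update failed");
        }
        partitionCount = newPartition;
        return CompletableFuture.completedFuture(null);
    }

    // The three parts chained in order; a failure in a later part does not undo earlier parts.
    CompletableFuture<Void> internalUpdatePartitionedTopic(int newPartition, String subscription) {
        return tryCreatePartitionsAsync(newPartition)
                .thenCompose(v -> createSubscriptions(newPartition, subscription))
                .thenCompose(v -> updatePartitionedTopicAsync(newPartition));
    }
}
```

In this model, a first call with `failMetadataUpdate = true` leaves the subscription nodes behind while `partitionCount` stays unchanged, and a plain retry then fails in part 2 before part 3 is ever reached.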

Modifications

After catching the "Subscription already exists" error, still perform the updatePartitionedTopicAsync() operation.
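A minimal sketch of that control flow, assuming the subscription step and the metadata step are both exposed as `CompletableFuture`s. The names `ForceRetrySketch`, `finishUpdate`, and `SubscriptionExistsException` are hypothetical and chosen for illustration; this is not the code in this PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

// Hypothetical illustration of "tolerate 'already exists' when forced, then still run part 3".
public class ForceRetrySketch {

    // Assumed marker exception for the "Subscription already exists" case.
    static class SubscriptionExistsException extends RuntimeException {
        SubscriptionExistsException(String message) {
            super(message);
        }
    }

    static CompletableFuture<Void> finishUpdate(
            CompletableFuture<Void> createSubscriptions,
            boolean force,
            Supplier<CompletableFuture<Void>> updateMetadata) {
        return createSubscriptions
                .<CompletableFuture<Void>>handle((ignored, ex) -> {
                    // Dependent stages wrap failures in CompletionException; unwrap if needed.
                    Throwable cause = (ex instanceof CompletionException && ex.getCause() != null)
                            ? ex.getCause() : ex;
                    boolean tolerated = force && cause instanceof SubscriptionExistsException;
                    if (cause == null || tolerated) {
                        // Subscriptions exist (newly created or left over): update the metadata.
                        return updateMetadata.get();
                    }
                    // Any other failure is propagated and the metadata is left untouched.
                    CompletableFuture<Void> failure = new CompletableFuture<>();
                    failure.completeExceptionally(cause);
                    return failure;
                })
                .thenCompose(f -> f);
    }
}
```

With `force` set to false the "already exists" failure still propagates, so the existing non-forced behavior is unchanged in this sketch.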

Discussions

With 'force=true' available for the retry, the managed-ledger znode clean-up operation seems unnecessary. Can it be removed?

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

  • Dependencies (does it add or upgrade a dependency): no
  • The public API: no
  • The schema: no
  • The default values of configurations: no
  • The wire protocol: no
  • The rest endpoints: no
  • The admin cli options: no
  • Anything that affects deployment: no

Documentation

Check the box below or label this PR directly.

Need to update docs?

  • no-need-doc
    (Please explain why)

@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-not-needed Your PR changes do not impact docs labels Apr 19, 2022
@github-actions

@takahiro0208:Thanks for your contribution. For this PR, do we need to update docs?
(The PR template contains info about doc, which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)

@github-actions

@takahiro0208:Thanks for providing doc info!

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 19, 2022
@gaozhangmin
Contributor

gaozhangmin commented Apr 19, 2022

@takahiro0208 I think the proper way is to call createSubscriptions only after updatePartitionedTopicAsync has succeeded.

@gaozhangmin gaozhangmin added this to the 2.11.0 milestone Apr 19, 2022
@gaozhangmin gaozhangmin changed the title [pulsar-broker] Fix update topic partitions failed [fix][broker] Fix update topic partitions failed Apr 19, 2022
@gaozhangmin gaozhangmin requested a review from eolivelli April 19, 2022 11:23
@TakaHiR07
Contributor Author

> @takahiro0208 I think the proper way is to call createSubscriptions only after updatePartitionedTopicAsync has succeeded.

@gaozhangmin Thank you for your review. Yes, this would also solve the problem and remove redundant code, and I can revise the committed code accordingly. But there is one situation to consider: if updatePartitionedTopicAsync succeeds and createSubscriptions then fails, getPartitionedTopicMetadata would still return the new partition count. Does that matter?

});
result.completeExceptionally(ex2);
return null;
});
Contributor

updatePartitionedTopicAsync has already handled the exception, so the exception can't be delegated here.
BTW, could you help refactor this method?

Contributor Author

I don't think it handles the exception, since the code at lines 4033-4035 also delegates the exception. And yes, I can help refactor this method.

Contributor

@gaozhangmin gaozhangmin Apr 20, 2022

@takahiro0208 If createSubscriptions fails, the subscription znodes under managed-ledgers may already have been created, so your deletion here will fail as well.

Contributor Author

> @takahiro0208 If createSubscriptions fails, the subscription znodes under managed-ledgers may already have been created, so your deletion here will fail as well.

If we use "force=true" to retry updatePartition, it seems there is no need to clean up the znode?

@TakaHiR07 TakaHiR07 force-pushed the fix_topic_update_partitions branch from 868a6fa to e90def9 Compare April 21, 2022 07:33
@TakaHiR07
Contributor Author

I have updated the code, PTAL, @gaozhangmin @Technoboy- @rdhabalia

  1. It may not be a good idea to call createSubscriptions after updatePartitionedTopicAsync, because in the end we use getPartitionedTopicMetadata to judge whether the partition update succeeded.
  2. The managed-ledger znode clean-up is not actually effective and not necessary, since we can retry updatePartition with "force=true".
  3. It would be better to add documentation explaining that if updatePartition only partially succeeds, we can retry it with "force=true".

@github-actions

The pr had no activity for 30 days, mark with Stale label.

@Jason918
Contributor

@TakaHiR07 Please rebase onto master and resolve the conflicts.

@github-actions github-actions bot removed the Stale label Aug 28, 2022
@TakaHiR07
Contributor Author

PR #17251 uses a similar method to solve the update-partition problem, so I am closing this one.

@TakaHiR07 TakaHiR07 closed this Aug 31, 2022