Conversation

@mattisonchao (Member) commented Jan 31, 2023

Fixes #19083
Fixes #11866

Motivation

Currently, the force-delete namespace operation performs the following steps (sketched in code below):

  1. Do pre-checks
  2. Get the full list of topics
  3. Mark the namespace as deleted so that client lookups or reconnects cannot re-create topics
  4. Delete all user-created topics
  5. Delete all system topics
  6. Delete the namespace event topics
  7. Clean up namespace metadata and resources
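
For orientation, here is a minimal sketch of that flow; every helper name below (precheck, listAllTopics, markNamespaceDeleted, and so on) is a hypothetical stand-in for the broker internals, not the actual Pulsar API:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

class ForceDeleteFlow {
    CompletableFuture<Void> forceDeleteNamespace(String ns) {
        return precheck(ns)                                    // step 1: pre-checks
            .thenCompose(v -> listAllTopics(ns))               // step 2: full topic list
            .thenCompose(topics -> markNamespaceDeleted(ns)    // step 3: block re-creation
                .thenCompose(v -> deleteUserTopics(topics))    // step 4
                .thenCompose(v -> deleteSystemTopics(ns))      // step 5
                .thenCompose(v -> deleteEventTopics(ns)))      // step 6
            .thenCompose(v -> cleanupMetadata(ns));            // step 7: drop metadata
    }

    // Stubs so the sketch compiles; the real implementations live in the broker.
    CompletableFuture<Void> precheck(String ns) { return CompletableFuture.completedFuture(null); }
    CompletableFuture<List<String>> listAllTopics(String ns) { return CompletableFuture.completedFuture(List.of()); }
    CompletableFuture<Void> markNamespaceDeleted(String ns) { return CompletableFuture.completedFuture(null); }
    CompletableFuture<Void> deleteUserTopics(List<String> topics) { return CompletableFuture.completedFuture(null); }
    CompletableFuture<Void> deleteSystemTopics(String ns) { return CompletableFuture.completedFuture(null); }
    CompletableFuture<Void> deleteEventTopics(String ns) { return CompletableFuture.completedFuture(null); }
    CompletableFuture<Void> cleanupMetadata(String ns) { return CompletableFuture.completedFuture(null); }
}
```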

A race condition can leave step 2 with an incomplete topic list, e.g.:

| Time | Thread A | Thread B |
|------|----------|----------|
| 1 | Gets the full list of topics | Tries to create a new topic and passes the deleted check |
| 2 | Marks the namespace deleted | Does the other checks |
| 3 | Does the rest of steps 4, 5, 6 | Creates the managed ledger and persists its info to metadata |
| 4 | Step 7 fails with `Directory not empty for /managed-ledgers/test-tenant/test-ns2/persistent` | Does the rest of the work |
| 5 | Returns the exception | Topic created |

Modifications

The stray deletion comes from `PersistentTopic#checkReplication`, which logs:

> Deleting topic xxxx because local cluster is not part of global namespace repl list

  • Remove the namespacePolicies.deleted logic to ensure new topics are not deleted by `PersistentTopic#checkReplication`.
  • Add a retry to handle topics whose metadata creation was still in flight during the deletion.
  • Add a test case verifying that no topic object is left behind.

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 31, 2023
@mattisonchao mattisonchao requested a review from Jason918 January 31, 2023 14:04
@mattisonchao mattisonchao self-assigned this Jan 31, 2023
@mattisonchao mattisonchao added this to the 3.0.0 milestone Jan 31, 2023
@eolivelli (Contributor) commented:

This patch is related
#19097

@dlg99 @aymkhalil PTAL

@eolivelli (Contributor) commented:

I am not sure that this assumption holds: "Always assume the existence of an event topic".

@dlg99 (Contributor) commented Jan 31, 2023

@mattisonchao can this be done slightly differently:
swap the order of steps 2 and 3 (mark deleted first, do the rest, unmark on error)?
In combo with #19097 this should work well enough.

There is still a chance that a topic creation passed the check before the ns deletion started and completed after the ns was marked deleted. I don't think we can solve that easily; it would require either distributed locking or serialization of ns/topic metadata operations, e.g. via a global system topic.
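
For illustration, a minimal sketch of that mark-first/unmark-on-error ordering (helper names are hypothetical stand-ins; `exceptionallyCompose` needs Java 12+):

```java
import java.util.concurrent.CompletableFuture;

class MarkFirstDeletion {
    // Mark the namespace deleted before listing/deleting topics, and roll the
    // marker back if anything downstream fails.
    CompletableFuture<Void> deleteWithMarkFirst(String ns) {
        return markNamespaceDeleted(ns)                       // step 3 moved first
            .thenCompose(v -> deleteAllTopicsAndMetadata(ns)) // then steps 2 and 4-7
            .exceptionallyCompose(ex -> unmarkNamespaceDeleted(ns)
                .thenCompose(v -> CompletableFuture.failedFuture(ex)));
    }

    // Hypothetical stubs standing in for the broker internals.
    CompletableFuture<Void> markNamespaceDeleted(String ns) { return CompletableFuture.completedFuture(null); }
    CompletableFuture<Void> unmarkNamespaceDeleted(String ns) { return CompletableFuture.completedFuture(null); }
    CompletableFuture<Void> deleteAllTopicsAndMetadata(String ns) { return CompletableFuture.completedFuture(null); }
}
```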

@mattisonchao (Member, Author) replied to @dlg99:

> swap the order of steps 2 and 3 (mark deleted first, do the rest, unmark on error)?
> In combo with #19097 this should work well enough.
>
> There is still a chance that a topic creation passed the check before the ns deletion started and completed after the ns was marked deleted. I don't think we can solve that easily; it would require either distributed locking or serialization of ns/topic metadata operations, e.g. via a global system topic.

As you said, the topic still has a chance to hit the race condition.

> I don't think we can solve that easily

We don't need to solve this for user-created topics, but we should make sure the system topics are cleaned up.

@dlg99 (Contributor) commented Feb 1, 2023

@mattisonchao here is what's happening in this PR, as I understand it:

  1. Delete all topics + all system topics except for the events one.
  2. Then delete the events topic.
  3. But that brings in some problems with handling of the ns.deleted metadata (because of RPC?), so that logic is removed in some places.

This does not sound like a comprehensive solution (can't a topic creation happen after step 2 and still cause the namespace deletion to fail?), plus it breaks the contract for handling ns.deleted, as @eolivelli noted.

On another note, the events topic is deleted using admin.topics().deletePartitionedTopicAsync, while for other partitioned topics we use namespaceResources().getPartitionedTopicResources().deletePartitionedTopicAsync (in internalDeletePartitionedTopicsAsync). Is that done on purpose?

I'd leave the ns.deleted handling logic as it is, and set ns.deleted for the namespace earlier, before actually listing topics (that seems to be the logically right thing to do).
Do the rest of the steps with retries/backoff - since the ns is marked as deleted, topic creations should settle at some point. List topics again and delete again, then delete the namespace metadata.
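
For illustration, a minimal sketch of that "list again and delete again" drain loop (all helper names are hypothetical stand-ins, not broker API; once ns.deleted is set, new creations stop, so the list eventually drains):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

class DrainAndDelete {
    // Re-list and re-delete until the namespace is empty, then drop its metadata.
    CompletableFuture<Void> deleteRemaining(String ns, int attemptsLeft) {
        return listAllTopics(ns).thenCompose(topics -> {
            if (topics.isEmpty()) {
                return cleanupMetadata(ns);              // nothing left behind
            }
            if (attemptsLeft <= 0) {
                return CompletableFuture.failedFuture(
                        new IllegalStateException("topics still present after retries"));
            }
            return deleteTopics(topics)
                    .thenCompose(v -> deleteRemaining(ns, attemptsLeft - 1));
        });
    }

    // Hypothetical stubs for the broker internals.
    CompletableFuture<List<String>> listAllTopics(String ns) { return CompletableFuture.completedFuture(List.of()); }
    CompletableFuture<Void> deleteTopics(List<String> topics) { return CompletableFuture.completedFuture(null); }
    CompletableFuture<Void> cleanupMetadata(String ns) { return CompletableFuture.completedFuture(null); }
}
```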

@mattisonchao (Member, Author) commented Feb 1, 2023

> I'd leave the ns.deleted handling logic as it is, and set ns.deleted for the namespace earlier, before actually listing topics (that seems to be the logically right thing to do).
> Do the rest of the steps with retries/backoff - since the ns is marked as deleted, topic creations should settle at some point. List topics again and delete again, then delete the namespace metadata.

Yes, that is what I'm changing right now. I will move the markDelete before listing topics and handle the duplicate-deletion exception.
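
Handling the duplicate-deletion exception essentially means treating "already deleted" as success, as in this sketch (TopicNotFoundException is a hypothetical placeholder for whatever the broker actually throws):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class IdempotentDelete {
    static class TopicNotFoundException extends RuntimeException { }

    // A second delete of the same topic succeeds instead of failing the
    // whole namespace deletion.
    CompletableFuture<Void> deleteTopicIdempotent(String topic) {
        return deleteTopic(topic).exceptionallyCompose(ex -> {
            Throwable cause = ex instanceof CompletionException ? ex.getCause() : ex;
            if (cause instanceof TopicNotFoundException) {
                return CompletableFuture.completedFuture(null); // already gone: fine
            }
            return CompletableFuture.failedFuture(ex);
        });
    }

    // Hypothetical stub for the broker's per-topic delete.
    CompletableFuture<Void> deleteTopic(String topic) { return CompletableFuture.completedFuture(null); }
}
```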

@mattisonchao mattisonchao marked this pull request as draft February 1, 2023 01:08
@mattisonchao mattisonchao marked this pull request as ready for review February 1, 2023 09:42
@mattisonchao (Member, Author) commented Feb 1, 2023

@eolivelli @dlg99 Please review again. I've changed the logic to retry, so that in-flight topics get deleted.

@mattisonchao mattisonchao changed the title [fix][broker] Fix delete namespace fail by a race condition [fix][broker] Fix delete namespace fail by a In-flight topic Feb 1, 2023
@mattisonchao mattisonchao reopened this Feb 1, 2023
```java
RetryUtil.retryAsynchronously(() -> internalDeleteNamespaceAsync0(force),
        new BackoffBuilder()
                .setInitialTime(200, TimeUnit.MILLISECONDS)
                .setMandatoryStop(15, TimeUnit.SECONDS)
```
@eolivelli (Contributor) commented on the diff:

I don't think it is good to set a time-based limit, also because a namespace may have thousands of topics, and deletion may really take more than 15 seconds.

We should set a value on the order of minutes (10 minutes?), and we should log something at every attempt; that way it will be clear from the logs that something is going on.
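
A sketch of what that could look like: a deadline on the order of minutes plus a log line per attempt (retryWithDeadline is a hypothetical wrapper, not an existing Pulsar utility):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class LoggedRetry {
    private static final Logger log = LoggerFactory.getLogger(LoggedRetry.class);

    // Retry until the deadline (e.g. now + 10 minutes), logging every attempt
    // so operators can see the deletion is still making progress. A real
    // implementation would also pause between attempts.
    static <T> CompletableFuture<T> retryWithDeadline(Supplier<CompletableFuture<T>> op,
                                                      int attempt, long deadlineMillis) {
        log.info("Namespace deletion attempt #{}", attempt);
        return op.get().exceptionallyCompose(ex -> {
            if (System.currentTimeMillis() >= deadlineMillis) {
                return CompletableFuture.failedFuture(ex); // give up past the deadline
            }
            return retryWithDeadline(op, attempt + 1, deadlineMillis);
        });
    }
}
```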

@eolivelli (Contributor) commented:

What happens if the user gives up waiting and then issues the same command again while the backoff is still running? Ideally everything should work well and the second execution should wait for the previous execution to complete. The user's expectation is that when the command completes, the namespace is deleted.

@mattisonchao (Member, Author) replied:

Hi @eolivelli,
I am trying to use a retry count instead of the time-based backoff.

> What happens if the user gives up waiting and then issues the same command again while the backoff is still running? Ideally everything should work well and the second execution should wait for the previous execution to complete. The user's expectation is that when the command completes, the namespace is deleted.

For this problem, we could guard the operation with a distributed lock to avoid running the same operation concurrently. I can bring the discussion to the mailing list.

@eolivelli (Contributor) replied:

Distributed locks are very expensive, and in any case you would have to deal with timeouts.
We should make the operations idempotent or chain them:
if a new operation comes in while the deletion is already in progress, we must wait for the result of the pending operation.
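
For illustration, a minimal sketch of that chaining, where a concurrent request for the same namespace simply joins the in-flight future (all names hypothetical):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class DeletionCoordinator {
    private final ConcurrentMap<String, CompletableFuture<Void>> inProgress = new ConcurrentHashMap<>();

    // A second "delete namespace" call made while one is running returns the
    // same future, so callers wait for the pending operation instead of
    // starting a competing one.
    CompletableFuture<Void> deleteNamespace(String ns) {
        CompletableFuture<Void> pending = inProgress.get(ns);
        if (pending != null) {
            return pending;                         // join the in-flight deletion
        }
        CompletableFuture<Void> promise = new CompletableFuture<>();
        CompletableFuture<Void> existing = inProgress.putIfAbsent(ns, promise);
        if (existing != null) {
            return existing;                        // lost the race: join the winner
        }
        doDelete(ns).whenComplete((r, ex) -> {
            inProgress.remove(ns);                  // allow future delete attempts
            if (ex != null) {
                promise.completeExceptionally(ex);
            } else {
                promise.complete(r);
            }
        });
        return promise;
    }

    private CompletableFuture<Void> doDelete(String ns) {
        return CompletableFuture.completedFuture(null); // real deletion logic here
    }
}
```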

@dlg99 (Contributor) left a comment

LGTM assuming you address Enrico's comments + CI passes (currently "You have 1 Checkstyle violation")

@eolivelli (Contributor) left a comment

The retry logic is broken.

Please check my comments.

@eolivelli (Contributor) left a comment

LGTM

@eolivelli (Contributor) left a comment

LGTM

@Technoboy- Technoboy- merged commit 3855585 into apache:master Feb 17, 2023
@mattisonchao mattisonchao deleted the fix/admin/delete_namespace branch February 17, 2023 15:16