@abumarjikar
Contributor

@abumarjikar abumarjikar commented Sep 23, 2025

https://issues.apache.org/jira/browse/SOLR-17712

Description

This PR deprecates CommonAdminParams.waitForFinalState and makes it default to true.

Solution

Please provide a short description of the approach taken to implement your solution.

Tests

Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@github-actions github-actions bot added documentation Improvements or additions to documentation client:solrj tests cat:cloud cat:api labels Sep 23, 2025
@abumarjikar
Contributor Author

@dsmiley Changes as per the review.

@abumarjikar abumarjikar requested a review from dsmiley September 25, 2025 12:20
Contributor

@dsmiley dsmiley left a comment

Thanks; excellent! I'll merge tomorrow night at the earliest.

@dsmiley
Contributor

dsmiley commented Sep 28, 2025

Looks like there are test failures. I suggest finding the one that looks the simplest and looking into it further. If you need help, LMK.

@abumarjikar
Contributor Author

@dsmiley In most of the failures, the watchers either fail to complete or never see the required number of active replicas. I might need help deciding which watcher to use in this case.

if (waitForFinalState) {
  SolrCloseableLatch latch =
      new SolrCloseableLatch(totalReplicas, ccc.getCloseableToLatchOn());
  ActiveReplicaWatcher watcher =
      new ActiveReplicaWatcher(
          collectionName,
          null,
          createReplicas.stream()
              .map(createReplica -> createReplica.coreName)
              .collect(Collectors.toList()),
          latch);
  try {
    zkStateReader.registerCollectionStateWatcher(collectionName, watcher);
    runnable.run();
    if (!latch.await(timeout, TimeUnit.SECONDS)) {
      throw new SolrException(
          SolrException.ErrorCode.SERVER_ERROR,
          "Timeout waiting " + timeout + " seconds for replica to become active.");
    }
  } finally {
    zkStateReader.removeCollectionStateWatcher(collectionName, watcher);
  }
}

This is from AddReplicaCmd.
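
For readers following along, the shape of that wait can be sketched with plain JDK classes. This is only an illustration, not Solr code: `ActiveWaitSketch`, `Watcher`, `onStateChanged`, and the `"active"` state string are hypothetical names, and the state updates are fed in synchronously rather than arriving from ZooKeeper. It shows why a replica that never becomes active turns into a timeout once the wait is on by default.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class ActiveWaitSketch {

  /** Hypothetical stand-in for a state watcher: counts down once per core seen "active". */
  static final class Watcher {
    private final Set<String> pending;
    private final CountDownLatch latch;

    Watcher(List<String> coreNames, CountDownLatch latch) {
      this.pending = new HashSet<>(coreNames);
      this.latch = latch;
    }

    /** Called with a snapshot of coreName -> state whenever the simulated state changes. */
    synchronized void onStateChanged(Map<String, String> states) {
      pending.removeIf(
          core -> {
            boolean active = "active".equals(states.get(core));
            if (active) {
              latch.countDown(); // each core is counted at most once, then forgotten
            }
            return active;
          });
    }
  }

  /** Returns true if every core was seen active before the timeout, false otherwise. */
  static boolean awaitActive(
      List<String> cores, List<Map<String, String>> updates, long timeoutMs) {
    CountDownLatch latch = new CountDownLatch(cores.size());
    Watcher watcher = new Watcher(cores, latch);
    for (Map<String, String> update : updates) {
      watcher.onStateChanged(update); // in a real cluster these arrive asynchronously
    }
    try {
      return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

With two cores and updates in which only one ever reports `"active"`, `awaitActive` returns false after the timeout, which corresponds to the "Timeout waiting ... for replica to become active" path above.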

@abumarjikar abumarjikar requested a review from dsmiley September 30, 2025 05:41
@dsmiley
Contributor

dsmiley commented Oct 5, 2025

Should be deprecated: org.apache.solr.client.solrj.request.CollectionAdminRequest.AsyncCollectionAdminRequest#setWaitForFinalState

@dsmiley
Contributor

dsmiley commented Oct 6, 2025

I debugged one test: TestPullReplica. The test intentionally creates a replica that will have trouble becoming fully active. That aspect was easy: switch from process to processAsync so the test doesn't hang at this step. I made that trivial change. But the very next thing the test does is set a replica property on that replica, which blocks on the lock held for the replica's creation. As an experiment, I made ADDREPLICAPROP and DELETEREPLICAPROP use LockLevel.NONE in CollectionParams. The test passed. I wonder if admin command internals should have an optional separate phase for waiting for the "final state". Also, BTW, it was helpful in debugging to disable the Overseer.

Contributor

@dsmiley dsmiley left a comment

I spent half the day investigating / fixing what I think is the last of the difficult tests; a collection restore test. What remains are a handful of tests that just check messages and need to be updated simply (I assume merely by looking at test method names). @abumarjikar could you handle those please?

The lock matter needs to be discussed more... I'll at-mention Houston for that.

Contributor

Quite a few SolrCloud commands directly call AddReplicaCmd...

Comment on lines +156 to +157
if (collectionState.getZNodeVersion() == lastZkVersion
&& Objects.equals(lastPrs, collectionState.getPerReplicaStates())) {
Contributor

PRS (per-replica state) wasn't working with waitForFinalState until this change.
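
A minimal sketch of the idea behind that condition, assuming (this is an assumption, not something stated in the PR) that a per-replica state update can arrive without the collection's znode version changing, so the version alone is not enough to decide "nothing changed". `StateChangeSketch` and its map-based state are hypothetical stand-ins for the real types.

```java
import java.util.Map;
import java.util.Objects;

final class StateChangeSketch {
  private int lastVersion = -1;
  private Map<String, String> lastPerReplicaStates = null;

  /**
   * Returns false only when both the version and the per-replica states match the
   * previously seen snapshot; otherwise remembers the new snapshot and returns true.
   */
  boolean changed(int version, Map<String, String> perReplicaStates) {
    if (version == lastVersion && Objects.equals(lastPerReplicaStates, perReplicaStates)) {
      return false; // same version and same replica states: safe to skip
    }
    lastVersion = version;
    lastPerReplicaStates = perReplicaStates == null ? null : Map.copyOf(perReplicaStates);
    return true;
  }
}
```

Comparing on the version alone would treat a replica going from `down` to `active` at the same version as a no-op, and a waiter relying on that check would never see the replica become active.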

RequestStatusState state = RequestStatusState.NOT_FOUND;
while (System.nanoTime() < finishTime) {
state = this.process(client).getRequestStatus();
RequestStatusResponse response = this.process(client);
Contributor

I just wanted this on its own line so that, in a debugger, we can easily see what the response payload contains.
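
The polling pattern being discussed can be sketched on its own, outside SolrJ. Everything here is hypothetical (`PollSketch`, `Response`, `State`, `waitFor`); it only illustrates holding the full response in a local variable before extracting the state, inside a deadline-bounded loop.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class PollSketch {
  enum State { NOT_FOUND, RUNNING, COMPLETED, FAILED }

  /** Hypothetical response type; a real status response carries more than this. */
  static final class Response {
    final State state;
    final String detail;

    Response(State state, String detail) {
      this.state = state;
      this.detail = detail;
    }
  }

  /** Polls until a terminal state is seen or the timeout elapses; returns the last state. */
  static State waitFor(Supplier<Response> poll, long timeoutMillis, long sleepMillis) {
    long finishTime = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    State state = State.NOT_FOUND;
    // Subtraction keeps the comparison correct even if nanoTime values wrap around.
    while (finishTime - System.nanoTime() > 0) {
      Response response = poll.get(); // own line: the whole payload is visible in a debugger
      state = response.state;
      if (state == State.COMPLETED || state == State.FAILED) {
        return state;
      }
      try {
        Thread.sleep(sleepMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return state;
      }
    }
    return state;
  }
}
```

A breakpoint on the `state = response.state;` line exposes `response.detail` as well, which is the convenience the comment above is after.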

Comment on lines 114 to 115
ADDREPLICAPROP(true, LockLevel.NONE), // nocommit discuss
DELETEREPLICAPROP(true, LockLevel.NONE), // nocommit discuss
Contributor

@HoustonPutman I wonder what your opinion is here. We've got one test, org.apache.solr.cloud.TestPullReplica#testSkipLeaderRecoveryProperty, that purposefully adds a replica that isn't happy and then sets a property (SKIP_LEADER_RECOVERY_PROP) that leads to it being happy. I removed the lock level, which made the test pass because it no longer blocks on addReplica's happiness (final state). That's not my reasoning for removing the lock level... I'm wondering if it's harmless to let replica properties be set on any replica at any time?

Maybe ideally the waitForFinalState should happen outside the LockLevel but that would be tricky as it amounts to a design change to command processing, giving commands a final stage outside the lock.
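
To make the "final stage outside the lock" idea concrete, here is a rough sketch using only JDK classes. It is not how Solr's command processing is structured; `TwoPhaseSketch`, `execute`, and `demo` are hypothetical names, and a single `ReentrantLock` stands in for whatever lock level a command would take.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class TwoPhaseSketch {

  /** Phase 1 mutates under the lock; phase 2 waits for the final state after releasing it. */
  static boolean execute(
      ReentrantLock lock, Runnable mutate, CountDownLatch finalState, long timeoutMs) {
    lock.lock();
    try {
      mutate.run();
    } finally {
      lock.unlock(); // released before waiting, so other commands are not blocked
    }
    try {
      return finalState.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** An "add" that only finishes once a later "set property" command has run. */
  static boolean demo() {
    ReentrantLock lock = new ReentrantLock();
    CountDownLatch replicaHappy = new CountDownLatch(1);
    boolean[] addResult = new boolean[1];
    Thread add = new Thread(() -> addResult[0] = execute(lock, () -> {}, replicaHappy, 2000));
    add.start();
    // Needs the same lock; its effect is what lets the first command reach its final state.
    boolean setProp = execute(lock, replicaHappy::countDown, new CountDownLatch(0), 0);
    try {
      add.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return setProp && addResult[0];
  }
}
```

If the wait in `execute` happened inside the locked section instead, the second command in `demo` could not acquire the lock until the first one timed out, which is the same shape as the TestPullReplica hang described above.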

Contributor

Discussed on the dev list. I'm inclined to have these be NONE, and to add an upgrade note to the ref guide about BALANCESHARDUNIQUE advising not to change the property while that command is running (duh).
For that matter... there's no reason not to do likewise for manipulating a collection property.

Please enter the commit message for your changes. Lines starting
…ate_default_true' into SOLR_17712_remove_waitForFinalState_default_true
…tForFinalState_default_true

# Conflicts:
#	solr/CHANGES.txt
@abumarjikar abumarjikar requested a review from dsmiley October 9, 2025 08:28
@abumarjikar
Contributor Author

Hey @dsmiley, I have fixed the remaining test cases.

for (Pair<CollectionAction, List<String>> operation : operations) {
final Lock lock = session.lock(operation.first(), operation.second());
if (lock != null) {
if (lock != null && !lock.equals(LockTree.FREELOCK)) {
Contributor

Curious what's going on here (and with orderOfExecutionChanges)?

@dsmiley
Contributor

dsmiley commented Oct 20, 2025

I think a sync from main is in order

@abumarjikar abumarjikar requested a review from dsmiley October 23, 2025 06:24
@github-actions

This PR has had no activity for 60 days and is now labeled as stale. Any new activity will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the dev@solr.apache.org mailing list. To exempt this PR from being marked as stale, make it a draft PR or add the label "exempt-stale". If left unattended, this PR will be closed after another 60 days of inactivity. Thank you for your contribution!

@github-actions github-actions bot added the stale PR not updated in 60 days label Jan 27, 2026

Labels

cat:api cat:cloud documentation Improvements or additions to documentation stale PR not updated in 60 days tests

2 participants