Closed
Labels
enhancement (New feature or request)
Description
Overview
Add Awaitility-based waits for schema propagation in ReplicationChangeSchemaIT to prevent race conditions when testing distributed schema changes.
Part of Epic #2968
Current Issues
- Schema changes tested immediately after creation
- No wait for replication to replicas
- Race conditions in distributed schema verification
- Replication queue not verified before assertions
Improvements to Implement
1. Type Creation Wait
database.getSchema().createDocumentType("Order");
// Wait for schema to propagate
Awaitility.await("type creation propagation")
    .atMost(5, TimeUnit.SECONDS)
    .pollInterval(100, TimeUnit.MILLISECONDS)
    .until(() -> database.getSchema().existsType("Order"));
2. Property Creation Wait
database.getSchema().getType("Order").createProperty("id", Integer.class);
Awaitility.await("property creation propagation")
    .atMost(5, TimeUnit.SECONDS)
    .pollInterval(100, TimeUnit.MILLISECONDS)
    .until(() -> database.getSchema().getType("Order").existsProperty("id"));
3. Bucket Creation Wait
database.getSchema().createBucket("OrderBucket");
Awaitility.await("bucket creation propagation")
    .atMost(5, TimeUnit.SECONDS)
    .pollInterval(100, TimeUnit.MILLISECONDS)
    .until(() -> database.getSchema().existsBucket("OrderBucket"));
4. Replication Queue Verification (Leader)
// Wait for replication to complete on leader
Awaitility.await("leader replication queue drain")
    .atMost(10, TimeUnit.SECONDS)
    .pollInterval(200, TimeUnit.MILLISECONDS)
    .until(() -> getServer(0).getHA().getReplicationLog().getQueueSize() == 0);
5. Replication Queue Verification (All Replicas)
// Wait for queue to drain on all replicas
Awaitility.await("all replicas queue drain")
    .atMost(10, TimeUnit.SECONDS)
    .pollInterval(200, TimeUnit.MILLISECONDS)
    .until(() -> {
      for (int i = 1; i < getTotalServers(); i++) {
        if (getServer(i).getHA().getReplicationLog().getQueueSize() > 0) {
          return false;
        }
      }
      return true;
    });
6. File Persistence Verification
// Allow time for file system flush
Thread.sleep(100);
File schemaFile = new File(getDatabasePath(1), "schema.json");
assertThat(schemaFile).exists();
Layered Verification Strategy
For each schema operation:
- API Level: Schema change API call completes
- Memory Level: Schema exists in memory (Awaitility wait)
- Queue Level: Replication queue drains (Awaitility wait)
- Persistence Level: File system flush + verification
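The memory-level and queue-level steps above could be chained in a single helper. The following is a minimal sketch under stated assumptions, not the test's actual code: `LayeredSchemaCheck`, `awaitCondition`, and `verifyPropagated` are hypothetical names, and `awaitCondition` is a dependency-free stand-in for the `Awaitility.await(...).atMost(...).pollInterval(...).until(...)` chain so the example runs on a plain JDK. The suppliers simulate schema and queue state; in the real test they would presumably wrap calls such as `existsType("Order")` and `getQueueSize()`.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

final class LayeredSchemaCheck {

  // Stand-in for Awaitility's await/atMost/pollInterval/until chain (assumption:
  // used here only to keep the sketch free of external dependencies).
  static void awaitCondition(String alias, Duration atMost, Duration pollInterval,
      BooleanSupplier condition) {
    final long deadline = System.nanoTime() + atMost.toNanos();
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0)
        throw new IllegalStateException("Timed out after " + atMost + " waiting for: " + alias);
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting for: " + alias, e);
      }
    }
  }

  // Hypothetical helper: memory-level wait first, then queue-level wait.
  static void verifyPropagated(String what, BooleanSupplier existsInMemory,
      List<IntSupplier> queueSizes) {
    awaitCondition(what + " propagation", Duration.ofSeconds(5), Duration.ofMillis(100),
        existsInMemory);
    awaitCondition(what + " queue drain", Duration.ofSeconds(10), Duration.ofMillis(200),
        () -> queueSizes.stream().allMatch(q -> q.getAsInt() == 0));
  }

  public static void main(String[] args) {
    // Simulated replica queue that shrinks by one entry each time it is polled.
    int[] pending = {3};
    IntSupplier replicaQueue = () -> pending[0] > 0 ? pending[0]-- : 0;
    verifyPropagated("type creation", () -> true, List.of(replicaQueue));
    System.out.println("schema change verified");
  }
}
```

Because the alias is part of the timeout message, a failure names the layer that stalled (propagation or queue drain) rather than surfacing as a later assertion error.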
Validation
# Run test 20 times to verify schema propagation reliability
for i in {1..20}; do
  echo "Run $i/20"
  mvn test -pl server -Dtest=ReplicationChangeSchemaIT || echo "FAILED: Run $i"
done
Success Criteria
- All schema changes have propagation waits
- Queue drain verified before assertions
- File persistence verified where applicable
- Test passes at least 19/20 times (95% success rate)
- No timing-related failures
- All replicas have consistent schema
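For the file persistence criterion, one option that avoids a fixed sleep is to poll for the schema file with a deadline. This is a sketch of that alternative, not what the issue prescribes: `SchemaFileWait` and `waitForFile` are hypothetical names, and the temporary directory and delayed writer thread only simulate a replica flushing `schema.json` late.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

final class SchemaFileWait {

  // Polls until the file exists or the deadline passes; returns whether it was found.
  static boolean waitForFile(Path file, Duration atMost, Duration pollInterval) {
    final long deadline = System.nanoTime() + atMost.toNanos();
    while (true) {
      if (Files.isRegularFile(file))
        return true;
      if (System.nanoTime() - deadline >= 0)
        return false;
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("schema-demo");
    Path schemaFile = dir.resolve("schema.json");

    // Simulate a flush that lands 150 ms after the schema change.
    Thread writer = new Thread(() -> {
      try {
        Thread.sleep(150);
        Files.write(schemaFile, "{}".getBytes(StandardCharsets.UTF_8));
      } catch (IOException | InterruptedException e) {
        throw new RuntimeException(e);
      }
    });
    writer.start();

    boolean found = waitForFile(schemaFile, Duration.ofSeconds(2), Duration.ofMillis(25));
    System.out.println("schema file present: " + found);
  }
}
```

With Awaitility on the classpath, the same idea would be an `until` condition that checks the file's existence, which keeps the persistence check consistent with the other waits.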
Expected Impact
Before:
- Race conditions in schema assertions
- Flaky failures due to replication lag
- No verification of actual propagation
After:
- Reliable schema propagation
- Verified consistency across replicas
- Clear failure diagnostics
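To make "verified consistency across replicas" and "clear failure diagnostics" concrete, a check can report which servers still lack a type instead of returning a bare boolean. The sketch below is illustrative only: `ReplicaSchemaDiff` and `serversMissingType` are hypothetical names, and the per-server type sets are hard-coded; how the real test reads each replica's schema is outside this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ReplicaSchemaDiff {

  // Returns the indexes of servers whose known types do not include typeName.
  static List<Integer> serversMissingType(List<Set<String>> typesPerServer, String typeName) {
    List<Integer> missing = new ArrayList<>();
    for (int i = 0; i < typesPerServer.size(); i++) {
      if (!typesPerServer.get(i).contains(typeName))
        missing.add(i);
    }
    return missing;
  }

  public static void main(String[] args) {
    // Hard-coded stand-in for the schema seen by servers 0, 1 and 2.
    List<Set<String>> typesPerServer = List.of(
        Set.of("Order", "Customer"),
        Set.of("Order"),
        Set.of("Customer"));

    // prints "servers missing Order: [2]"
    System.out.println("servers missing Order: " + serversMissingType(typesPerServer, "Order"));
  }
}
```

Used as a wait condition (poll until the returned list is empty), the last non-empty result can go into the failure message, so a timeout names the lagging replica.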
Timeout Rationale
- Schema operations: 5s (typically <1s in a healthy cluster)
- Queue drain: 10s (includes network + disk I/O)
- File flush: 100ms fixed sleep (allows for OS buffer flush)
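The values above could live in one place so every wait in the test uses the same numbers. A small sketch, assuming a hypothetical `SchemaTestTimeouts` holder class; the poll intervals are taken from the snippets earlier in this issue.

```java
import java.time.Duration;

final class SchemaTestTimeouts {
  // Schema operations: generous upper bound, polled every 100 ms.
  static final Duration SCHEMA_PROPAGATION = Duration.ofSeconds(5);
  static final Duration SCHEMA_POLL = Duration.ofMillis(100);

  // Queue drain: covers network and disk I/O, polled every 200 ms.
  static final Duration QUEUE_DRAIN = Duration.ofSeconds(10);
  static final Duration QUEUE_POLL = Duration.ofMillis(200);

  // File flush: fixed pause before checking the schema file.
  static final Duration FILE_FLUSH = Duration.ofMillis(100);

  private SchemaTestTimeouts() {
  }

  public static void main(String[] args) {
    System.out.println("schema propagation: " + SCHEMA_PROPAGATION.toMillis() + " ms");
    System.out.println("queue drain:        " + QUEUE_DRAIN.toMillis() + " ms");
    System.out.println("file flush:         " + FILE_FLUSH.toMillis() + " ms");
  }
}
```

Recent Awaitility versions (4.x) accept `java.time.Duration` in `atMost` and `pollInterval`, so constants like these can be passed directly; with the `TimeUnit` overloads shown earlier, `toMillis()` gives the equivalent value.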
Time Estimate
45 minutes
Risk Level
MEDIUM - Changes test behavior, adds timeouts
Documentation
See PORTING_PLAN_IT_TEST_IMPROVEMENTS.md - Phase 4, Section 4.3 for detailed instructions.
See HA_TEST_RELIABILITY_ANALYSIS.md - Section on ReplicationChangeSchemaIT for analysis.
Related Issues
Part of Epic: #2968
Can be done in parallel with: HARandomCrashIT, HASplitBrainIT improvements