Add schema propagation waits to ReplicationChangeSchemaIT #2973

@robfrank

Overview

Add Awaitility-based waits for schema propagation in ReplicationChangeSchemaIT to prevent race conditions when testing distributed schema changes.

Part of Epic #2968

Current Issues

  • Schema changes are asserted immediately after creation, before they can propagate
  • No wait for replication to the replicas
  • Race conditions in distributed schema verification
  • Replication queue is not verified as drained before assertions

Improvements to Implement

1. Type Creation Wait

database.getSchema().createDocumentType("Order");

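// Wait for the schema change to become visible.
// Note: this polls the same database handle that created the type, so it can pass
// immediately; to prove propagation, the condition should query each replica.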
Awaitility.await("type creation propagation")
    .atMost(5, TimeUnit.SECONDS)
    .pollInterval(100, TimeUnit.MILLISECONDS)
    .until(() -> database.getSchema().existsType("Order"));
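
To turn the wait above into a replica-side check, the condition has to be evaluated once per replica. The following is a minimal sketch of such a condition builder using only the JDK. `ReplicaChecks` and `allReplicas` are hypothetical names that do not exist in ArcadeDB or Awaitility, and the sketch assumes index 0 is the leader, as in the loops elsewhere in this issue. How a replica's schema is looked up from its index is left to the caller.

```java
import java.util.function.BooleanSupplier;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

/** Hypothetical helper, not part of ArcadeDB or Awaitility. */
final class ReplicaChecks {
  private ReplicaChecks() {
  }

  /**
   * Builds a condition that is true only when {@code check} holds for every
   * replica index in [1, totalServers). Index 0 is assumed to be the leader
   * and is skipped. With no replicas the condition is trivially true.
   */
  static BooleanSupplier allReplicas(final int totalServers, final IntPredicate check) {
    return () -> IntStream.range(1, totalServers).allMatch(check);
  }
}
```

Inside the test, the result could be handed to Awaitility through a lambda, for example `.until(() -> condition.getAsBoolean())`, where the predicate asks replica `i` whether the type "Order" exists.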

2. Property Creation Wait

database.getSchema().getType("Order").createProperty("id", Integer.class);

Awaitility.await("property creation propagation")
    .atMost(5, TimeUnit.SECONDS)
    .pollInterval(100, TimeUnit.MILLISECONDS)
    .until(() -> database.getSchema().getType("Order").existsProperty("id"));
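
One caveat with chained lookups such as `getType("Order").existsProperty("id")`: if the first call throws on a node where the type has not arrived yet (an assumption about the schema API, not verified here), the exception escapes the condition instead of counting as "not ready". Awaitility offers `ignoreExceptions()` for this. The same idea can be sketched in plain Java as a wrapper; `SafeCondition` and `notReadyOnException` are hypothetical names.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of ArcadeDB or Awaitility. */
final class SafeCondition {
  private SafeCondition() {
  }

  /**
   * Wraps a condition so that a RuntimeException (for example, looking up a
   * type that has not been replicated yet) counts as "not ready" instead of
   * aborting the wait.
   */
  static BooleanSupplier notReadyOnException(final BooleanSupplier condition) {
    return () -> {
      try {
        return condition.getAsBoolean();
      } catch (final RuntimeException e) {
        return false; // treat as "not propagated yet" so the caller polls again
      }
    };
  }
}
```

The trade-off is that a genuine bug in the condition is also swallowed until the timeout expires, so the wrapper is best kept around the narrowest lookup that is expected to fail transiently.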

3. Bucket Creation Wait

database.getSchema().createBucket("OrderBucket");

Awaitility.await("bucket creation propagation")
    .atMost(5, TimeUnit.SECONDS)
    .pollInterval(100, TimeUnit.MILLISECONDS)
    .until(() -> database.getSchema().existsBucket("OrderBucket"));

4. Replication Queue Verification (Leader)

// Wait for replication to complete on leader
Awaitility.await("leader replication queue drain")
    .atMost(10, TimeUnit.SECONDS)
    .pollInterval(200, TimeUnit.MILLISECONDS)
    .until(() -> getServer(0).getHA().getReplicationLog().getQueueSize() == 0);

5. Replication Queue Verification (All Replicas)

// Wait for queue to drain on all replicas
Awaitility.await("all replicas queue drain")
    .atMost(10, TimeUnit.SECONDS)
    .pollInterval(200, TimeUnit.MILLISECONDS)
    .until(() -> {
      for (int i = 1; i < getTotalServers(); i++) {
        if (getServer(i).getHA().getReplicationLog().getQueueSize() > 0) {
          return false;
        }
      }
      return true;
    });
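
The loop above returns only true or false, so a timeout does not say which replica was behind. As a sketch toward the "clear failure diagnostics" goal, the same check can collect the lagging replica indexes instead. `QueueDiagnostics` and `laggingReplicas` are hypothetical names, and the queue size is supplied by the caller as a function of the replica index, so no replication API is assumed here.

```java
import java.util.List;
import java.util.function.IntToLongFunction;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical helper, not part of ArcadeDB or Awaitility. */
final class QueueDiagnostics {
  private QueueDiagnostics() {
  }

  /**
   * Returns the indexes of replicas (1 .. totalServers - 1) whose queue size,
   * as reported by {@code queueSize}, is still greater than zero.
   */
  static List<Integer> laggingReplicas(final int totalServers, final IntToLongFunction queueSize) {
    return IntStream.range(1, totalServers)
        .filter(i -> queueSize.applyAsLong(i) > 0)
        .boxed()
        .collect(Collectors.toList());
  }
}
```

The wait condition then becomes `laggingReplicas(...).isEmpty()`, and the returned list can be included in the failure message when the timeout is hit.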

6. File Persistence Verification

// Allow time for file system flush
Thread.sleep(100);

File schemaFile = new File(getDatabasePath(1), "schema.json");
assertThat(schemaFile).exists();
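
A fixed `Thread.sleep(100)` either waits longer than needed or not long enough on a slow disk, which is the same kind of timing assumption the other sections remove. In the test itself, an Awaitility wait on `schemaFile::exists` would likely be the natural fit. The sketch below shows the underlying polling idea with the JDK only; `FileWait` and `waitForFile` are hypothetical names.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

/** Hypothetical helper, not part of ArcadeDB or Awaitility. */
final class FileWait {
  private FileWait() {
  }

  /**
   * Polls until {@code file} exists or {@code timeout} elapses.
   * Returns true if the file appeared, false on timeout or interruption.
   */
  static boolean waitForFile(final Path file, final Duration timeout, final Duration pollInterval) {
    final long deadline = System.nanoTime() + timeout.toNanos();
    while (!Files.exists(file)) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out without seeing the file
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (final InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
    return true;
  }
}
```

Note that file existence only shows the file was created, not that its content is complete; a stricter check would also parse the file and compare it with the expected schema.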

Layered Verification Strategy

For each schema operation:

  1. API Level: Schema change API call completes
  2. Memory Level: Schema exists in memory (Awaitility wait)
  3. Queue Level: Replication queue drains (Awaitility wait)
  4. Persistence Level: File system flush + verification
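
The ordering of the four levels can be sketched as a small runner that evaluates named checks in sequence and reports the first level that fails, so a failing run points at memory, queue, or persistence directly. `LayeredVerification` is a hypothetical name; the checks themselves are supplied by the caller.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of ArcadeDB or Awaitility. */
final class LayeredVerification {
  // LinkedHashMap keeps the levels in registration order.
  private final Map<String, BooleanSupplier> levels = new LinkedHashMap<>();

  /** Registers a named level; levels run in the order they are added. */
  LayeredVerification level(final String name, final BooleanSupplier check) {
    levels.put(name, check);
    return this;
  }

  /**
   * Runs the levels in order and stops at the first failure.
   * Returns the failing level's name, or empty if every level passed.
   */
  Optional<String> firstFailure() {
    for (final Map.Entry<String, BooleanSupplier> entry : levels.entrySet()) {
      if (!entry.getValue().getAsBoolean()) {
        return Optional.of(entry.getKey());
      }
    }
    return Optional.empty();
  }
}
```

Stopping at the first failure matters here because later levels depend on earlier ones: checking the schema file is not meaningful while the replication queue is still draining.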

Validation

# Run the test 20 times and count failures to check the 19/20 success criterion
failures=0
for i in {1..20}; do
  echo "Run $i/20"
  mvn test -pl server -Dtest=ReplicationChangeSchemaIT || { echo "FAILED: Run $i"; failures=$((failures + 1)); }
done
echo "Failures: $failures/20"

Success Criteria

  • All schema changes have propagation waits
  • Queue drain verified before assertions
  • File persistence verified where applicable
  • Test passes at least 19/20 times (95% success rate)
  • Any remaining failure is not timing-related
  • All replicas have consistent schema

Expected Impact

Before:

  • Race conditions in schema assertions
  • Flaky failures due to replication lag
  • No verification of actual propagation

After:

  • Reliable verification of schema propagation
  • Verified consistency across replicas
  • Clear failure diagnostics

Timeout Rationale

  • Schema operations: 5s (typically <1s in healthy cluster)
  • Queue drain: 10s (includes network + disk I/O)
  • File flush: 100ms (OS buffer flush time)
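
To keep these values in one place instead of repeating literals in every wait, they could be collected as constants. This is only a sketch: `SchemaWaitTimeouts` and its field names are hypothetical, and the poll intervals are taken from the snippets earlier in this issue. Recent Awaitility versions accept `java.time.Duration` in `atMost` and `pollInterval`; that should be confirmed against the version used by the project.

```java
import java.time.Duration;

/** Hypothetical constants holder mirroring the timeout rationale. */
final class SchemaWaitTimeouts {
  private SchemaWaitTimeouts() {
  }

  // Schema operations: typically under 1s in a healthy cluster.
  static final Duration SCHEMA_PROPAGATION = Duration.ofSeconds(5);
  static final Duration SCHEMA_POLL = Duration.ofMillis(100);

  // Queue drain: includes network and disk I/O.
  static final Duration QUEUE_DRAIN = Duration.ofSeconds(10);
  static final Duration QUEUE_POLL = Duration.ofMillis(200);

  // File flush: OS buffer flush time.
  static final Duration FILE_FLUSH = Duration.ofMillis(100);
}
```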

Time Estimate

45 minutes

Risk Level

MEDIUM - Changes test behavior, adds timeouts

Documentation

See PORTING_PLAN_IT_TEST_IMPROVEMENTS.md - Phase 4, Section 4.3 for detailed instructions.
See HA_TEST_RELIABILITY_ANALYSIS.md - Section on ReplicationChangeSchemaIT for analysis.

Related Issues

Part of Epic: #2968
Can be done in parallel with: HARandomCrashIT, HASplitBrainIT improvements

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)