Description
Overview
Add comprehensive documentation, extract timeout constants, and prepare commits for the integration test (IT) improvements.
Part of Epic #2968
Prerequisites
- ✅ All implementation issues completed
- ✅ Testing and validation passed
Tasks
1. Add JavaDoc to Complex Logic (30 min)
Awaitility Conditions:
Document timeout rationale for complex conditions.
/**
 * Wait for all replicas to have consistent schema after type creation.
 * Timeout: 5 seconds (schema operations are typically <1s in healthy cluster)
 */
Awaitility.await("schema propagation")
    .atMost(5, TimeUnit.SECONDS)
    .pollInterval(100, TimeUnit.MILLISECONDS)
    .until(() -> database.getSchema().existsType("Order"));
Synchronization Blocks:
Explain why synchronization is needed.
/**
 * Double-checked locking to ensure the split triggers exactly once.
 * First check (outside sync) avoids lock contention in the common case.
 * Second check (inside sync) prevents a race between multiple threads.
 * The split flag should be volatile so the unsynchronized first check
 * sees the update promptly.
 */
if (messagesSent >= 20 && !split) {
  synchronized (HASplitBrainIT.this) {
    if (split) return;
    split = true;
    // ...
  }
}
Files to Document:
- HARandomCrashIT.java - Timer thread, retry logic, exponential backoff
- HASplitBrainIT.java - Synchronization rationale, state machine
- ReplicationChangeSchemaIT.java - Layered verification, timeout choices
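The synchronization rationale above can be checked in isolation. The following is a minimal sketch, not the code in HASplitBrainIT: the class name `SplitTriggerSketch`, the method `maybeSplit` and the counter are hypothetical, and the counter only stands in for the real split action. It assumes the flag is declared `volatile`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the split trigger; all names are illustrative. */
class SplitTriggerSketch {
  // volatile so the lock-free first check observes a write made under the lock
  private volatile boolean split = false;
  private final AtomicInteger splitCount = new AtomicInteger();

  /** Returns true only for the single call that actually performs the split. */
  boolean maybeSplit(final int messagesSent) {
    if (messagesSent >= 20 && !split) {   // first check, outside the lock
      synchronized (this) {
        if (split)                        // second check, under the lock
          return false;
        split = true;
        splitCount.incrementAndGet();     // placeholder for the real split action
        return true;
      }
    }
    return false;
  }

  int splitCount() {
    return splitCount.get();
  }
}
```

When several threads call `maybeSplit` with a count of 20 or more, only one call should return true, which is the property the JavaDoc needs to state.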
2. Extract Timeout Constants (20 min)
Create Base Class or Constants Interface:
/**
 * Common timeout constants for HA integration tests.
 */
public interface HATestTimeouts {
  /** Schema operations (type, property, bucket creation) */
  Duration SCHEMA_PROPAGATION_TIMEOUT = Duration.ofSeconds(5);
  /** Cluster-wide consensus operations */
  Duration CLUSTER_STABILIZATION_TIMEOUT = Duration.ofSeconds(60);
  /** Server lifecycle operations */
  Duration SERVER_SHUTDOWN_TIMEOUT = Duration.ofSeconds(30);
  Duration SERVER_STARTUP_TIMEOUT = Duration.ofSeconds(30);
  /** Replication queue draining */
  Duration REPLICATION_QUEUE_DRAIN_TIMEOUT = Duration.ofSeconds(10);
  /** Replica reconnection after network partition */
  Duration REPLICA_RECONNECTION_TIMEOUT = Duration.ofSeconds(30);
}
Update Tests:
- Replace hardcoded timeouts with constants
- Ensure consistency across all HA tests
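To illustrate what replacing a hardcoded timeout might look like, here is a dependency-free sketch. `BoundedWait` is a hypothetical plain-Java stand-in for an Awaitility wait, used only so the example compiles without the library; the two constants are copied from the interface proposed above.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Subset of the proposed constants, repeated so the sketch compiles on its own. */
interface HATestTimeouts {
  Duration SCHEMA_PROPAGATION_TIMEOUT = Duration.ofSeconds(5);
  Duration REPLICATION_QUEUE_DRAIN_TIMEOUT = Duration.ofSeconds(10);
}

/** Hypothetical stand-in for an Awaitility wait; not part of the code base. */
class BoundedWait {
  static final Duration POLL_INTERVAL = Duration.ofMillis(100);

  /** Polls until the condition holds or the timeout elapses; returns whether it held. */
  static boolean until(final Duration timeout, final BooleanSupplier condition) {
    final long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean())
        return true;
      if (System.nanoTime() - deadline >= 0)
        return false;                          // bounded: give up at the deadline
      try {
        Thread.sleep(POLL_INTERVAL.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();    // preserve the interrupt for the caller
        return false;
      }
    }
  }
}
```

A test would then call `BoundedWait.until(HATestTimeouts.SCHEMA_PROPAGATION_TIMEOUT, () -> ...)` instead of repeating the literal 5 seconds. Assuming Awaitility 4.x, where `atMost` and `pollInterval` accept a `java.time.Duration`, the equivalent call should be `.atMost(HATestTimeouts.SCHEMA_PROPAGATION_TIMEOUT)`.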
3. Create Logical Commits (20 min)
Follow conventional commits format, matching source branch structure:
Commit 1: Production code modernization
feat: modernize date handling with Java 21 pattern matching

Refactor DateUtils and BinaryTypes to use pattern matching switch
expressions, reducing cyclomatic complexity by 43% while improving
readability and null safety.

Changes:
- Replace if-else chains with exhaustive switch expressions
- Add explicit null handling with 'case null'
- Migrate from wildcard to explicit imports
- Apply transformation pattern to 6 methods in DateUtils

Ported from: claude/fix-failing-it-tests-0176P1zKUgLUsKhvAvGkQkbN
Part of: #2968
Commit 2: Test bug fix
fix: correct ResultSet iteration bug in RemoteDateIT

Fix critical bug where resultSet.next() was called three times
on the same ResultSet, causing test failures or incorrect assertions.
Refactor test to extend BaseGraphServerTest for better maintainability.

Changes:
- Store resultSet.next() result for reuse
- Extend BaseGraphServerTest (34% code reduction)
- Add try-with-resources for proper resource management
- Remove debug System.out.println statements

Ported from: claude/fix-failing-it-tests-0176P1zKUgLUsKhvAvGkQkbN
Fixes critical bug in source branch
Part of: #2968
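The iteration bug described in this commit message can be reproduced with any cursor-style API. The sketch below uses a plain `java.util.Iterator` over maps as a stand-in for the real ResultSet; the class, method and field names are hypothetical and the rows are made-up sample data.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; an Iterator of maps stands in for the ResultSet. */
class ResultIterationSketch {
  /** Buggy shape: every next() advances the cursor, so the fields come from different rows. */
  static String buggy(final Iterator<Map<String, Object>> rs) {
    return rs.next().get("id") + ":" + rs.next().get("name");
  }

  /** Fixed shape: fetch the row once, then read every field from that same row. */
  static String fixed(final Iterator<Map<String, Object>> rs) {
    final Map<String, Object> row = rs.next();
    return row.get("id") + ":" + row.get("name");
  }

  /** Made-up sample data for the demonstration. */
  static List<Map<String, Object>> sampleRows() {
    return List.of(
        Map.<String, Object>of("id", 1, "name", "first"),
        Map.<String, Object>of("id", 2, "name", "second"));
  }
}
```

With two rows the buggy shape silently mixes them; with a single row the second `next()` throws `NoSuchElementException`, which matches the "test failures or incorrect assertions" wording above.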
Commit 3-5: HA test improvements
fix: improve HA test reliability with Awaitility and synchronization

Replace busy-wait loops with bounded Awaitility timeouts, add
thread-safe state management, and implement exponential backoff
to eliminate test flakiness and infinite loops.

Changes:
- HARandomCrashIT: Awaitility timeouts, restart verification, exponential backoff
- HASplitBrainIT: Thread-safe state, double-checked locking, cluster stabilization
- ReplicationChangeSchemaIT: Schema propagation waits, queue verification

Expected impact: Reduce test flakiness from 15-20% to <1%

Ported from: claude/fix-failing-it-tests-0176P1zKUgLUsKhvAvGkQkbN
Part of: #2968
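When documenting the backoff, it may help to state its actual growth. The formula quoted in the class-level JavaDoc example in task 4, min(1000 * (retry + 1), 5000), increases linearly up to the cap rather than doubling. The sketch below computes both shapes for comparison; `BackoffSketch` and its method names are hypothetical, delays are assumed to be in milliseconds, and the doubling variant is only an illustration, not something the tests are known to use.

```java
/** Hypothetical helper comparing two capped backoff shapes; delays assumed to be in ms. */
class BackoffSketch {
  static final long BASE_DELAY_MS = 1_000;
  static final long MAX_DELAY_MS = 5_000;

  /** The quoted formula: min(1000 * (retry + 1), 5000). Grows linearly until the cap. */
  static long cappedLinearDelayMs(final int retry) {
    return Math.min(BASE_DELAY_MS * (retry + 1L), MAX_DELAY_MS);
  }

  /** A doubling variant for comparison: min(1000 * 2^retry, 5000). */
  static long cappedExponentialDelayMs(final int retry) {
    final int shift = Math.min(retry, 20);   // clamp the shift so the value cannot overflow
    return Math.min(BASE_DELAY_MS << shift, MAX_DELAY_MS);
  }
}
```

For retries 0 through 4 the quoted formula yields 1000, 2000, 3000, 4000 and 5000 ms, while the doubling variant reaches the 5000 ms cap at retry 3. Either is bounded; the JavaDoc should name whichever one the code implements.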
4. Update Test Class Documentation
Add class-level JavaDoc explaining test purpose and patterns:
Example:
/**
 * Integration test for High Availability random crash scenarios (chaos engineering).
 *
 * <p>This test simulates random server crashes during continuous operation to verify
 * cluster resilience and automatic recovery. Uses bounded randomness and exponential
 * backoff to ensure reliable testing without infinite loops.
 *
 * <p>Key patterns:
 * <ul>
 * <li>Awaitility for bounded waits (30s timeouts)</li>
 * <li>Exponential backoff: min(1000 * (retry + 1), 5000)</li>
 * <li>Daemon timer thread prevents JVM hangs</li>
 * <li>Explicit restart verification with retries</li>
 * </ul>
 *
 * @see HATestTimeouts for timeout constants
 */
@Test
public class HARandomCrashIT extends BaseGraphServerTest {
  // ...
}
5. Code Cleanup Checklist
Final review before PR:
- No debug code (System.out.println, printStackTrace)
- All resources use try-with-resources where applicable
- Consistent formatting (run formatter)
- No commented-out code
- All TODOs addressed or documented
- Import statements organized
- No unused imports
- Consistent naming conventions
6. Update Project Documentation
If needed, update project-level documentation:
- Update CLAUDE.md if new patterns established
- Note best practices for future HA tests
- Document timeout selection guidelines
Validation
# Verify commits are clean and atomic
git log --oneline main..HEAD
# Verify no accidental files included
git diff main --stat
# Final build check
mvn clean install
Success Criteria
- All complex logic documented with JavaDoc
- Timeout constants extracted and used consistently
- Logical, atomic commits created
- Test class documentation updated
- Code cleanup checklist completed
- Build succeeds with clean git status
Time Estimate
70 minutes total:
- JavaDoc: 30 min
- Constants extraction: 20 min
- Commits: 20 min
- Updates: Variable (as needed)
Risk Level
LOW - Documentation and cleanup, no functional changes
Documentation
See PORTING_PLAN_IT_TEST_IMPROVEMENTS.md - Phase 6 for detailed instructions.
Next Steps
After completion:
- Create pull request (next issue)
- Request code review
- Monitor CI/CD results
Related Issues
Part of Epic: #2968
Depends on: Testing and validation issue
Blocks: None (final implementation step before PR)