nielsenko (Collaborator) commented Dec 1, 2025

Description

  • Fix multi-isolate server hanging when close() is called multiple times
  • Clear _children list before closing to make subsequent calls no-ops
  • Add tests for sequential and concurrent double-close scenarios
  • Refactor test helpers to reduce code duplication

Related Issues

Pre-Launch Checklist

  • This update focuses on a single feature or bug fix.
  • I have read and followed the Dart Style Guide and formatted the code using dart format.
  • I have referenced at least one issue this PR fixes or is related to.
  • I have updated/added relevant documentation (doc comments with ///), ensuring consistency with existing project documentation.
  • I have added new tests to verify the changes.
  • All existing and new tests pass successfully.
  • I have documented any breaking changes below.

Breaking Changes

  • No breaking changes.

Additional Notes

Investigation confirmed that HttpServer.close() and ServerSocket.close() are idempotent in dart:io, making the fix safe. The multi-isolate fix uses a synchronous copy-and-clear pattern that acts as a simple mutex for concurrent calls.
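
The copy-and-clear pattern described above can be sketched as follows. This is a minimal illustration, not the code in lib/src/relic_server.dart: the `Child` and `Parent` classes and the `closeCalls` counter are hypothetical stand-ins, and the sketch assumes Dart 3.0+ for the `.wait` extension.

```dart
import 'dart:async';

/// Hypothetical stand-in for one per-isolate server.
class Child {
  int closeCalls = 0;

  Future<void> close() async {
    closeCalls++;
    // Simulate asynchronous shutdown work.
    await Future<void>.delayed(const Duration(milliseconds: 10));
  }
}

/// Hypothetical parent that owns several children.
class Parent {
  Parent(Iterable<Child> children) : _children = List.of(children);

  final List<Child> _children;

  Future<void> close() async {
    // The snapshot and the clear both run synchronously, before the first
    // await, so no other caller on this isolate can interleave between them.
    final children = List.of(_children);
    _children.clear();
    // Later callers reach this line with an empty snapshot.
    await children.map((c) => c.close()).wait;
  }
}

Future<void> main() async {
  final kids = [Child(), Child()];
  final parent = Parent(kids);

  // Concurrent double close, followed by a sequential third call.
  await [parent.close(), parent.close()].wait;
  await parent.close();

  // Each child was closed exactly once.
  print(kids.map((c) => c.closeCalls).toList()); // [1, 1]
}
```

In this sketch a second caller returns as soon as it sees the empty list; it does not wait for the first caller's shutdown to finish.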

Summary by CodeRabbit

  • Bug Fixes

    • Improved graceful shutdown reliability in RelicServer by safely managing the closure process during concurrent operations.
  • Tests

    • Added comprehensive test coverage for server shutdown behavior, including in-flight request completion, repeated close calls, and multi-isolate configurations.

coderabbitai bot (Contributor) commented Dec 1, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

📝 Walkthrough

This PR refactors isolate synchronization syntax from Future.wait() to .wait property across examples and tests, modifies multi-isolate server close behavior to operate on a snapshot of children to prevent concurrent mutations, and introduces comprehensive graceful shutdown tests for single and multi-isolate server configurations.

Changes

Cohort: Future.wait() refactoring
File(s): example/advanced/multi_isolate.dart, test/isolated_object/isolated_object_evaluate_test.dart
Summary: Replaces Future.wait([...]) with [...].wait syntax in two locations to await spawned isolates and concurrent operations. No semantic change to error propagation or control flow.

Cohort: Multi-isolate server close behavior
File(s): lib/src/relic_server.dart
Summary: Changes _MultiIsolateRelicServer.close() to operate on a snapshot of children by copying _children to a local list, clearing the original, and then closing the copied list. Decouples the close sequence from concurrent mutations to the mutable list.

Cohort: Graceful shutdown test suite
File(s): test/relic_server_graceful_shutdown_test.dart
Summary: Adds comprehensive test cases covering in-flight request completion during server shutdown, repeated close calls (sequential and concurrent), and multi-isolate configurations. Includes utilities for signaling handlers and delayed request management.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40–60 minutes

  • lib/src/relic_server.dart: The snapshot copy logic in close() is a critical change addressing concurrent mutation during shutdown; requires careful verification of deadlock prevention and proper cleanup semantics.
  • test/relic_server_graceful_shutdown_test.dart: New comprehensive test file with multiple test cases and custom test utilities; review scope is large with various shutdown scenarios across single and multi-isolate configurations.
  • Heterogeneous changes: Mix of syntax refactoring (low complexity) and logic changes (higher complexity) across different file types (example, production, test) increases overall review burden despite some repetitive patterns.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
  • Title check: ✅ Passed. The title 'fix: Ensure close is idempotent' directly and concisely summarizes the main change: making the close() method idempotent to prevent hanging on multiple calls.
  • Description check: ✅ Passed. The description provides comprehensive details of the changes, clearly links to issue #293, completes all pre-launch checklist items, and includes relevant implementation notes about the synchronous copy-and-clear pattern.
  • Linked Issues check: ✅ Passed. The code changes fully address issue #293 by implementing idempotent close() behavior: _MultiIsolateRelicServer now copies and clears _children before closing, making subsequent calls no-ops, and comprehensive tests validate single and concurrent double-close scenarios.
  • Out of Scope Changes check: ✅ Passed. All changes are scoped to fixing the double-close hang: relic_server.dart implements the idempotent fix, test helpers are refactored for maintainability, example files show the new await pattern, and the test file adds comprehensive graceful shutdown validation, all supporting the stated objective.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

nielsenko (Collaborator, Author) commented

@coderabbitai review

coderabbitai bot (Contributor) commented Dec 1, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

nielsenko self-assigned this Dec 1, 2025

codecov bot commented Dec 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.00%. Comparing base (1f056a7) to head (15e6f97).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #295   +/-   ##
=======================================
  Coverage   91.99%   92.00%           
=======================================
  Files          97       97           
  Lines        3662     3664    +2     
  Branches     1881     1881           
=======================================
+ Hits         3369     3371    +2     
  Misses        293      293           

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/src/relic_server.dart (1)

256-258: Consider consistent error handling for port after close.

After close() clears _children, calling port throws StateError: No element from List.first. In contrast, _RelicServer.port (line 83) throws StateError: Not bound.

For consistency, you could cache the port similarly to _RelicServer:

+  int? _port;
   @override
-  int get port => _children.first.port;
+  int get port => _port ?? (throw StateError('Not bound'));

And update close() to also clear _port:

   @override
   Future<void> close() async {
     final children = List.of(_children);
     _children.clear();
+    _port = null;
     await children.map((final c) => c.close()).wait;
   }

And set _port in mountAndStart:

   @override
   Future<void> mountAndStart(final Handler handler) async {
     await _children.map((final c) => c.mountAndStart(handler)).wait;
+    _port ??= _children.first.port;
   }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c75273d and 609c8d5.

📒 Files selected for processing (4)
  • example/advanced/multi_isolate.dart (1 hunks)
  • lib/src/relic_server.dart (1 hunks)
  • test/isolated_object/isolated_object_evaluate_test.dart (2 hunks)
  • test/relic_server_graceful_shutdown_test.dart (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.dart

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.dart: All Dart code must pass static analysis using dart analyze --fatal-infos with no issues
All Dart files must be formatted with dart format (CI enforces dart format --set-exit-if-changed .)

Files:

  • test/isolated_object/isolated_object_evaluate_test.dart
  • test/relic_server_graceful_shutdown_test.dart
  • lib/src/relic_server.dart
  • example/advanced/multi_isolate.dart
test/**/*.dart

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

test/**/*.dart: Tests should follow the Given-When-Then pattern in descriptions (flexible structuring allowed)
Use Arrange-Act-Assert pattern within test bodies
Provide clear, descriptive test titles; prefer single responsibility per test unless related assertions improve clarity
Place tests in the test/ directory mirroring the lib/ structure

Files:

  • test/isolated_object/isolated_object_evaluate_test.dart
  • test/relic_server_graceful_shutdown_test.dart
lib/**/*.dart

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

lib/**/*.dart: Use Uint8List for request/response bodies for performance; avoid List<int> for body payloads
Use type-safe HTTP header parsing and validation when accessing headers
Use router with trie-based matching and symbol-based path parameters (e.g., #name, #age) for routing
Ensure WebSocket handling includes proper lifecycle management (e.g., ping/pong for connection health)

Files:

  • lib/src/relic_server.dart
🧠 Learnings (8)
📓 Common learnings
Learnt from: nielsenko
Repo: serverpod/relic PR: 48
File: example/example.dart:31-36
Timestamp: 2025-04-24T14:06:32.810Z
Learning: In the example code, `sleep()` is intentionally used instead of `await Future.delayed()` to simulate CPU-bound work that benefits from multiple isolates/cores. Using a blocking call demonstrates why multiple isolates are necessary, while an async approach would allow a single isolate to handle multiple requests concurrently, defeating the purpose of the multi-isolate example.
Learnt from: nielsenko
Repo: serverpod/relic PR: 48
File: lib/src/handler/handler.dart:59-67
Timestamp: 2025-04-25T07:39:38.915Z
Learning: Nielsenko prefers using switch statements with pattern matching over if statements when working with sealed classes in Dart, as they provide exhaustiveness checking at compile time and can be more concise.
📚 Learning: 2025-04-24T14:06:32.810Z
Learnt from: nielsenko
Repo: serverpod/relic PR: 48
File: example/example.dart:31-36
Timestamp: 2025-04-24T14:06:32.810Z
Learning: In the example code, `sleep()` is intentionally used instead of `await Future.delayed()` to simulate CPU-bound work that benefits from multiple isolates/cores. Using a blocking call demonstrates why multiple isolates are necessary, while an async approach would allow a single isolate to handle multiple requests concurrently, defeating the purpose of the multi-isolate example.

Applied to files:

  • test/isolated_object/isolated_object_evaluate_test.dart
  • test/relic_server_graceful_shutdown_test.dart
  • example/advanced/multi_isolate.dart
📚 Learning: 2025-10-09T16:21:09.310Z
Learnt from: CR
Repo: serverpod/relic PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-10-09T16:21:09.310Z
Learning: Applies to test/**/*.dart : Provide clear, descriptive test titles; prefer single responsibility per test unless related assertions improve clarity

Applied to files:

  • test/relic_server_graceful_shutdown_test.dart
📚 Learning: 2025-10-09T16:21:09.310Z
Learnt from: CR
Repo: serverpod/relic PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-10-09T16:21:09.310Z
Learning: Applies to lib/**/*.dart : Ensure WebSocket handling includes proper lifecycle management (e.g., ping/pong for connection health)

Applied to files:

  • test/relic_server_graceful_shutdown_test.dart
📚 Learning: 2025-04-24T04:14:12.943Z
Learnt from: nielsenko
Repo: serverpod/relic PR: 47
File: test/hijack/relic_hijack_test.dart:82-90
Timestamp: 2025-04-24T04:14:12.943Z
Learning: Tests within a single file in Dart's test package run sequentially, not concurrently, so global state for test resources within a file doesn't present race condition risks.

Applied to files:

  • test/relic_server_graceful_shutdown_test.dart
📚 Learning: 2025-10-22T11:25:39.264Z
Learnt from: nielsenko
Repo: serverpod/relic PR: 216
File: lib/src/router/relic_app.dart:47-49
Timestamp: 2025-10-22T11:25:39.264Z
Learning: In the serverpod/relic repository, validation of the `noOfIsolates` parameter should be handled in the `RelicServer` constructor (lib/src/relic_server.dart), not in `RelicApp.run` (lib/src/router/relic_app.dart).

Applied to files:

  • test/relic_server_graceful_shutdown_test.dart
📚 Learning: 2025-05-22T15:55:46.307Z
Learnt from: nielsenko
Repo: serverpod/relic PR: 79
File: benchmark/benchmark.dart:0-0
Timestamp: 2025-05-22T15:55:46.307Z
Learning: When working with Dart's IOSink (e.g., from File.openWrite()), always ensure to properly close it when done to flush any buffered data and release system resources. Without explicit closing, the last buffer may not be flushed, potentially resulting in data loss.

Applied to files:

  • lib/src/relic_server.dart
📚 Learning: 2025-10-22T11:21:50.149Z
Learnt from: nielsenko
Repo: serverpod/relic PR: 216
File: lib/src/isolated_object.dart:6-7
Timestamp: 2025-10-22T11:21:50.149Z
Learning: In Dart, closures CAN be sent between isolates when the isolates share the same code (e.g., using Isolate.spawn or Isolate.run). The closure and any captured state must be sendable. This is officially supported and documented in Dart's isolate documentation. Passing function objects like `dynamic Function(T)` between isolates is a valid pattern when the captured state is sendable.

Applied to files:

  • example/advanced/multi_isolate.dart
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: Run Unit Tests (windows-latest, 3.7.0, downgrade)
  • GitHub Check: Run Unit Tests (windows-latest, stable, downgrade)
  • GitHub Check: Run Unit Tests (macos-latest, beta, upgrade)
  • GitHub Check: Run Unit Tests (macos-latest, stable, downgrade)
  • GitHub Check: Run Unit Tests (macos-latest, beta, downgrade)
  • GitHub Check: Run Unit Tests (macos-latest, 3.7.0, downgrade)
  • GitHub Check: Run Unit Tests (windows-latest, beta, downgrade)
  • GitHub Check: Run Unit Tests (macos-latest, stable, upgrade)
  • GitHub Check: Run Unit Tests (macos-latest, 3.7.0, upgrade)
  • GitHub Check: Run Unit Tests (ubuntu-latest, stable, upgrade)
  • GitHub Check: Run Unit Tests (windows-latest, 3.7.0, upgrade)
  • GitHub Check: Run Unit Tests (ubuntu-latest, stable, downgrade)
  • GitHub Check: Run Unit Tests (windows-latest, beta, upgrade)
  • GitHub Check: Run Unit Tests (windows-latest, stable, upgrade)
  • GitHub Check: Verify Build (ubuntu-latest, stable)
🔇 Additional comments (10)
test/isolated_object/isolated_object_evaluate_test.dart (2)

57-57: LGTM!

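Using futures.wait (the Dart 3.0+ FutureIterable extension on Iterable<Future<T>>) is a cleaner, more idiomatic approach than Future.wait(futures). The two differ in how they report failures (the extension throws a ParallelWaitError), but the behavior is equivalent here since no errors are expected.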
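
For reference, a small sketch comparing the two forms. The `square` function is a hypothetical example and is not taken from the test file; the sketch assumes Dart 3.0 or later.

```dart
import 'dart:async';

// Hypothetical async operation standing in for the test's real work.
Future<int> square(int n) async => n * n;

Future<void> main() async {
  final futures = [for (var i = 1; i <= 3; i++) square(i)];

  // Static helper and extension getter give the same results on success.
  final viaStatic = await Future.wait(futures);
  final viaExtension = await futures.wait;
  print(viaStatic); // [1, 4, 9]
  print(viaExtension); // [1, 4, 9]

  // On failure, the extension waits for every future and then throws a
  // ParallelWaitError that carries both the values and the errors.
  try {
    await [square(2), Future<int>.error(StateError('boom'))].wait;
  } on ParallelWaitError catch (_) {
    print('caught ParallelWaitError');
  }
}
```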


193-193: LGTM!

Consistent use of the .wait extension pattern. The test correctly validates that all 100 operations complete successfully.

lib/src/relic_server.dart (1)

230-234: Excellent fix for the idempotent close behavior.

The copy-and-clear pattern correctly ensures that:

  1. Concurrent calls to close() will see an empty _children list after the first caller clears it
  2. The first caller completes closing all children from its snapshot
  3. Subsequent calls await an empty list, completing immediately as no-ops

This synchronous copy-and-clear before any async operations effectively serializes the close semantics.

example/advanced/multi_isolate.dart (1)

15-20: LGTM!

The refactor to use List.generate(...).wait is cleaner and consistent with the .wait extension pattern used throughout this PR. The isolate spawning logic and debug naming remain unchanged.

test/relic_server_graceful_shutdown_test.dart (6)

14-55: Well-designed test helpers for controlled shutdown testing.

The separation between _createSignalingHandler (for single-isolate tests with precise synchronization) and _createDelayedHandler (for multi-isolate tests where Completers can't cross boundaries) is a good design choice.

Note: The 50ms delay in _startDelayedInFlightRequests (line 52) is timing-dependent. If tests become flaky in slow CI environments, consider increasing this value. The 300ms default request delay provides reasonable margin.
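
The Completer-based synchronization mentioned for the single-isolate helper could look roughly like the sketch below. It is an assumption about the general shape, not the actual _createSignalingHandler: `handler` and `started` are hypothetical names, and no real HTTP server is involved.

```dart
import 'dart:async';

Future<void> main() async {
  // The handler completes this once it has begun processing.
  final started = Completer<void>();
  // The test completes this to let the handler finish.
  final canComplete = Completer<void>();

  // Hypothetical handler: signals that it started, then waits for release.
  Future<String> handler() async {
    started.complete();
    await canComplete.future;
    return 'ok';
  }

  final response = handler();
  await started.future; // The request is now known to be in flight.
  // A real test would start closing the server at this point.
  canComplete.complete();
  print(await response); // ok
}
```

Because both Completers live in the test's own isolate, this kind of handshake only works when the handler runs in that same isolate, which is why the multi-isolate tests fall back to fixed delays.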


98-144: Comprehensive test for graceful shutdown with in-flight requests.

The test correctly:

  1. Starts requests and waits for them to begin processing
  2. Initiates server close while requests are in-flight
  3. Allows requests to complete
  4. Verifies all responses are successful

Good use of destructuring with (:responseFutures, :canComplete) pattern.


146-197: Good handling of timing-dependent behavior.

The flexible assertion at line 193 appropriately handles the race between the new request attempt and socket closure. The comment at lines 189-192 clearly explains why exact behavior varies.


199-231: Direct test coverage for issue #293.

Excellent test cases that validate the core fix:

  • Sequential double-close: Uses expectLater(server.close(), completes) to detect hangs
  • Concurrent double-close: Tests the race condition scenario with in-flight requests

The issue references in comments provide good traceability.


234-252: Defensive tearDown handling for multi-isolate tests.

The serverClosed flag and try-catch in tearDown is a good defensive pattern. Interestingly, once this PR's fix is merged, the double-close protection would make this simpler - but keeping it doesn't hurt and documents the expected behavior.

Tests appropriately use noOfIsolates: 2 to validate multi-isolate shutdown.


254-303: Good test coverage for multi-isolate shutdown scenarios.

The multi-isolate tests appropriately:

  1. Mirror the single-isolate test structure for consistency
  2. Use the delay-based helper since Completers can't cross isolate boundaries
  3. Cover both sequential and concurrent double-close scenarios

This ensures the snapshot-and-clear fix in _MultiIsolateRelicServer.close() works correctly.

nielsenko requested a review from a team December 1, 2025 17:22
nielsenko marked this pull request as ready for review December 1, 2025 17:23

SandPod (Contributor) left a comment

LGTM! 👍

nielsenko (Collaborator, Author) commented

Rebased on main after merge of #292

nielsenko merged commit ba3f3a9 into serverpod:main Dec 3, 2025
27 checks passed
Development

Successfully merging this pull request may close these issues.

Fix: Calling close() twice hangs multi-isolate servers

2 participants