Skip to content

Work on async issues#11

Open
bsutton wants to merge 10 commits intoopen-runtime:mainfrom
onepub-dev:main
Open

Work on async issues#11
bsutton wants to merge 10 commits intoopen-runtime:mainfrom
onepub-dev:main

Conversation

@bsutton
Copy link

@bsutton bsutton commented May 23, 2024

As mentioned I think the way forward is to introduced a new guardSync method and fix the existing guard method so that it is fully async.

At the moment I've stepped back a little as there are a number of issues that I think need to be resolved before we try to fix the async issue.

I've added a linter to the project which has shown up a number of issues that I think need to be fixed.
In particular there is quite a few 'unnecessary statement' errors that look like they are logic errors.

Are you able to have a look at these, as I'm not always certain what you intended.

Additionally some of the code such as the below completer usage, could be simplified by a simple try/catch block.
I'm guessing this was done try and handle sync and async code which just isn't going to be possible so there is an opportunity here to simplify the code as we go.

from the execution_call.dart file.

 _returned is Future<R>
          ? _returned
              .then(completer.complete)
              // catch the error on returnable
              .catchError((e, StackTrace trace) => (this
                    .._error = ExecutionCallErrors<R, E>(anticipated: e is E ? e : null, unknown: e is! E && e is Object ? e : null, trace: trace, completer: completer))
                  .completer)
              // when complete successful is true if _error is not set and completer is completed or the completer is completed with an error and successful is set to the opposite of isCompleted which is false
              .whenComplete(() => _successful = completer.isCompleted && !LatePropertyAssigned<ExecutionCallErrors<R, E>>(() => _error) ||
                  !(completer..completeError((_error.anticipated.get ?? _error.unknown.get)!, _error.trace.get)).isCompleted)
          : _successful = (completer..complete(_returned)).isCompleted;

This is probably more of what I'm used to but I do find the functional approach to the guard method somewhat cludgey.

With the current approach we have:

    final execution = runtime.ExecutionCall<FutureOr<void>, DCliException>(
      callable: () => action(),
      safe: true,
    );

    try {
         final _execution = runtime.NamedLock.guard<Future<T>, DCliException>(
        name: _nameWithSuffix,
        execution: execution,
        waiting: waiting,
        timeout: _timeout,
      );

      if (_execution.successful.isSet && _execution.successful.get!) {
        await _execution.completer.future;
        await _execution.returned;
      } else if (_execution.error.isSet) {
        await _execution.error.get?.rethrow_();
      }
    } on DCliException catch (e) {
      throw e..stackTrace = callingStackTrace;
    } on Exception catch (e) {
      throw DCliException.from(e, callingStackTrace);
    }
     return _exectuion.returned;

if the guard method simply threw then we could have this:

  T result;
    try {
        result  = await runtime.NamedLock.guard(
            name: _nameWithSuffix,
            execution: action,
            waiting: waiting,
            timeout: _timeout,
      );
     } on DCliException catch (e) {
      throw e..stackTrace = callingStackTrace;
    } on Exception catch (e) {
      throw DCliException.from(e, callingStackTrace);
    }
   return result;   

@bsutton
Copy link
Author

bsutton commented Jun 23, 2024

Just wondering if you have some time to look at this, as it is currently a blocker for dcli.

@tsavo-at-pieces
Copy link
Contributor

Fork Maintainer Review (Concurrency Primitives)

Summary

This PR does several things at once: (1) adds lint_hard as a dev dependency and creates analysis_options.yaml, (2) splits UnixLock and WindowsLock out of named_lock.dart into their own files, (3) adds type annotations to fix linter warnings (e.g. String toString(), StackTrace trace), (4) adds @override annotations, (5) refactors the NamedLock.guard method by extracting _wait() and _execute() helper methods, and (6) adds a new async test. The PR description proposes introducing a guardSync method and fixing the existing guard for fully async use.

Correctness

Critical issue -- behavioral change in the guard loop:

The original guard() loop tried lock.lock() at the top of each iteration, and only entered the wait/retry path if locking failed. The refactored version changes the control flow:

while (!locked) {
  locked = lock.lock();
  if (locked) {
    _execute(...);
  } else {
    _wait(...);
    locked = lock.lock();  // <-- second lock attempt added here
  }
  ...
}

After _wait() returns, there is a second call to lock.lock() at the end of the else block. If that second lock succeeds, the loop condition !locked becomes false and the loop exits without ever calling _execute(). This means the guarded code would never run even though the lock was acquired. This is a correctness bug in a concurrency primitive and must be fixed before merge.

_wait() parameters are passed by value, not by reference:

The extracted _wait() method receives _sleep, _attempt, and waiting as parameters. In Dart, these are passed by value. The mutations inside _wait() (incrementing _attempt, increasing _sleep, setting waiting = null) have no effect on the caller's variables. This means:

  • The sleep duration never increases (always stays at the initial 2ms)
  • The attempt counter never increments
  • The waiting message would print on every retry, not just the first

This is a regression from the original inline code where these mutations correctly affected the loop state.

_execute() type signature is extremely verbose but functionally fine. The fully expanded generic type signature on the lock parameter in _execute() works but is unnecessarily complex -- the original inline code was cleaner.

The new async test (named_lock_async_test.dart) has a structural problem:

await NamedLock.guard<void, IntentionalTestException>(
    name: 'async test', execution: execution);

NamedLock.guard is a synchronous method that returns ExecutionCall, not a Future. The await here is a no-op (awaiting a non-Future just returns it). The test asserts sequential ordering (line 1, line 2, line 3) but NamedLock.guard calls execution.execute() synchronously, which means the async callback's Future.delayed will not have completed by the time guard returns. The test passes only because of the 500ms Future.delayed at the end, which gives the async work time to complete. This doesn't actually validate correct async behavior -- it validates that async work eventually completes if you wait long enough.

Platform Compatibility

  • File split (unix_lock.dart, windows_lock.dart) is a clean structural change and maintains existing platform checks.
  • The analysis_options.yaml has include: placed after the analyzer: block -- include: should typically be at the top of the file for correct precedence. Also missing a trailing newline.

Recommendation

HOLD -- Do not merge as-is. The two control flow bugs in the guard loop (double lock() call causing skip of _execute(), and pass-by-value parameters in _wait()) are correctness issues in a concurrency primitive. These need to be fixed and the async test needs to be rethought.

I agree with the general direction of the PR description (separating sync vs async guard paths), but the current code has regressions that need to be addressed first. Consider:

  1. Fix the guard loop so _execute() is always called when the lock is acquired.
  2. Either keep the wait/retry state inline (as it was), or pass a mutable state object to _wait().
  3. For the async test, either make guard genuinely async-aware or document that it is synchronous-only and add a separate guardAsync method as proposed in the PR description.

@tsavo-at-pieces
Copy link
Contributor

Concurrency Safety Review

Race Condition Analysis

CRITICAL: New race condition introduced in the _wait / retry flow.

The original guard() method had a clear structure: while (!locked) { locked = lock.lock(); if (locked) { execute+cleanup } else { sleep+retry } }. The refactored code changes this to:

while (!locked) {
  locked = lock.lock();
  if (locked) {
    _execute(verbose, locked, execution, lock);
  } else {
    _wait(verbose, locked, _sleep, _attempt, waiting, now, timeout);
    locked = lock.lock();  // <-- Second lock attempt OUTSIDE the while condition
  }
  // error check runs here regardless
}

This introduces a double-lock attempt per iteration in the failure path: the first lock.lock() fails, then _wait() sleeps, then lock.lock() is called again -- but if this second lock succeeds, the loop continues to the error check but skips the _execute() block entirely because the if (locked) was already evaluated as false. The code then loops back to while (!locked) which is now true from the second lock.lock() call, but lock.lock() is called AGAIN at the top, which will fail because the lock is already held by this process (the semaphore counter shows process count > 0). This means:

  1. The lock is acquired but _execute() is never called for that acquisition.
  2. On the next iteration, lock.lock() at the top returns false (already locked by this process), enters the else branch, calls _wait() then lock.lock() again -- creating an infinite loop or timeout.

Additionally: The _wait() method receives _sleep, _attempt, and waiting as value parameters (Dart passes primitives by value). The mutations inside _wait() (_sleep = Duration(...), _attempt++, waiting = null) have no effect on the caller's variables. This means:

  • The backoff strategy is broken -- _sleep never increases across iterations.
  • _attempt never increments across iterations.
  • The waiting message prints every iteration instead of once.

Resource Safety

Resource leak risk on timeout: The timeout check was moved from before lock.lock() to after _wait(). The original code checked timeout before attempting another lock. Now if the timeout fires inside _wait(), the exception is thrown, but any lock resources from the first lock.lock() at the top of the loop (which returned false) are not an issue. However, the refactoring removed the TODO comments about force-closing on timeout -- those should be preserved as they represent real future work.

_execute() cleanup ordering is correct: unlock -> close -> unlink ordering is maintained, which is good.

Structural Changes

The split of UnixLock and WindowsLock into separate files (unix_lock.dart, windows_lock.dart) is a clean refactoring with no behavioral change -- those classes are identical to the original code. This part is safe.

The lint_hard addition and type annotation fixes (String toString(), StackTrace trace) are all safe and correct.

The new test named_lock_async_test.dart is good in principle but has an issue: NamedLock.guard() returns ExecutionCall synchronously -- it does not return a Future. The await on line 29 (await NamedLock.guard(...)) will not actually wait for the async callable to complete since guard() calls execution.execute() synchronously and the callable is an async function returning FutureOr<void>. The test may pass coincidentally due to timing, but it does not actually validate async behavior.

Recommendation

HOLD -- The _wait() extraction introduces a broken backoff mechanism (pass-by-value parameters), and the restructured while loop creates a logic bug where successful lock acquisition in the else branch skips execution. These are concurrency-critical bugs that would cause deadlocks or infinite loops in production. The file split and lint additions are fine.

Suggested fixes:

  1. Keep _sleep, _attempt, and waiting state in the caller and pass results back, or keep them as fields, or revert to inline logic.
  2. Remove the second lock.lock() call in the else branch -- the while loop already handles retry.
  3. Preserve the timeout check location (before attempting lock, not after sleep).

@tsavo-at-pieces
Copy link
Contributor

Detailed Code Review -- named_locks PR #11

Changes

This PR makes several categories of changes:

  1. File extraction: UnixLock and WindowsLock classes are moved from named_lock.dart into their own files (unix_lock.dart, windows_lock.dart). The barrel file is updated to export them from their new locations.

  2. Refactoring of NamedLock.guard: The main lock-acquire loop is refactored to extract two helper methods: _wait() and _execute(). This is the most significant behavioral change.

  3. Lint suppression: A new analysis_options.yaml is added with lint_hard and ~25 lint rules suppressed. lint_hard is added as a dev dependency.

  4. Type annotation fixes: toString() return types are annotated, StackTrace is explicitly typed in catchError callbacks, @override annotations are added where missing, and receiver.first calls in the test get explicit as int casts.

  5. Formatting: Import statements and long lines are reformatted for readability.

  6. New test: named_lock_async_test.dart tests that async callbacks within guard execute correctly.

Code Quality

Critical bug -- _wait() method parameters are passed by value, not by reference:

The _wait() static method receives _sleep, _attempt, and waiting as parameters. Inside _wait(), these are mutated:

  • _sleep is reassigned with a new Duration
  • _attempt is incremented
  • waiting is set to null

However, because Dart passes these types (Duration, int, String?) by value, none of these mutations propagate back to the caller. This means:

  • The backoff strategy (_sleep growth) never takes effect -- the sleep duration stays at its initial 2ms on every iteration.
  • The attempt counter (_attempt) never increments -- it stays at 1 forever.
  • The waiting message will print on every iteration instead of only the first.

This is a regression from the original inline code where these variables were mutated in place within the while loop. The refactoring broke the backoff logic entirely.

Second critical issue -- lock.lock() is called twice per failed iteration:

In the refactored guard() method, when locked is false:

} else {
  _wait(verbose, locked, _sleep, _attempt, waiting, now, timeout);
  locked = lock.lock();  // <-- second lock attempt after _wait
}

But the loop then continues to while (!locked) and calls locked = lock.lock() again at the top of the next iteration. The original code only called lock.lock() once per loop iteration. Wait -- actually looking more carefully, the refactored loop has locked = lock.lock() only inside the if/else, not at the top. So the first lock.lock() call at line locked = lock.lock(); (before the if/else) sets locked, and if false, _wait is called, and then locked = lock.lock() is called again inside the else branch. This means on a failed acquire, it will call lock.lock() twice in rapid succession (once in _wait with no actual backoff due to the by-value bug, and once immediately after). This double-lock attempt per iteration is unintentional.

_execute() method signature is excessively verbose:

The _execute() method has a fully expanded generic type signature that spans ~30 lines. This could be simplified to NativeSemaphore lock with the appropriate type alias, or even just dynamic, since all that is called on it is .unlock(), .close(), and .unlink().

analysis_options.yaml formatting issue:

The include: directive is placed after the analyzer: block (line 32), which means the analyzer error suppressions above it may not interact correctly with lint_hard's rules. The include should typically be the first directive. Also, the file is missing a trailing newline.

Tests

The new named_lock_async_test.dart is a reasonable addition. It tests that an async callable within ExecutionCall completes in the expected order. However:

  • The await NamedLock.guard<void, IntentionalTestException>(...) call on line 29 will not actually await the async callable's completion because guard() returns ExecutionCall, not a Future. The test may pass coincidentally because the 500ms delay at line 34 gives enough time, but this does not validate that async callbacks are properly awaited by the guard mechanism.
  • No tests validate the backoff behavior or timeout logic, which is where the regression was introduced.

Verdict

NEEDS_WORK

The core _wait() refactoring introduces a regression in backoff/retry logic due to Dart's pass-by-value semantics for int, Duration, and String?. The _sleep, _attempt, and waiting mutations inside _wait() have no effect on the caller's variables. This must be fixed before merge -- either by keeping the logic inline, returning a record/tuple of updated values, or wrapping the mutable state in an object.

The file extraction (UnixLock/WindowsLock to separate files) and type annotation fixes are clean and welcome. The new async test is a good idea but does not actually validate async await behavior in guard().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants