fix: surface cross-loop AsyncRobotClient misuse with a clear error#4
Closed
AlvarEhr wants to merge 1 commit into
Closed
fix: surface cross-loop AsyncRobotClient misuse with a clear error#4AlvarEhr wants to merge 1 commit into
AlvarEhr wants to merge 1 commit into
Conversation
AsyncRobotClient is implicitly single-loop-bound: its persistent UDP DatagramTransport, RX asyncio.Queue, request/endpoint asyncio.Locks, and shared status asyncio.Event are all bound at first-use to the event loop running at the time. Reusing the same instance from a different loop raises a cryptic ``RuntimeError: <Queue ...> is bound to a different event loop`` from deep inside ``_request_ok_raw`` / ``_request``, with a traceback that points at the asyncio queue's internal ``_get_loop()`` and gives no hint as to which loops are mismatched. The trap is easy to fall into when wrapping AsyncRobotClient inside a sync RobotClient (which drives its own private thread-loop) and then ALSO calling AsyncRobotClient methods directly from a different loop — for example, ``loop.create_task(inner.halt())`` from a NiceGUI page handler. The sync wrapper has bound the client to its thread-loop; the page handler's create_task schedules the coroutine on the main loop instead, so the next ``queue.get()`` raises. The UDP HALT packet still goes through (``sendto`` is synchronous), so the controller halts — but the Python side surfaces the confusing error. Track the bound loop on first endpoint creation and check it on every subsequent ``_ensure_endpoint`` call. Mismatches now raise an actionable error that names both loops and points the caller at ``asyncio.run_coroutine_threadsafe(coro, bound_loop)`` as the correct cross-loop dispatch primitive. No behavior change for correct single-loop callers: the new check short-circuits on the same-loop fast path before any lock or queue operation, so the common case is one extra identity comparison.
Owner
|
Hey @AlvarEhr - nice catch on the cryptic traceback. Wanted to flag a related angle: the main path into this bug is |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
AsyncRobotClientis implicitly single-loop-bound: its persistent UDPDatagramTransport, RXasyncio.Queue, request/endpointasyncio.Locks, and shared statusasyncio.Eventare all bound at first-use to the event loop running at that moment. Reusing the same instance from a different loop later raises:The traceback points deep inside
_request_ok_raw(parol6/client/async_client.py:647) atasyncio.queues.Queue.get→mixins._get_loop, giving no hint to the caller about:AsyncRobotClientis single-loop-bound at allReproduction (concrete trap)
A common pattern that hits this: construct a sync
RobotClientfrom one thread (which drives its own private event loop on a worker thread to serve the innerAsyncRobotClient), then later schedule a coroutine method on the inner client directly from a different loop:The UDP HALT packet actually goes through (
sendtois synchronous and unaffected by loop binding), so the controller halts correctly. But the Python side raises a confusing traceback and asyncio fires a "Task exception was never retrieved" warning that survives across the session.Fix
Track the bound loop on first endpoint creation and check it on every subsequent
_ensure_endpointcall. Mismatches raise an actionable error that names both loops and points atasyncio.run_coroutine_threadsafe()as the correct dispatch primitive.Same-loop fast path adds one identity comparison; no behaviour change for correct callers.
Alternative considered
A fuller fix would rebind the queue/transport on detected loop mismatch — but the UDP
DatagramTransportis also loop-bound and recreating it would require teardown of the old transport (likely on a now-defunct loop) and reconstruction on the new one, plus the_status_transportand multicast listener would need parallel treatment. That's a substantial refactor and the cross-loop usage pattern is already a usage error; surfacing it clearly is enough for most cases.