fix: surface cross-loop AsyncRobotClient misuse with a clear error #4

Closed
AlvarEhr wants to merge 1 commit into Jepson2k:main from AlvarEhr:fix/async-client-loop-binding-detection

@AlvarEhr

Problem

AsyncRobotClient is implicitly single-loop-bound: its persistent UDP DatagramTransport, RX asyncio.Queue, request/endpoint asyncio.Locks, and shared status asyncio.Event are all bound at first-use to the event loop running at that moment. Reusing the same instance from a different loop later raises:

RuntimeError: <Queue at 0x... maxsize=256 tasks=2> is bound to a different event loop

The traceback points deep inside _request_ok_raw (parol6/client/async_client.py:647), ending at asyncio.queues.Queue.get → mixins._get_loop, and gives the caller no hint about:

  • Which two loops are mismatched
  • That AsyncRobotClient is single-loop-bound at all
  • What the correct cross-loop dispatch primitive is

Reproduction (concrete trap)

A common pattern that hits this: construct a sync RobotClient from one thread (which drives its own private event loop on a worker thread to serve the inner AsyncRobotClient), then later schedule a coroutine method on the inner client directly from a different loop:

sync_client = RobotClient(host="127.0.0.1", port=5001)
# ... use sync_client.move_j(...) etc. — fine, runs on its own loop ...

# Later, from a separate asyncio loop:
inner = sync_client._inner
loop = asyncio.get_running_loop()
loop.create_task(inner.halt())
# → eventually raises RuntimeError "is bound to a different event loop"

The UDP HALT packet actually goes through (sendto is synchronous and unaffected by loop binding), so the controller halts correctly. But the Python side raises a confusing traceback and asyncio fires a "Task exception was never retrieved" warning that survives across the session.
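For contrast, a coroutine can be handed to the loop that owns the client instead of being scheduled on the caller's loop, using asyncio.run_coroutine_threadsafe. The sketch below is an illustration only: `start_worker_loop` and `halt_stub` are hypothetical stand-ins (the latter for `inner.halt()`), and how the sync RobotClient exposes its private loop is not shown in this PR.

```python
import asyncio
import threading


def start_worker_loop():
    """Start a private event loop on a daemon thread (hypothetical helper)."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def halt_stub():
    """Hypothetical stand-in for inner.halt(); reports which loop ran it."""
    return asyncio.get_running_loop()


async def caller(worker_loop):
    # We are running on a different loop here. Submit the coroutine to the
    # loop that owns the client, then await the result without blocking.
    future = asyncio.run_coroutine_threadsafe(halt_stub(), worker_loop)
    ran_on = await asyncio.wrap_future(future)
    return ran_on is worker_loop


def demo() -> bool:
    worker_loop, thread = start_worker_loop()
    try:
        return asyncio.run(caller(worker_loop))
    finally:
        # Stop the worker loop from its own thread, then clean up.
        worker_loop.call_soon_threadsafe(worker_loop.stop)
        thread.join(timeout=1)
        worker_loop.close()


if __name__ == "__main__":
    print(demo())
```

Here the coroutine body executes on the worker loop, so any loop-bound queue, lock, or transport it touches sees the loop it was bound to.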

Fix

Track the bound loop on first endpoint creation and check it on every subsequent _ensure_endpoint call. Mismatches raise an actionable error that names both loops and points at asyncio.run_coroutine_threadsafe() as the correct dispatch primitive.

Same-loop fast path adds one identity comparison; no behaviour change for correct callers.
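The shape of such a check can be sketched as follows. This is not the PR's diff: `LoopBoundClient`, `_bound_loop`, `_check_loop`, and `ping` are hypothetical names chosen for illustration, and the real check lives in `_ensure_endpoint`.

```python
import asyncio


class LoopBoundClient:
    """Hypothetical stand-in showing a loop-binding guard."""

    def __init__(self) -> None:
        self._bound_loop = None  # set on first use

    def _check_loop(self) -> None:
        running = asyncio.get_running_loop()
        if self._bound_loop is None:
            # First use: remember the loop our resources are created on.
            self._bound_loop = running
            return
        if running is self._bound_loop:
            # Same-loop fast path: a single identity comparison.
            return
        raise RuntimeError(
            f"client is bound to {self._bound_loop!r} but was called from "
            f"{running!r}; dispatch with "
            "asyncio.run_coroutine_threadsafe(coro, bound_loop) instead"
        )

    async def ping(self) -> str:
        self._check_loop()
        return "ok"


def demo_guard() -> str:
    client = LoopBoundClient()
    asyncio.run(client.ping())  # binds to the first loop
    try:
        asyncio.run(client.ping())  # second loop: guard fires
    except RuntimeError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(demo_guard())
```

Raising before any lock or queue operation keeps the error at the call site instead of deep inside asyncio internals.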

Alternative considered

A fuller fix would rebind the queue/transport on detected loop mismatch — but the UDP DatagramTransport is also loop-bound and recreating it would require teardown of the old transport (likely on a now-defunct loop) and reconstruction on the new one, plus the _status_transport and multicast listener would need parallel treatment. That's a substantial refactor and the cross-loop usage pattern is already a usage error; surfacing it clearly is enough for most cases.

AsyncRobotClient is implicitly single-loop-bound: its persistent UDP
DatagramTransport, RX asyncio.Queue, request/endpoint asyncio.Locks,
and shared status asyncio.Event are all bound at first-use to the
event loop running at the time. Reusing the same instance from a
different loop raises a cryptic ``RuntimeError: <Queue ...> is bound
to a different event loop`` from deep inside ``_request_ok_raw`` /
``_request``, with a traceback that points at the asyncio queue's
internal ``_get_loop()`` and gives no hint as to which loops are
mismatched.

The trap is easy to fall into when wrapping AsyncRobotClient inside
a sync RobotClient (which drives its own private thread-loop) and
then ALSO calling AsyncRobotClient methods directly from a different
loop — for example, ``loop.create_task(inner.halt())`` from a
NiceGUI page handler. The sync wrapper has bound the client to its
thread-loop; the page handler's create_task schedules the coroutine
on the main loop instead, so the next ``queue.get()`` raises. The
UDP HALT packet still goes through (``sendto`` is synchronous), so
the controller halts — but the Python side surfaces the confusing
error.

Track the bound loop on first endpoint creation and check it on
every subsequent ``_ensure_endpoint`` call. Mismatches now raise
an actionable error that names both loops and points the caller at
``asyncio.run_coroutine_threadsafe(coro, bound_loop)`` as the
correct cross-loop dispatch primitive.

No behavior change for correct single-loop callers: the new check
short-circuits on the same-loop fast path before any lock or queue
operation, so the common case is one extra identity comparison.
@Jepson2k
Owner

Hey @AlvarEhr - nice catch on the cryptic traceback. Wanted to flag a related angle: the main path into this bug is sync_client.async_client.foo() from another loop, and that path only exists because of the async_client property on RobotClient (re-added in 8cc47dc, zero production callers). I've drafted #6 which deletes it and should remove the bug altogether. Closing for now but feel free to reopen if there is another angle I'm missing.

Jepson2k closed this on May 13, 2026