Bugfix: Trajectory preview randomly times out in drake meshcat visualizer #1227

Merged: mustafab0 merged 6 commits into feature/mustafa-detection3d-pcd-manipulation from bugfix/mustafa-preview-function-timeout-bugfix on Feb 10, 2026

mustafab0 (Contributor) commented Feb 10, 2026

Problem

preview_path() RPC randomly times out. Drake's Meshcat throws SystemExit (not Exception) when called from any thread other than its creator. RPC handlers run on a 50-worker thread pool, so many Meshcat calls die silently: no response is sent, and the client sees a timeout.
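
The failure mode can be reproduced with the standard library alone. This is a minimal sketch, not code from this PR: `fake_meshcat_call` and `rpc_handler` are hypothetical stand-ins, and it assumes the RPC handler guards its body with `except Exception`, which the description does not confirm.

```python
import threading


def fake_meshcat_call():
    # Hypothetical stand-in for a Meshcat call made off its creator thread.
    raise SystemExit("called from a non-creator thread")


def rpc_handler(responses):
    try:
        fake_meshcat_call()
        responses.append("ok")
    except Exception as exc:
        # Never reached: SystemExit derives from BaseException, not Exception.
        responses.append(f"error: {exc}")


responses = []
worker = threading.Thread(target=rpc_handler, args=(responses,))
worker.start()
worker.join()
# No reply was recorded, so a waiting client would only see a timeout.
```

The handler's `except Exception` never fires, and Python's default thread excepthook ignores `SystemExit`, so the worker ends without a traceback and `responses` stays empty.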

Issue: #


Solution

Route all Meshcat calls through a ThreadPoolExecutor(max_workers=1) that creates and owns the Meshcat instance. _on_viz_thread(fn, *args) submits work to this executor, with re-entrancy detection to avoid deadlocks when animate_path's nested calls are already on the viz thread.

Also reduced interpolation resolution from 0.02 to 0.1 for faster preview animations.
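
The routing described above could look roughly like the following sketch. `VizThreadRouter`, `factory`, and `nested_call` are hypothetical names for illustration, and `threading.get_ident` stands in for the Meshcat constructor so the example runs without Drake; the PR's actual helper may differ.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class VizThreadRouter:
    """Hypothetical helper: owns one worker thread and a resource created on it."""

    def __init__(self, factory):
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="meshcat")
        # Record the worker thread so re-entrant calls can be detected later.
        self._viz_thread = self._executor.submit(threading.current_thread).result()
        # Create the resource on the worker, making it the creator thread.
        self.resource = self._executor.submit(factory).result()

    def on_viz_thread(self, fn, *args):
        if threading.current_thread() is self._viz_thread:
            # Already on the viz thread: submitting and waiting here would deadlock.
            return fn(*args)
        return self._executor.submit(fn, *args).result()

    def shutdown(self):
        self._executor.shutdown(wait=True)


router = VizThreadRouter(factory=threading.get_ident)  # stand-in for Meshcat


def nested_call():
    # Runs on the viz thread and re-enters on_viz_thread, like a nested call would.
    return router.on_viz_thread(threading.get_ident)


inner_ident = router.on_viz_thread(nested_call)
router.shutdown()
```

Without the identity check, `nested_call` would submit to the only worker while that worker is busy waiting for the result, and the call would never return.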


Breaking Changes

None


How to Test

  1. Run the xarm-perception blueprint, coordinator-mock, and manipulation_client.py
  2. Open Meshcat in a browser
  3. Call plan_to_joints(), then preview_path(), via RPC
  4. Expected: the yellow preview robot animates the path, the live robot stays in place, and no timeout occurs

closes DIM-443

greptile-apps bot (Contributor) commented Feb 10, 2026

Greptile Overview

Greptile Summary

This PR addresses intermittent preview_path() RPC timeouts by enforcing Drake Meshcat’s thread affinity: Meshcat is created on a dedicated single-worker executor thread, and all Meshcat / MeshcatVisualizer calls are routed through _on_viz_thread(...), which avoids Drake’s SystemExit when Meshcat is invoked from non-creator threads.

In DrakeWorld, it also adds a separate “preview” (yellow/transparent) robot instance used for animation so the live robot state doesn’t move during previews. preview_path() now interpolates more coarsely (0.1 vs 0.02) to speed up animation.

Key integration point: ManipulationModule.preview_path() calls WorldMonitor.world.animate_path(), which now runs the full visualization publish loop on the Meshcat creator thread via the executor.

Confidence Score: 3/5

  • This PR is close to mergeable but needs a Meshcat executor shutdown to avoid leaking threads in long-running/restarted processes.
  • Main logic change (routing Meshcat calls to a dedicated creator thread) is coherent and consistently applied, but introducing a persistent ThreadPoolExecutor without a cleanup path is a concrete lifecycle bug that can accumulate threads or block clean shutdown when worlds/modules are recreated.
  • dimos/manipulation/planning/world/drake_world.py
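
One way to give the executor a cleanup path is an idempotent `close()` paired with context-manager support. This is a sketch under assumptions: `VizExecutorOwner` is a hypothetical stand-in for whatever object holds the executor, not the class in drake_world.py.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class VizExecutorOwner:
    """Hypothetical owner of a single-worker executor with an explicit cleanup path."""

    def __init__(self):
        self._viz_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="meshcat")
        self.viz_thread = self._viz_executor.submit(threading.current_thread).result()
        self._closed = False

    def close(self):
        # Idempotent, so calling it from several teardown paths is harmless.
        if self._closed:
            return
        self._closed = True
        # wait=True joins the worker, so no thread outlives the owner.
        self._viz_executor.shutdown(wait=True)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


with VizExecutorOwner() as owner:
    thread = owner.viz_thread
    alive_inside = thread.is_alive()
alive_after = thread.is_alive()
```

Recreating the owner repeatedly then leaves no lingering `meshcat` threads, which is the leak the review flags.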

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/manipulation/manipulation_module.py | Reduces preview path interpolation resolution from 0.02 to 0.1 before calling world.animate_path(), speeding up visualization. |
| dimos/manipulation/planning/world/drake_world.py | Adds Meshcat thread-affinity routing via single-thread executor, introduces preview (ghost) robot instances, and updates obstacle/visualization calls to run on viz thread; must add executor shutdown to avoid thread leaks. |

Sequence Diagram

sequenceDiagram
    participant Client as RPC Client
    participant RPC as ManipulationModule.preview_path()
    participant WM as WorldMonitor
    participant DW as DrakeWorld
    participant VEX as Viz ThreadPoolExecutor(1)
    participant Meshcat as Meshcat/MeshcatVisualizer

    Client->>RPC: preview_path(duration)
    RPC->>RPC: interpolate_path(resolution=0.1)
    RPC->>WM: world.animate_path(robot_id, path, duration)
    WM->>DW: animate_path(robot_id, path, duration)
    DW->>VEX: _on_viz_thread(_do_animate)
    VEX->>DW: _do_animate() (runs on creator thread)
    DW->>DW: show_preview(robot_id)
    DW->>VEX: _on_viz_thread(meshcat.SetProperty visible=true)
    loop each waypoint
        DW->>DW: _set_preview_positions(_plant_context,...)
        DW->>VEX: _on_viz_thread(meshcat_visualizer.ForcedPublish(viz_ctx))
        VEX->>Meshcat: ForcedPublish updates
    end
    DW->>DW: hide_preview(robot_id)
    DW->>VEX: _on_viz_thread(meshcat.SetProperty visible=false)
    DW->>VEX: _on_viz_thread(meshcat_visualizer.ForcedPublish(viz_ctx))
    VEX->>Meshcat: Final publish

greptile-apps bot left a comment

2 files reviewed, 1 comment

@mustafab0 mustafab0 requested a review from leshy February 10, 2026 05:35
@mustafab0 mustafab0 changed the title Bugfix/mustafa preview function timeout bugfix Bugfix: Trajectory preview randomly times out in drake meshcat visualizer Feb 10, 2026
Comment on lines 140 to 143
self._viz_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="meshcat")
self._viz_thread = self._viz_executor.submit(current_thread).result()
self._meshcat = self._viz_executor.submit(Meshcat).result()

Contributor:

Yes, looks good.

The only suggestion I'd have is to wrap this.

DrakeWorld doesn't care about the executor or the thread; it just needs to call Meshcat. It would be simpler from DrakeWorld's perspective to have a SingleThreadedMeshcat class that contains all of this.

Contributor, Author:

Wrapped the Meshcat instance in a ThreadSafeMeshcat class.

The implementation is much cleaner. Thanks!
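
The PR's ThreadSafeMeshcat class is not shown in this thread. As a rough illustration of the wrapping idea only, a generic proxy could forward method calls to the owning thread. `SingleThreadProxy` and `FakeMeshcat` are hypothetical names; `FakeMeshcat` stands in for Drake's Meshcat and merely mimics its thread affinity.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeMeshcat:
    """Stand-in that refuses calls from any thread other than its creator."""

    def __init__(self):
        self.creator = threading.get_ident()
        self.calls = []

    def SetProperty(self, path, prop, value):
        if threading.get_ident() != self.creator:
            raise RuntimeError("called from a non-creator thread")
        self.calls.append((path, prop, value))


class SingleThreadProxy:
    """Hypothetical wrapper: builds the target on one thread and calls it only there."""

    def __init__(self, factory):
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="meshcat")
        self._thread = self._executor.submit(threading.current_thread).result()
        self._target = self._executor.submit(factory).result()

    def __getattr__(self, name):
        # Only invoked for names the proxy itself lacks; keep private names local.
        if name.startswith("_"):
            raise AttributeError(name)
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def call(*args, **kwargs):
            if threading.current_thread() is self._thread:
                return attr(*args, **kwargs)  # re-entrant call, run inline
            return self._executor.submit(attr, *args, **kwargs).result()

        return call

    def close(self):
        self._executor.shutdown(wait=True)


meshcat = SingleThreadProxy(FakeMeshcat)
meshcat.SetProperty("/preview", "visible", True)  # issued from the main thread
meshcat.close()
```

With a wrapper of this shape, the caller holds what looks like a plain Meshcat object and never touches the executor or the thread directly.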

assert self._plant_context is not None
self._set_preview_positions(self._plant_context, robot_id, positions)
self.publish_visualization()
time.sleep(dt)
Contributor:

Sleeping on the meshcat thread blocks other calls. You should sleep on the calling thread.

Contributor, Author:

Removed the sleep from the Meshcat thread. The sleep now runs on the calling thread.
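
A minimal sketch of that split, assuming the loop submits only the publish step to the viz executor and paces itself on the caller's thread. This `animate_path` signature, the `publish` callback, and the `dt` computation are illustrative assumptions, not the code in this PR.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def animate_path(viz_executor, publish, path, duration):
    """Publish each waypoint on the viz thread; pace the animation on the caller."""
    dt = duration / max(len(path), 1)
    for positions in path:
        # Only the short publish call occupies the viz thread.
        viz_executor.submit(publish, positions).result()
        # Sleeping here leaves the viz thread free for other callers.
        time.sleep(dt)


published = []


def publish(positions):
    # Stand-in for setting preview positions and publishing to Meshcat.
    published.append((threading.get_ident(), positions))


viz_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="meshcat")
path = [[0.0], [0.5], [1.0]]
animate_path(viz_executor, publish, path, duration=0.03)
viz_executor.shutdown(wait=True)
```

Between waypoints the single viz worker is idle, so obstacle updates or other Meshcat calls queued by different threads can run during an animation.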


# Interpolate and animate
- interpolated = interpolate_path(planned_path, resolution=0.02)
+ interpolated = interpolate_path(planned_path, resolution=0.1)
Contributor:

Why the resolution change?

Contributor, Author:

The interpolation is used only for the animation. Since I was seeing so many timeouts, one theory was that the resolution was too fine, but that doesn't seem to be the cause. I can test whether 0.2 is good enough.

Still, 0.1 seems to work well for the preview, with less computational overhead.
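
To make the overhead argument concrete, here is a sketch of how resolution affects waypoint count. The `interpolate_path` below is a hypothetical stand-in, not the project's function: it assumes linear joint-space interpolation where `resolution` caps the largest per-joint change between consecutive waypoints.

```python
import math


def interpolate_path(path, resolution):
    """Hypothetical stand-in: subdivide so no joint moves more than `resolution` per step."""
    out = [list(path[0])]
    for start, end in zip(path, path[1:]):
        max_delta = max(abs(e - s) for s, e in zip(start, end))
        steps = max(1, math.ceil(max_delta / resolution))
        for i in range(1, steps + 1):
            t = i / steps
            out.append([s + t * (e - s) for s, e in zip(start, end)])
    return out


planned_path = [[0.0, 0.0], [1.0, 0.5]]
fine = interpolate_path(planned_path, resolution=0.02)
coarse = interpolate_path(planned_path, resolution=0.1)
print(len(fine), len(coarse))
```

Under this assumption, the coarser setting yields roughly five times fewer waypoints for the same segment, and therefore about five times fewer publish calls per preview.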

@mustafab0 mustafab0 force-pushed the bugfix/mustafa-preview-function-timeout-bugfix branch from 0629fc8 to 8a48937 Compare February 10, 2026 20:01
@mustafab0 mustafab0 merged commit 33e4aa8 into feature/mustafa-detection3d-pcd-manipulation Feb 10, 2026
15 checks passed
mustafab0 added a commit that referenced this pull request Feb 11, 2026
* Feature: Add gripper control for control coordinator (#1213)

* fix xarm adapter gripper method

* exposed adapter as a property to control coordinator. This enables custom method implementation

* rpc calls for gripper control added to control coordinator

* Gripper RPC methods added to manipulation module

* updated manipulation client

* added tf support to manipulation module

* TF support on manipulation module and Object Input topic support

* object scene registration publishes objects with pointclouds

* updated manipulation client to implement obstacle specific methods

* blueprint added for xarm7 and realsense robot

* pointcloud to convex hull obj for drake imports

* mypy error fix

* fix mypy errors

* Bugfix: Trajectory preview randomly times out in drake meshcat visualizer (#1227)

* with separate preview urdf

* running meshcat on its dedicated thread allows for real time preview update

* added meshcat viz executor shutdown

* removed sleep on meshcat thread now time.sleep is only called in rpc thread

* preview urdf is now persistent and does not disappear after preview

* wrapped meshcat threadexecutor in its class
mustafab0 added a commit that referenced this pull request Feb 11, 2026
…izer (#1227)

mustafab0 added a commit that referenced this pull request Feb 11, 2026
* fix xarm adapter gripper method

@mustafab0 mustafab0 deleted the bugfix/mustafa-preview-function-timeout-bugfix branch February 18, 2026 20:19
@mustafab0 mustafab0 linked an issue Feb 18, 2026 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

preview() function hangs when previewing planned trajectory in Manipulation module drake sim Planner Visualization in Meshcat - ideally rerun

2 participants