
Feature: Detection3D and Object support to Manipulation Module #1217

Merged
mustafab0 merged 11 commits into feature/mustafa-add-gripper-control-for-control-coordinator from feature/mustafa-detection3d-pcd-manipulation
Feb 11, 2026

Conversation

@mustafab0

Problem

Detected 3D objects from perception cannot be used as collision obstacles for motion planning.
ManipulationModule lacks TF publishing needed for eye-in-hand camera transform chains.

Solution

ObjectSceneRegistration publishes deduplicated objects via new objects output port with pointclouds.
ManipulationModule subscribes async via observable() and caches objects in WorldObstacleMonitor.

Dedicated 10Hz TF thread publishes EE and tf_extra_links poses using scratch_context().
Eager self.tf init prevents autoconf() blocking in Dask workers.
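
A minimal sketch of the dedicated fixed-rate publishing thread described above, using only the standard library. `PeriodicPublisher` and `publish_once` are hypothetical names for illustration and are not taken from the PR; the real module publishes EE and extra-link poses through `self.tf`.

```python
import threading
import time
from typing import Callable, List


class PeriodicPublisher:
    """Calls `publish_once` at a fixed rate on a daemon thread until stopped."""

    def __init__(self, publish_once: Callable[[], None], rate_hz: float = 10.0) -> None:
        self._publish_once = publish_once
        self._period = 1.0 / rate_hz
        self._stop = threading.Event()
        # daemon=True so a forgotten stop() cannot keep the process alive.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def stop(self, timeout: float = 1.0) -> None:
        self._stop.set()
        self._thread.join(timeout=timeout)

    def is_running(self) -> bool:
        return self._thread.is_alive()

    def _run(self) -> None:
        # Event.wait is an interruptible sleep: it returns True once stop() is called.
        while not self._stop.wait(self._period):
            try:
                self._publish_once()
            except Exception as exc:  # one failed publish should not kill the loop
                print(f"publish failed: {exc}")


# Usage: record a timestamp on every tick in place of a real TF publish.
published: List[float] = []
publisher = PeriodicPublisher(lambda: published.append(time.monotonic()), rate_hz=10.0)
publisher.start()
time.sleep(0.35)
publisher.stop()
```

Waiting on the `Event` instead of calling `time.sleep` lets shutdown interrupt the loop immediately rather than after the current period elapses.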

WorldObstacleMonitor syncs BOX obstacles to Drake on refresh() with opt-in convex hull meshes.
pointcloud_to_convex_hull_obj centers points at origin to avoid double-transform with pose.
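
To illustrate why the points are centered, here is a small standard-library sketch. It assumes the obstacle pose is a pure translation located at the pointcloud centroid (rotation is ignored for simplicity) and omits the hull computation itself; `center_points` and `apply_translation` are hypothetical helpers, not functions from the PR.

```python
from itertools import product
from typing import List, Tuple

Point = Tuple[float, float, float]


def center_points(points: List[Point]) -> Tuple[List[Point], Point]:
    """Return the points expressed relative to their centroid, plus the centroid."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    centered = [(x - cx, y - cy, z - cz) for x, y, z in points]
    return centered, (cx, cy, cz)


def apply_translation(points: List[Point], t: Point) -> List[Point]:
    """Place mesh-frame points in the world using a translation-only pose."""
    return [(x + t[0], y + t[1], z + t[2]) for x, y, z in points]


# Eight corners of a 0.5 m cube observed in the world frame.
world_points: List[Point] = list(product((0.75, 1.25), (1.75, 2.25), (0.25, 0.75)))

centered, centroid = center_points(world_points)

# Centered mesh + pose: the geometry lands where it was observed.
placed_ok = apply_translation(centered, centroid)

# Uncentered mesh + pose: the offset is applied twice.
placed_twice = apply_translation(world_points, centroid)
_, shifted_centroid = center_points(placed_twice)

print(centroid)          # (1.0, 2.0, 0.5)
print(shifted_centroid)  # (2.0, 4.0, 1.0)
```

If the mesh vertices stayed in world coordinates while the obstacle also carried a world pose, the planner would see the object at twice its offset from the origin.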

ManipulationClient adds refresh(), detections(), goto_object(), and perception status commands.
xarm_perception blueprint wires xArm7 + eye-in-hand RealSense + ObjectSceneRegistration + Foxglove.

Breaking Changes
None.

How to Test

pytest dimos/manipulation/test_manipulation_unit.py -v
dimos run xarm_perception
# In client: refresh(5), goto_object("cup", dz=0.1), preview(), execute()

closes DIM-353
closes DIM-397
closes DIM-396

@greptile-apps

greptile-apps bot commented Feb 8, 2026

Greptile Overview

Greptile Summary

Adds Detection3D and Object support to the manipulation module for perception-driven motion planning. Key changes:

  • Perception Integration: ObjectSceneRegistration publishes deduplicated objects with pointclouds via new objects output port. ManipulationModule subscribes asynchronously and caches them in WorldObstacleMonitor.

  • TF Publishing: Dedicated 10Hz thread publishes EE and tf_extra_links poses using thread-safe scratch_context(). Eager self.tf initialization prevents autoconf blocking in Dask workers.

  • Obstacle Management: WorldObstacleMonitor syncs cached objects to Drake on refresh_obstacles() with opt-in convex hull meshes from pointclouds. The pointcloud_to_convex_hull_obj function centers points at origin to avoid double-transform with obstacle pose.

  • Client API: ManipulationClient adds refresh(), detections(), goto_object(), and perception status commands for interactive manipulation workflows.

  • Blueprint: New xarm_perception blueprint wires xArm7 with eye-in-hand RealSense camera, object scene registration, and Foxglove visualization.

  • Breaking Change: execute() now uses task_invoke() instead of execute_trajectory() for coordinator RPC (tests updated accordingly).

Confidence Score: 4/5

  • This PR is safe to merge with minor suggestions for improvement
  • The implementation is well-structured with proper thread safety (scratch_context usage), graceful error handling, and comprehensive test updates. The TF publishing thread uses daemon=True and proper shutdown. Convex hull generation has an appropriate fallback to bounding boxes. One minor style suggestion concerns cache filename generation: use content hashes instead of memory addresses.
  • No files require special attention - the cache filename generation in mesh_utils.py could be improved for better cache reuse, but this is a performance optimization rather than a critical issue

Important Files Changed

Filename Overview
dimos/manipulation/manipulation_module.py Added Detection3D object subscription, TF publishing thread for EE and extra links, and perception obstacle RPC methods. Changed execute() to use task_invoke instead of execute_trajectory.
dimos/perception/object_scene_registration.py Added objects output port that publishes deduplicated Object instances alongside existing detections_3d output.
dimos/manipulation/planning/monitor/world_obstacle_monitor.py Added object-based perception caching with stable object_ids, refresh_obstacles() for full sync, and optional convex hull mesh generation from pointclouds.
dimos/manipulation/planning/utils/mesh_utils.py Added pointcloud_to_convex_hull_obj() function that centers points at origin before computing convex hull to avoid double-transform with obstacle pose.
dimos/manipulation/planning/monitor/world_monitor.py Added on_objects(), perception status methods, and get_link_pose() for arbitrary link FK. All methods use scratch_context() for thread safety.

Sequence Diagram

sequenceDiagram
    participant OSR as ObjectSceneRegistration
    participant MM as ManipulationModule
    participant WM as WorldMonitor
    participant WOM as WorldObstacleMonitor
    participant DW as DrakeWorld
    participant TF as TF Publisher Thread
    participant Client as ManipulationClient
    
    Note over OSR,MM: Perception Integration Flow
    OSR->>OSR: Deduplicate detections via ObjectDB
    OSR->>MM: objects.publish(Object[])
    MM->>WM: on_objects(objects)
    WM->>WOM: on_objects(objects)
    WOM->>WOM: Cache objects with stable object_id
    
    Note over Client,DW: User Workflow
    Client->>MM: refresh_obstacles(min_duration)
    MM->>WM: refresh_obstacles(min_duration)
    WM->>WOM: refresh_obstacles(min_duration)
    WOM->>WOM: Filter cached objects by duration
    WOM->>WOM: pointcloud_to_convex_hull_obj()
    WOM->>DW: add_obstacle(Obstacle{MESH/BOX})
    DW->>DW: Create Convex or Box shape
    WOM-->>Client: List of added obstacles
    
    Note over MM,TF: TF Publishing (10Hz Thread)
    loop Every 0.1s
        TF->>WM: get_ee_pose(robot_id)
        WM->>WM: scratch_context()
        WM->>DW: get_ee_pose(ctx, robot_id)
        DW-->>TF: PoseStamped
        TF->>WM: get_link_pose(robot_id, link)
        WM->>WM: scratch_context()
        WM->>DW: get_link_pose(ctx, robot_id, link)
        DW-->>TF: PoseStamped
        TF->>TF: tf.publish(Transform[])
    end
    
    Note over Client,DW: Motion Planning with Obstacles
    Client->>MM: goto_object("cup", dz=0.1)
    MM->>MM: Match object from cache
    MM->>MM: plan_pose(x, y, z+offset)
    MM->>DW: check_collision_free()
    DW->>DW: Check against obstacles (MESH/BOX)
    MM-->>Client: Plan result
    Client->>MM: execute()
    MM->>MM: task_invoke("traj_arm", "execute", trajectory)
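
The caching and duration-filter steps in the diagram above could be sketched roughly as follows. This assumes `min_duration` means "observed for at least that many seconds", which the PR does not spell out; `ObjectCache`, `CachedObject`, and their fields are hypothetical names, not the actual `WorldObstacleMonitor` API.

```python
import threading
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CachedObject:
    object_id: str
    name: str
    first_seen: float
    last_seen: float


class ObjectCache:
    """Caches detections by stable object_id; safe to update from a subscriber thread."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._objects: Dict[str, CachedObject] = {}

    def on_object(self, object_id: str, name: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        with self._lock:
            entry = self._objects.get(object_id)
            if entry is None:
                self._objects[object_id] = CachedObject(object_id, name, now, now)
            else:
                entry.last_seen = now  # same id seen again: extend its lifetime

    def stable_objects(self, min_duration: float) -> List[CachedObject]:
        """Objects whose observed lifetime is at least `min_duration` seconds."""
        with self._lock:
            return [
                o for o in self._objects.values()
                if o.last_seen - o.first_seen >= min_duration
            ]


# Usage with explicit timestamps so the result is deterministic.
cache = ObjectCache()
cache.on_object("obj-1", "cup", now=0.0)
cache.on_object("obj-2", "bottle", now=5.5)
cache.on_object("obj-1", "cup", now=6.0)
cache.on_object("obj-2", "bottle", now=6.0)

stable_names = [o.name for o in cache.stable_objects(min_duration=5.0)]
print(stable_names)  # ['cup']
```

Only objects that pass the filter would then be turned into obstacles, which keeps short-lived detections out of the planning scene.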


@greptile-apps greptile-apps bot left a comment


5 files reviewed, 1 comment


@mustafab0 mustafab0 force-pushed the feature/mustafa-detection3d-pcd-manipulation branch from 82e6db4 to d071ef6 Compare February 9, 2026 23:10
@mustafab0 mustafab0 force-pushed the feature/mustafa-add-gripper-control-for-control-coordinator branch from f51144b to 6c77e02 Compare February 9, 2026 23:11
@mustafab0 mustafab0 force-pushed the feature/mustafa-detection3d-pcd-manipulation branch from d071ef6 to 9431928 Compare February 10, 2026 05:15
…ator' into feature/mustafa-detection3d-pcd-manipulation
…izer (#1227)

* with separate preview urdf

* running meshcat on its dedicated thread allows for real time preview update

* added meshcat viz executor shutdown

* removed sleep on meshcat thread now time.sleep is only called in rpc thread

* preview urdf is now persistent and does not disappear after preview

* wrapped meshcat threadexecutor in its class


def pointcloud_to_convex_hull_obj(
    points: NDArray[np.float64],

why are you keeping points as numpy arrays? should use PointCloud2 which is stored in o3d, which has natural very fast convex hull calculations so this function would be one line or can add .to_convex_hull on PointCloud2 class?


@leshy leshy left a comment


your call what to address

# These must be imported at runtime (not TYPE_CHECKING) for In/Out port creation
from dimos.msgs.sensor_msgs import JointState
from dimos.msgs.trajectory_msgs import JointTrajectory
from dimos.perception.detection.type.detection3d.object import Object as DetObject # noqa: TC001

not a huge fan of this Object type, as this seems to be a current top level interface for the world this is a potential issue, but not reviewing it in detail, feel free to own/rewrite


def get_link_pose(
    self, ctx: Any, robot_id: WorldRobotID, link_name: str
) -> NDArray[np.float64]:

@leshy leshy Feb 11, 2026


why are you returning np arrays and not poses/transforms? as long as your external interface is dimos types I don't care, but u will suffer

coordinator_task_name: str | None = None
gripper_hardware_id: str | None = None
# TF publishing for extra links (e.g., camera mount)
tf_extra_links: list[str] = field(default_factory=list)

shouldn't this be a list of actual transforms? from a particular existing arm transform to a new one?
(like Joint3 -> Joint_Camera)

I can have multiple cameras etc, why strings, they don't encode actual spatial relationships
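
One possible reading of this suggestion, sketched under stated assumptions: each extra link is declared as an explicit parent-to-child transform instead of a bare link name. `ExtraLinkTransform`, `RobotConfigSketch`, and the frame names are hypothetical and do not exist in the PR.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class ExtraLinkTransform:
    """A fixed transform from an existing arm frame to a newly attached frame."""

    parent_frame: str
    child_frame: str
    translation: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    rotation_xyzw: Tuple[float, float, float, float] = (0.0, 0.0, 0.0, 1.0)


@dataclass
class RobotConfigSketch:
    # Hypothetical replacement for `tf_extra_links: list[str]`.
    tf_extra_links: List[ExtraLinkTransform] = field(default_factory=list)


# Two cameras mounted on different links, each with its own spatial relationship.
config = RobotConfigSketch(
    tf_extra_links=[
        ExtraLinkTransform("joint3", "camera_side", translation=(0.0, 0.05, 0.02)),
        ExtraLinkTransform("link_eef", "camera_wrist", translation=(0.07, 0.0, 0.03)),
    ]
)

child_frames = [link.child_frame for link in config.tf_extra_links]
```

Declaring the parent frame and offset makes the spatial relationship explicit in the config, at the cost of duplicating information that a URDF link name already implies.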

@mustafab0 mustafab0 merged commit 285351e into feature/mustafa-add-gripper-control-for-control-coordinator Feb 11, 2026
16 checks passed
mustafab0 added a commit that referenced this pull request Feb 11, 2026
@mustafab0 mustafab0 deleted the feature/mustafa-detection3d-pcd-manipulation branch February 18, 2026 20:18
