Dimos Unity Simulator by jeff-hykin · Pull Request #1539 · dimensionalOS/dimos

jeff-hykin · 2026-03-12T21:55:52Z

Problem

Testing g1 stuff is hard and the mujoco sim for the g1 is bad. (It would be nice to have automated tests using the unity sim from the ros nav stack)

Solution

Port the unity simulator as a DimOS module

Auto downloads the executable if needed
Has tests

Breaking Changes

None

How to Test

On linux x86 only:

dimos run unity-sim

Should download the unity simulator with a big message about it, then should open up the graphical sim window and a rerun window. Clicking won't navigate (its just a sim not the full nav stack)

Contributor License Agreement

I have read and approved the CLA.

Replace gdown/Google Drive auto-download with get_data() LFS asset (cmu_unity_sim_x86, 128MB compressed). Simplify config by removing unity_scene, unity_cache_dir, auto_download fields. Clean up blueprint (remove __main__.py, rename to unity_sim, remove resolve_unity_binary requirement hook).

jeff-hykin · 2026-03-12T21:56:54Z

dimos/core/native_module.py

+            module=module_name,
+            cmd=" ".join(cmd),
+            cwd=cwd,
+        )


Changes to native modules here cause I'm testing the unity sim with the livox native modules and got VERY undescriptive error messages

jeff-hykin · 2026-03-12T21:58:46Z

dimos/simulation/unity/module.py

+    #       "visual_override": {"world/camera_info": UnityBridgeModule.rerun_suppress_camera_info},
+    #   }
+    @staticmethod
+    def rerun_static_pinhole(rr: Any) -> list[Any]:


whenever our rerun API gets improved this will need to be changed

jeff-hykin · 2026-03-12T22:00:04Z

dimos/utils/ros1.py

+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""ROS1 binary message deserialization — no ROS1 installation required.


Unity sim was made for ROS1 so we needed some tooling to convert those messages

greptile-apps · 2026-03-12T22:01:00Z

Greptile Summary

This PR ports the CMU VLA Challenge Unity simulator as a first-class DimOS module (UnityBridgeModule). It implements the ROS-TCP-Endpoint binary protocol from scratch in pure Python (no ROS installation required), auto-downloads the Linux x86_64 binary via LFS on first use, integrates kinematic odometry at 200 Hz, and wires everything into a Rerun viewer via a one-liner dimos run unity-sim blueprint. A new dimos/utils/ros1.py utility adds reusable ROS1 binary serialization helpers.

Key issues found:

Broken crash stderr in native_module.py — The new last_stderr collection reads from a pipe that has already been fully consumed and closed by _read_log_stream. The stderr_t.join() waits for the reader thread to finish draining the stream, so the subsequent .read() always returns empty bytes. Crash logs will always show last_stderr=None, making the new diagnostic field a no-op.
Server socket leak in _unity_loop — The TCP server socket is not guarded by a try/finally, so if bind() fails (e.g. port 10000 already in use) the file descriptor leaks and the port stays occupied until the process exits.
Stale messages on Unity reconnect — The unbounded _send_queue is shared across reconnections. If Unity crashes and reconnects, the queued poses and system commands from the prior session are delivered to the new connection before its handshake completes.
Minor race condition in _on_terrain — self._x and self._y are read without _state_lock in the terrain callback while _sim_loop writes them in a separate thread.

Confidence Score: 2/5

Needs fixes before merge — one new diagnostic feature is silently broken and the socket/queue issues can cause problems under reconnect scenarios.
The core Unity bridge functionality is well-structured and the test coverage is solid. However, three logic bugs were found: the crash-report last_stderr in native_module.py will always be empty (the stderr pipe is already closed when the read is attempted), the TCP server socket in _unity_loop lacks a try/finally and will leak on bind failure, and stale queue messages will be replayed to a reconnected Unity session. These bugs are not catastrophic for the happy path but do undermine reliability and the intended diagnostic value of the changes.
dimos/core/native_module.py (broken stderr collection) and dimos/simulation/unity/module.py (socket leak + stale queue) need the most attention before merge.

Important Files Changed

Filename	Overview
dimos/core/native_module.py	Enriches crash-report logging with module name, exe name, and last stderr — but the `last_stderr` collection is broken: the stderr reader thread fully consumes and closes the pipe before the read attempt, so crash logs will always show `last_stderr=None`.
dimos/simulation/unity/module.py	Core Unity TCP bridge module. Has three issues: server socket not wrapped in try/finally (fd leak on bind failure), send queue not drained between reconnections (stale messages to new session), and position state read without lock in terrain callback.
dimos/utils/ros1.py	Clean pure-Python ROS1 binary serialization/deserialization utilities. Handles PointCloud2, CompressedImage, and PoseStamped correctly with both fast (standard XYZI layout) and slow (arbitrary offsets) paths.
dimos/simulation/unity/test_unity_sim.py	Comprehensive test suite covering config, platform validation, pickle round-trip, TCP bridge, kinematic sim, and live Unity integration. Minor fragility from `time.sleep`-based synchronization in TCP and kinematic tests.
dimos/simulation/unity/blueprint.py	Standalone blueprint wiring UnityBridgeModule to a Rerun viewer with correct static pinhole and camera info suppression. Clean and straightforward.
dimos/robot/all_blueprints.py	Registers the new `unity-sim` blueprint entry point. Single-line addition, alphabetically ordered correctly.
data/.lfs/cmu_unity_sim_x86.tar.gz	Git LFS pointer for the 127 MB Unity simulator binary. Correctly formatted LFS stub; binary is fetched on demand via `get_data()`.

Sequence Diagram

sequenceDiagram
    participant User
    participant UnityBridgeModule
    participant TCPServer as TCP Server (_unity_loop)
    participant SimLoop as Kinematic Sim (_sim_loop)
    participant UnityBinary as Unity Binary Process
    participant Rerun

    User->>UnityBridgeModule: start()
    UnityBridgeModule->>SimLoop: spawn thread
    UnityBridgeModule->>TCPServer: spawn thread
    UnityBridgeModule->>UnityBinary: Popen (auto-download via get_data if needed)
    TCPServer->>TCPServer: bind(:10000) + listen

    UnityBinary-->>TCPServer: TCP connect
    TCPServer->>UnityBinary: __handshake {version, protocol}
    UnityBinary->>TCPServer: __topic_list request
    TCPServer->>UnityBinary: __topic_list response [/unity_sim/set_model_state, /tf]

    loop Sensor data stream
        UnityBinary->>TCPServer: /registered_scan (ROS1 PointCloud2)
        TCPServer->>UnityBridgeModule: registered_scan.publish()
        UnityBridgeModule->>Rerun: lidar point cloud

        UnityBinary->>TCPServer: /color/image_raw/compressed (ROS1 CompressedImage)
        TCPServer->>UnityBridgeModule: color_image.publish()
        UnityBridgeModule->>Rerun: decoded RGB image
    end

    loop Kinematic sim at 200 Hz
        SimLoop->>SimLoop: integrate cmd_vel → (x, y, z, yaw)
        SimLoop->>UnityBridgeModule: odometry.publish()
        SimLoop->>UnityBridgeModule: tf.publish()
        SimLoop->>TCPServer: enqueue /unity_sim/set_model_state (PoseStamped)
        TCPServer->>UnityBinary: send pose update
    end

    User->>UnityBridgeModule: stop()
    UnityBridgeModule->>UnityBinary: SIGTERM
    UnityBridgeModule->>SimLoop: _running = False (join)
    UnityBridgeModule->>TCPServer: _running = False (join)

_{Last reviewed commit: bcc33ef}

greptile-apps · 2026-03-12T22:01:03Z

dimos/core/native_module.py

+        # Collect any remaining stderr for the crash report
+        last_stderr = ""
+        if self._process.stderr and not self._process.stderr.closed:
+            try:
+                remaining = self._process.stderr.read()
+                if remaining:
+                    last_stderr = remaining.decode("utf-8", errors="replace").strip()
+            except Exception:
+                pass


last_stderr will always be empty — crash report has no stderr

The stderr is already fully consumed and closed by the time this read is attempted. _read_log_stream (called via _start_reader) iterates over the entire stream and explicitly calls stream.close() at the end. After stderr_t.join(timeout=2) completes, the stream is exhausted and closed, so self._process.stderr.read() will always return b"".

As a result, last_stderr will always be None in the crash log, making the new crash-report feature a no-op.

A common fix for this pattern is to buffer the last N lines inside the reader thread itself and expose them via a shared variable, rather than trying to re-read the already-closed pipe.

# Example fix: buffer the last few lines in _read_log_stream # and expose via an instance variable, e.g. self._last_stderr_lines

greptile-apps · 2026-03-12T22:01:05Z

dimos/simulation/unity/module.py

+    def _unity_loop(self) -> None:
+        server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+        server_sock.bind((self.config.unity_host, self.config.unity_port))
+        server_sock.listen(1)
+        server_sock.settimeout(2.0)
+        logger.info(f"TCP server on :{self.config.unity_port}")
+
+        while self._running:
+            try:
+                conn, addr = server_sock.accept()
+                logger.info(f"Unity connected from {addr}")
+                try:
+                    self._bridge_connection(conn)
+                except Exception as e:
+                    logger.info(f"Unity connection ended: {e}")
+                finally:
+                    with self._state_lock:
+                        self._unity_connected = False
+                    conn.close()
+            except TimeoutError:
+                continue
+            except Exception as e:
+                if self._running:
+                    logger.warning(f"TCP server error: {e}")
+                    time.sleep(1.0)
+
+        server_sock.close()


Server socket leaked if bind() or listen() raises

The server socket is created and set up before the while self._running: loop, but server_sock.close() only appears at the very bottom (line 465). If bind() raises (e.g., OSError: [Errno 98] Address already in use on port 10000) or any other exception is thrown during setup, the socket's file descriptor is leaked and the port may remain occupied.

Wrap the socket in a try/finally:

def _unity_loop(self) -> None: server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) server_sock.bind((self.config.unity_host, self.config.unity_port)) server_sock.listen(1) server_sock.settimeout(2.0) logger.info(f"TCP server on :{self.config.unity_port}") try: while self._running: ... finally: server_sock.close()

greptile-apps · 2026-03-12T22:01:05Z

dimos/simulation/unity/module.py

+        finally:
+            halt.set()
+            sender.join(timeout=2.0)
+            with self._state_lock:
+                self._unity_connected = False


Stale send-queue messages delivered to a reconnected Unity session

The _send_queue is a shared, module-level Queue that is never cleared when a connection ends. _unity_loop accepts new connections in a loop, so if Unity disconnects and reconnects (e.g., after a crash), any messages that were queued during the old session — including stale odometry poses and topic-list responses — will be dequeued and sent to the fresh connection before it has completed its handshake.

The sender thread reads from the queue in _unity_sender, which runs per-connection (created in _bridge_connection). When a connection ends and the sender thread terminates, queued items remain in the shared queue. The next time _bridge_connection spawns a sender, those leftovers are delivered immediately.

A simple fix is to drain the queue at the start of _bridge_connection:

def _bridge_connection(self, sock: socket.socket) -> None: # Drain any stale messages from a previous session while not self._send_queue.empty(): try: self._send_queue.get_nowait() except Empty: break ...

greptile-apps · 2026-03-12T22:01:07Z

dimos/simulation/unity/module.py

+    def _on_terrain(self, cloud: PointCloud2) -> None:
+        points, _ = cloud.as_numpy()
+        if len(points) == 0:
+            return
+        dx = points[:, 0] - self._x
+        dy = points[:, 1] - self._y
+        near = points[np.sqrt(dx * dx + dy * dy) < 0.5]
+        if len(near) >= 10:
+            with self._state_lock:
+                self._terrain_z = 0.8 * self._terrain_z + 0.2 * near[:, 2].mean()


self._x / self._y read without lock in terrain callback

_on_terrain reads self._x and self._y at lines 429–430 directly (without _state_lock), while _sim_loop writes them at lines 612–613 in a separate thread. The GIL prevents true corruption for individual float reads/writes, but the pair (self._x, self._y) is not read atomically — _on_terrain could observe an _x from one tick and a _y from the next, producing a slightly off distance calculation for the terrain Z filter.

Consider snapping both values under _state_lock, or expanding _state_lock to also cover the position state as it already does for _terrain_z and _unity_connected:

def _on_terrain(self, cloud: PointCloud2) -> None: points, _ = cloud.as_numpy() if len(points) == 0: return with self._state_lock: cur_x, cur_y = self._x, self._y dx = points[:, 0] - cur_x dy = points[:, 1] - cur_y ...

jeff-hykin added 6 commits March 12, 2026 11:28

better native module debugging

d4040f2

add unity sim, part 1

b7113f9

clean up

636ae27

cleaning

3f8ee25

improve binary downloading (google drive)

5022b2b

jeff-hykin commented Mar 12, 2026

View reviewed changes

greptile-apps bot reviewed Mar 12, 2026

View reviewed changes

jeff-hykin changed the title ~~Jeff/feat/untiy sim~~ Dimos Unity Simulator Mar 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dimos Unity Simulator#1539

Dimos Unity Simulator#1539
jeff-hykin wants to merge 6 commits intodevfrom
jeff/feat/untiy_sim

jeff-hykin commented Mar 12, 2026

Uh oh!

jeff-hykin Mar 12, 2026 •

edited

Loading

Uh oh!

jeff-hykin Mar 12, 2026

Uh oh!

jeff-hykin Mar 12, 2026

Uh oh!

greptile-apps bot commented Mar 12, 2026

Uh oh!

greptile-apps bot Mar 12, 2026

Uh oh!

greptile-apps bot Mar 12, 2026

Uh oh!

greptile-apps bot Mar 12, 2026

Uh oh!

greptile-apps bot Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jeff-hykin commented Mar 12, 2026

Problem

Solution

Breaking Changes

How to Test

Contributor License Agreement

Uh oh!

jeff-hykin Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jeff-hykin Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

jeff-hykin Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Mar 12, 2026

Greptile Summary

Confidence Score: 2/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jeff-hykin Mar 12, 2026 •

edited

Loading