Replace gdown/Google Drive auto-download with get_data() LFS asset (cmu_unity_sim_x86, 128MB compressed). Simplify config by removing unity_scene, unity_cache_dir, auto_download fields. Clean up blueprint (remove __main__.py, rename to unity_sim, remove resolve_unity_binary requirement hook).
| module=module_name, | ||
| cmd=" ".join(cmd), | ||
| cwd=cwd, | ||
| ) |
Changes to the native modules are included here because I'm testing the Unity sim with the Livox native modules and was getting very undescriptive error messages.
| # "visual_override": {"world/camera_info": UnityBridgeModule.rerun_suppress_camera_info}, | ||
| # } | ||
| @staticmethod | ||
| def rerun_static_pinhole(rr: Any) -> list[Any]: |
Whenever our Rerun API gets improved, this will need to be changed.
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| """ROS1 binary message deserialization — no ROS1 installation required. |
The Unity sim was made for ROS1, so we needed some tooling to convert those messages.
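To illustrate what such tooling involves, here is a minimal sketch of decoding a ROS1 std_msgs/Header from raw bytes without a ROS1 installation. It relies only on the ROS1 wire format (little-endian fields, strings prefixed with a uint32 length). The names Header, read_string, and read_header are hypothetical and are not taken from this PR's deserializer.

```python
import struct
from dataclasses import dataclass


@dataclass
class Header:
    """Fields of a ROS1 std_msgs/Header."""
    seq: int
    stamp_secs: int
    stamp_nsecs: int
    frame_id: str


def read_string(buf: bytes, offset: int) -> tuple[str, int]:
    """Read a ROS1 string: uint32 little-endian length, then that many bytes."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("string length exceeds buffer size")
    return buf[start:end].decode("utf-8"), end


def read_header(buf: bytes, offset: int = 0) -> tuple[Header, int]:
    """Read a Header and return it with the offset of the next field."""
    # uint32 seq, then time as (uint32 secs, uint32 nsecs), all little-endian
    seq, secs, nsecs = struct.unpack_from("<III", buf, offset)
    frame_id, offset = read_string(buf, offset + 12)
    return Header(seq, secs, nsecs, frame_id), offset
```

Returning the next offset alongside the value lets larger messages (for example a PointCloud2, which starts with a Header) be decoded by chaining readers over the same buffer.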
Greptile Summary
This PR ports the CMU VLA Challenge Unity simulator as a first-class DimOS module ( Key issues found:
Confidence Score: 2/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant UnityBridgeModule
participant TCPServer as TCP Server (_unity_loop)
participant SimLoop as Kinematic Sim (_sim_loop)
participant UnityBinary as Unity Binary Process
participant Rerun
User->>UnityBridgeModule: start()
UnityBridgeModule->>SimLoop: spawn thread
UnityBridgeModule->>TCPServer: spawn thread
UnityBridgeModule->>UnityBinary: Popen (auto-download via get_data if needed)
TCPServer->>TCPServer: bind(:10000) + listen
UnityBinary-->>TCPServer: TCP connect
TCPServer->>UnityBinary: __handshake {version, protocol}
UnityBinary->>TCPServer: __topic_list request
TCPServer->>UnityBinary: __topic_list response [/unity_sim/set_model_state, /tf]
loop Sensor data stream
UnityBinary->>TCPServer: /registered_scan (ROS1 PointCloud2)
TCPServer->>UnityBridgeModule: registered_scan.publish()
UnityBridgeModule->>Rerun: lidar point cloud
UnityBinary->>TCPServer: /color/image_raw/compressed (ROS1 CompressedImage)
TCPServer->>UnityBridgeModule: color_image.publish()
UnityBridgeModule->>Rerun: decoded RGB image
end
loop Kinematic sim at 200 Hz
SimLoop->>SimLoop: integrate cmd_vel → (x, y, z, yaw)
SimLoop->>UnityBridgeModule: odometry.publish()
SimLoop->>UnityBridgeModule: tf.publish()
SimLoop->>TCPServer: enqueue /unity_sim/set_model_state (PoseStamped)
TCPServer->>UnityBinary: send pose update
end
User->>UnityBridgeModule: stop()
UnityBridgeModule->>UnityBinary: SIGTERM
UnityBridgeModule->>SimLoop: _running = False (join)
UnityBridgeModule->>TCPServer: _running = False (join)
Last reviewed commit: bcc33ef
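The "integrate cmd_vel → (x, y, z, yaw)" step in the diagram can be sketched as a planar kinematic update. This is only an illustration: it assumes cmd_vel carries body-frame linear velocities (vx, vy) and a yaw rate wz, and the function name integrate_pose is hypothetical rather than the module's actual code. The z coordinate is left out here, since the PR derives it from the terrain filter.

```python
import math


def integrate_pose(
    x: float, y: float, yaw: float,
    vx: float, vy: float, wz: float,
    dt: float,
) -> tuple[float, float, float]:
    """Advance a planar pose by one Euler step of length dt seconds."""
    # Rotate the body-frame velocity into the world frame, then integrate.
    x += (vx * math.cos(yaw) - vy * math.sin(yaw)) * dt
    y += (vx * math.sin(yaw) + vy * math.cos(yaw)) * dt
    # Integrate the yaw rate and wrap the result into [-pi, pi].
    yaw += wz * dt
    yaw = math.atan2(math.sin(yaw), math.cos(yaw))
    return x, y, yaw
```

At 200 Hz the step is dt = 1 / 200, so one second of driving straight ahead at 1 m/s moves the pose about 1 m along its heading.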
| # Collect any remaining stderr for the crash report | ||
| last_stderr = "" | ||
| if self._process.stderr and not self._process.stderr.closed: | ||
| try: | ||
| remaining = self._process.stderr.read() | ||
| if remaining: | ||
| last_stderr = remaining.decode("utf-8", errors="replace").strip() | ||
| except Exception: | ||
| pass |
last_stderr will always be empty — crash report has no stderr
By the time this read is attempted, stderr has already been fully consumed and closed. _read_log_stream (called via _start_reader) iterates over the entire stream and explicitly calls stream.close() at the end. Once stderr_t.join(timeout=2) completes, the stream is exhausted and closed, so the "not self._process.stderr.closed" guard skips the read (and even if the stream were still open, read() would return b"" at EOF).
As a result, last_stderr will always be the empty string in the crash log, making the new crash-report feature a no-op.
A common fix for this pattern is to buffer the last N lines inside the reader thread itself and expose them via a shared variable, rather than trying to re-read the already-closed pipe.
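A minimal sketch of that pattern follows, assuming a reader thread that owns the pipe. StderrTail and run_and_capture_tail are hypothetical names for illustration, not the PR's _read_log_stream or _start_reader.

```python
import collections
import subprocess
import sys
import threading


class StderrTail:
    """Keeps the last max_lines lines seen by the reader thread."""

    def __init__(self, max_lines: int = 20) -> None:
        self._lines: collections.deque[str] = collections.deque(maxlen=max_lines)
        self._lock = threading.Lock()

    def read_stream(self, stream) -> None:
        """Consume a binary stream line by line, then close it."""
        try:
            for raw in iter(stream.readline, b""):
                line = raw.decode("utf-8", errors="replace").rstrip()
                with self._lock:
                    self._lines.append(line)
        finally:
            stream.close()

    def text(self) -> str:
        """Return the buffered tail; safe to call after the stream is closed."""
        with self._lock:
            return "\n".join(self._lines)


def run_and_capture_tail(cmd: list[str], max_lines: int = 20) -> tuple[int, str]:
    """Run cmd and return (exit code, last lines of its stderr)."""
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE)
    tail = StderrTail(max_lines)
    reader = threading.Thread(target=tail.read_stream, args=(proc.stderr,), daemon=True)
    reader.start()
    proc.wait()
    reader.join(timeout=2)
    return proc.returncode, tail.text()


if __name__ == "__main__":
    code, stderr_tail = run_and_capture_tail(
        [sys.executable, "-c", "import sys; sys.exit('boom')"]
    )
    print(code, repr(stderr_tail))
```

Because the tail lives in the deque rather than in the pipe, the crash report can read it after the reader thread has closed the stream.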
# Example fix: buffer the last few lines in _read_log_stream
# and expose via an instance variable, e.g. self._last_stderr_lines
| def _unity_loop(self) -> None: | ||
| server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) | ||
| server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) | ||
| server_sock.bind((self.config.unity_host, self.config.unity_port)) | ||
| server_sock.listen(1) | ||
| server_sock.settimeout(2.0) | ||
| logger.info(f"TCP server on :{self.config.unity_port}") | ||
|
|
||
| while self._running: | ||
| try: | ||
| conn, addr = server_sock.accept() | ||
| logger.info(f"Unity connected from {addr}") | ||
| try: | ||
| self._bridge_connection(conn) | ||
| except Exception as e: | ||
| logger.info(f"Unity connection ended: {e}") | ||
| finally: | ||
| with self._state_lock: | ||
| self._unity_connected = False | ||
| conn.close() | ||
| except TimeoutError: | ||
| continue | ||
| except Exception as e: | ||
| if self._running: | ||
| logger.warning(f"TCP server error: {e}") | ||
| time.sleep(1.0) | ||
|
|
||
| server_sock.close() |
Server socket leaked if bind() or listen() raises
The server socket is created and set up before the while self._running: loop, but server_sock.close() only appears at the very bottom (line 465). If bind() raises (e.g., OSError: [Errno 98] Address already in use on port 10000) or any other exception is thrown during setup, the socket's file descriptor is leaked and the port may remain occupied.
Wrap the socket in a try/finally:
def _unity_loop(self) -> None:
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Setup lives inside the try so a failing bind()/listen() still closes the socket
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind((self.config.unity_host, self.config.unity_port))
        server_sock.listen(1)
        server_sock.settimeout(2.0)
        logger.info(f"TCP server on :{self.config.unity_port}")
        while self._running:
            ...
    finally:
        server_sock.close()
| finally: | ||
| halt.set() | ||
| sender.join(timeout=2.0) | ||
| with self._state_lock: | ||
| self._unity_connected = False |
Stale send-queue messages delivered to a reconnected Unity session
The _send_queue is a single Queue on the module instance, shared across connections, and it is never cleared when a connection ends. _unity_loop accepts new connections in a loop, so if Unity disconnects and reconnects (e.g., after a crash), any messages that were queued during the old session, including stale odometry poses and topic-list responses, will be dequeued and sent to the fresh connection before it has completed its handshake.
The sender thread reads from the queue in _unity_sender, which runs per-connection (created in _bridge_connection). When a connection ends and the sender thread terminates, queued items remain in the shared queue. The next time _bridge_connection spawns a sender, those leftovers are delivered immediately.
A simple fix is to drain the queue at the start of _bridge_connection:
def _bridge_connection(self, sock: socket.socket) -> None:
    # Drain any stale messages from a previous session.
    # Requires: from queue import Empty
    while True:
        try:
            self._send_queue.get_nowait()
        except Empty:
            break
    ...
| def _on_terrain(self, cloud: PointCloud2) -> None: | ||
| points, _ = cloud.as_numpy() | ||
| if len(points) == 0: | ||
| return | ||
| dx = points[:, 0] - self._x | ||
| dy = points[:, 1] - self._y | ||
| near = points[np.sqrt(dx * dx + dy * dy) < 0.5] | ||
| if len(near) >= 10: | ||
| with self._state_lock: | ||
| self._terrain_z = 0.8 * self._terrain_z + 0.2 * near[:, 2].mean() |
self._x / self._y read without lock in terrain callback
_on_terrain reads self._x and self._y at lines 429–430 directly (without _state_lock), while _sim_loop writes them at lines 612–613 in a separate thread. The GIL prevents true corruption for individual float reads/writes, but the pair (self._x, self._y) is not read atomically — _on_terrain could observe an _x from one tick and a _y from the next, producing a slightly off distance calculation for the terrain Z filter.
Consider snapshotting both values under _state_lock, or expanding _state_lock to also cover the position state, as it already does for _terrain_z and _unity_connected:
def _on_terrain(self, cloud: PointCloud2) -> None:
    points, _ = cloud.as_numpy()
    if len(points) == 0:
        return
    with self._state_lock:
        cur_x, cur_y = self._x, self._y
    dx = points[:, 0] - cur_x
    dy = points[:, 1] - cur_y
    ...
Problem
Testing G1 behavior is hard, and the MuJoCo sim for the G1 is poor. (It would be nice to have automated tests using the Unity sim from the ROS nav stack.)
Solution
Port the Unity simulator as a DimOS module.
Breaking Changes
None
How to Test
On Linux x86 only:
This should download the Unity simulator (with a prominent message about the download), then open the graphical sim window and a rerun window. Clicking won't navigate (it's just a sim, not the full nav stack).
Contributor License Agreement