Conversation
Greptile Overview
Greptile Summary
This PR refactors Rerun visualization architecture to use polling-based TF snapshots and removes async background threads from mappers. The changes address issue #956 by simplifying the visualization pipeline and improving maintainability.
Key changes:
- TF visualization now polls the TF buffer at 30Hz (configurable) instead of subscribing to the /tf LCM topic directly, providing rate-limited, stable updates
- GO2Connection no longer manually composes and logs transforms to Rerun; robot and camera motion is now driven entirely through TF system attachments to named frames (base_link, camera_optical)
- TF visualization logs are isolated under world/tf/{child} paths to separate debugging views from semantic entity paths
- Removed async queue/thread patterns from CostMapper and VoxelGridMapper; Rerun logging is now synchronous (performance controlled via voxel size tuning)
- Added comprehensive documentation covering all Rerun usage, TF conventions, and entity path organization
Issues found:
- VoxelGridMapper has unguarded rr.log() calls for metrics (lines 177-179) that will fail when the Rerun backend is not active
- TFRerunModule accesses the internal self.tf.buffers attribute; consider documenting this as intentional or using a public API
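One way to fix the unguarded calls is to route metric logging through a small guard that no-ops when no visualization backend is selected. This is a minimal, self-contained sketch; `MetricsLogger` and its sink are hypothetical names, not the actual dimos or Rerun API:

```python
class MetricsLogger:
    """Logs scalar metrics only when a visualization backend is active."""

    def __init__(self, backend_active: bool, sink=None):
        self._active = backend_active
        # In the real module the sink would be the rr.log() call; a list
        # stands in here so the sketch has no Rerun dependency.
        self._sink = sink if sink is not None else []

    def log_scalar(self, path: str, value: float) -> None:
        if not self._active:
            return  # backend not selected: silently skip instead of crashing
        self._sink.append((path, value))
```

The flag would come from whatever backend-selection state dimos exposes, so the metrics calls at lines 177-179 become no-ops rather than crashes when Rerun is not running.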
Confidence Score: 4/5
- This PR is safe to merge with one critical bug fix needed for non-Rerun backends
- The refactoring simplifies the architecture and is well-documented, but contains one logic bug in VoxelGridMapper where metrics logging is unguarded and will crash when Rerun backend is not selected. The architectural changes are sound and remove complexity, but the unguarded rr.log() calls need fixing before deployment.
dimos/mapping/voxels.py requires immediate attention for the unguarded metrics logging bug
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| dimos/dashboard/tf_rerun_module.py | 4/5 | Replaced LCM subscription with polling-based TF buffer snapshot at configurable rate (30Hz default), logs to world/tf/{child} paths |
| dimos/robot/unitree/connection/go2.py | 5/5 | Removed manual TF/transform logging from odometry callback; robot/camera motion now driven entirely via TF system attachments |
| dimos/mapping/costmapper.py | 4/5 | Removed async Rerun queue/thread; switched to synchronous logging with 2D image panel added to views |
| dimos/mapping/voxels.py | 4/5 | Removed async Rerun queue/thread; switched to synchronous logging directly in data pipeline |
Sequence Diagram
sequenceDiagram
participant GO2 as GO2Connection
participant TFSys as TF System (self.tf)
participant TFBuf as TF Buffer (MultiTBuffer)
participant TFViz as TFRerunModule
participant RR as Rerun
Note over GO2,RR: Initialization Phase
GO2->>RR: Log static world coordinates
GO2->>RR: Log URDF at world/robot
GO2->>RR: Attach world → world frame
GO2->>RR: Attach world/robot → base_link frame
GO2->>RR: Attach world/robot/camera → camera_optical frame
GO2->>RR: Log static camera pinhole
TFViz->>TFSys: start(sub=True)
TFSys->>TFBuf: Subscribe to /tf topic
TFViz->>TFViz: Start polling thread (30Hz)
Note over GO2,RR: Runtime Phase (per odometry update)
GO2->>GO2: Receive odometry message
GO2->>TFSys: publish(world→base_link, base_link→camera_link, camera_link→camera_optical)
TFSys->>TFBuf: Store transforms in buffer
TFSys->>TFSys: Publish TFMessage to /tf topic
GO2->>RR: Log camera image at world/robot/camera/rgb
Note over TFViz,RR: TF Visualization Loop (30Hz polling)
loop Every 33ms
TFViz->>TFBuf: Snapshot self.tf.buffers.items()
loop For each (parent, child) edge
TFViz->>TFBuf: buffer.get() - get latest transform
alt New or updated transform
TFViz->>RR: rr.log(f"world/tf/{child}", transform.to_rerun())
TFViz->>TFViz: Update _last_ts_by_edge[(parent, child)]
else Same timestamp
TFViz->>TFViz: Skip (no change)
end
end
end
Note over GO2,RR: Result
Note right of RR: world/robot moves via TF attachment to base_link<br/>world/robot/camera moves via TF attachment to camera_optical<br/>world/tf/* shows transform tree for debugging
dimos/dashboard/tf_rerun_module.py
Outdated
```python
while not self._stop_event.is_set():
    try:
        # Snapshot keys to avoid concurrent modification while TF buffer updates.
        items = list(self.tf.buffers.items())  # type: ignore[attr-defined]
```
Accessing self.tf.buffers directly relies on implementation details of the TF service. Consider using a public API if available, or add a comment explaining this is intentional internal access.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
What greptile wrote makes sense, but I'm not sure what the proper fix for this is.
Maybe the best way would be to add a get_buffers method in MultiTBuffer which returns a clone, but such a method would require a lock as well and none of the code there uses locks, so it's a big change. But it looks like the only way.
You're absolutely right that adding a get_buffers() method would be the cleanest solution, but as you noted, it would require introducing locking to the MultiTBuffer class, which is a significant change.
Looking at the current code, there are a few interim approaches you could take:
- Document the intentional access with a detailed comment:
```python
# INTENTIONAL: Direct access to self.tf.buffers internal state.
# This is necessary to snapshot the current TF buffer state for visualization.
# TODO: Consider adding a public get_buffers() method to MultiTBuffer with proper locking.
items = list(self.tf.buffers.items())  # type: ignore[attr-defined]
```
- Wrap the access in a try-catch to handle potential concurrent modification issues:
```python
try:
    # Snapshot keys to avoid concurrent modification while TF buffer updates.
    # Note: Direct buffer access - consider adding a public API in the future
    items = list(self.tf.buffers.items())  # type: ignore[attr-defined]
except RuntimeError:
    # Handle potential concurrent modification during iteration
    logger.debug("TFRerunModule: concurrent buffer modification, skipping poll")
    continue
```
- Add a method to the TF service that returns a safe snapshot (if you control that interface).
Given that this is visualization code that can tolerate occasional missed updates, option 2 might be the most pragmatic short-term solution while you plan the proper locking implementation for MultiTBuffer.
What's your preference? Would you like me to create a rule to remember this pattern for future reviews?
What greptile wrote makes sense, but I'm not sure what the proper fix for this is.
Maybe the best way would be to add a get_buffers method in MultiTBuffer which returns a clone, but such a method would require a lock as well, and none of the code there uses locks, so it's a big change. But it looks like the only way.
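For reference, the locked snapshot method discussed here could be quite small. This is a hypothetical sketch, assuming the internal edge-to-buffer map is the only shared state; it is not the real MultiTBuffer implementation:

```python
import threading


class MultiTBufferSketch:
    """Sketch of a MultiTBuffer with a public, locked snapshot API.

    The real MultiTBuffer is lock-free today; this shows what adding
    get_buffers() with a lock could look like.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._buffers = {}  # (parent_frame, child_frame) -> buffer object

    def set(self, edge, buf):
        with self._lock:
            self._buffers[edge] = buf

    def get_buffers(self):
        # Return a shallow copy so callers iterate without holding the lock.
        with self._lock:
            return dict(self._buffers)
```

Returning a shallow copy keeps the critical section to a single dict copy, so a 30Hz visualization poll would never block TF writers for long.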
I am also unsure: is there a public API available? Before, I was doing some hacky stuff, but this may be equally hacky.
Additional Comments (1)
```shell
# Explicitly select the viewer backend:
dimos --viewer-backend rerun-native run unitree-go2
```
dimos/dashboard/tf_rerun_module.py
Outdated
```python
except Exception as e:
    logger.warning("TFRerunModule: TF poll loop error", error=str(e))
```
What exception are you expecting here? It's probably not a good idea to wrap things in try-except just in case.
will remove, no good reason indeed for being there.
```python
video_store = TimedSensorReplay(f"{self.dir_name}/video")  # type: ignore[var-annotated]
# Legacy Unitree recordings can have RGB bytes that were tagged/assumed as BGR.
# Fix at replay-time by coercing everything to RGB before publishing/logging.
def _autocast_video(x):  # type: ignore[no-untyped-def]
```
Force-pushed 4d3438d to 663e48a
Too many files changed for review.
Force-pushed 663e48a to eb7e411
Too many files changed for review.
Force-pushed eb7e411 to 6699292
Too many files changed for review.
TF visualization now uses snapshot polling (self.tf.buffers) instead of
LCM subscriptions.
Changes:
- TFRerunModule: poll TF buffers at configurable Hz, log changed transforms
- CostMapper: remove _rerun_worker thread/queue, log synchronously
- VoxelGridMapper: remove _rerun_worker thread/queue, log synchronously
- GO2Connection: remove manual transform logging (robot/camera poses),
keep only static axes + pinhole; TF module handles dynamic transforms
- TFMessage.to_rerun: log to 'world/tf/{child}' instead of 'world/{child}'
The queue/thread patterns were removed because:
- costmapper is O(n) and voxel_size tuning addresses perf, not async magic
- threads added complexity without benefit for visualization logging
Why do we still have Rerun transforms manually published in go2 connection.py? go2 should do go2 stuff; the Rerun transform module should do Rerun transform stuff.

```python
# Attach semantic entity paths to the named TF frame graph without explicitly
# introducing `tf#/...` strings into the code.
#
# - `world` entity path: its implicit frame (tf#/world) is a child of the named frame "world".
# - `world/robot` entity path: its implicit frame (tf#/world/robot) is a child of the named frame "base_link".
rr.log(
    "world",
    rr.Transform3D(
        translation=[0.0, 0.0, 0.0],
        rotation=rr.Quaternion(xyzw=[0.0, 0.0, 0.0, 1.0]),
        parent_frame="world",  # type: ignore[call-arg]
    ),
    static=True,
)
rr.log(
    "world/robot",
    rr.Transform3D(
        translation=[0.0, 0.0, 0.0],
        rotation=rr.Quaternion(xyzw=[0.0, 0.0, 0.0, 1.0]),
        parent_frame="base_link",  # type: ignore[call-arg]
    ),
    static=True,
)
# Load robot URDF
if _GO2_URDF.exists():
    rr.log_file_from_path(
        str(_GO2_URDF),
        entity_path_prefix="world/robot",
        static=True,
    )
    logger.info(f"Loaded URDF from {_GO2_URDF}")
```
- Add RerunSceneWiringModule for static Rerun setup (view coords, entity attachments, URDF, axes, camera pinholes)
- Refactor tf_rerun() to compose TF polling + scene wiring via autoconnect
- Remove _init_rerun_world() from GO2Connection (now only TF + sensor data)
- Update GO2 blueprint to pass URDF + camera config via tf_rerun(...)
- Update VIEWER_BACKENDS.md to document the new architecture
This new commit moves most of the transform logic out of go2. This had to do with the fact that Rerun expects an entity to be defined together with a Rerun primitive in order to log it. It ended up in the GO2 connection because a lot of stuff came together there. Moved it to the blueprint instead. Usage in the blueprint now looks like:

```python
basic = autoconnect(
    go2_connection(),
    linux if platform.system() == "Linux" else mac,
    websocket_vis(),
    tf_rerun(
        urdf_path=str(_GO2_URDF),
        cameras=[
            ("world/robot/camera", "camera_optical", GO2Connection.camera_info_static),
        ],
    ),
).global_config(n_dask_workers=4, robot_model="unitree_go2")
```

This pattern will also allow us to attach multiple cameras to a robot on Rerun in the future.
Force-pushed 580a60b to 63e81a7
This PR addresses issue #956.
Changes:
- TF visualization now polls the TF buffer (self.tf.buffers) at a configurable rate and logs a snapshot of the latest transforms. No direct LCM subscription inside the TF viz module.
- GO2Connection no longer manually logs robot/camera transforms to Rerun (no composing transforms, no hand-built rr.Transform3D). It only publishes TF via self.tf.publish(*transforms) and logs static assets (URDF, axes, camera pinhole) plus camera images.
- TF logs are written under world/tf/{child_frame} so they're visible under the default world origin without mixing semantic entity paths with TF.
- Removed the queue/thread "async rerun worker" patterns from CostMapper and VoxelGridMapper; logging is synchronous again. Freezing/perf is expected to be handled by voxel size (O(n) costmap/mesh generation), not background threads.
- Added/updated documentation: an inventory of all Rerun usage in the codebase and a short "how to use Rerun on dev" section with TF/entity path conventions.
Notes:
- URDF loading remains (intentionally) for future robot model usage.
- Camera distortion: the Rerun pinhole uses intrinsics only; distortion must be handled by undistorting images upstream if needed.
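For context on that last note: the pinhole model logged to Rerun is distortion-free, so any lens distortion has to be removed from images before logging (e.g. with OpenCV's `cv2.undistort`). A simplified numpy sketch of the radial distortion model (Brown-Conrady, k1/k2 terms only) that upstream undistortion inverts:

```python
import numpy as np


def distort_normalized(xy: np.ndarray, k1: float, k2: float) -> np.ndarray:
    """Apply simplified radial distortion to normalized image coordinates.

    xy has shape (N, 2) in normalized camera coordinates (x/z, y/z).
    A full Brown-Conrady model also has k3 and tangential terms (p1, p2);
    they are omitted here to keep the sketch minimal.
    """
    r2 = np.sum(xy ** 2, axis=-1, keepdims=True)  # squared radius per point
    scale = 1.0 + k1 * r2 + k2 * r2 ** 2          # radial scaling factor
    return xy * scale
```

With zero coefficients the mapping is the identity, which matches the pinhole assumption: if the camera's k1/k2 are nonzero, images must be undistorted upstream before a plain pinhole at world/robot/camera is accurate.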