Fix OVRTX renderer device mismatch on multi-GPU#5594
Conversation
Replace hardcoded DEVICE = "cuda:0" in ovrtx_renderer with the actual sim device from CameraRenderSpec.device. All Warp kernel launches and buffer allocations now target the correct GPU when sim.device is not cuda:0 (e.g. distributed training).
Greptile SummaryThis PR fixes
Confidence Score: 5/5Safe to merge — all hardcoded GPU-0 references are systematically replaced and the two issues flagged in the previous review round are fully addressed. The change is a focused, mechanical substitution: every No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[OVRTXRenderer.__init__] -->|self._device = 'cuda:0'| B[Default device set]
C[create_render_data spec] -->|self._device = spec.device| D[Device updated from spec]
D --> E{_initialized_scene?}
E -->|No| F[_initialize_from_spec]
F --> G[_setup_object_bindings\nwp.array device=self._device]
E -->|Yes| H[OVRTXRenderData spec, self._device]
G --> H
I[_device_id property] -->|self._device.split ':'| J[Returns int CUDA index]
K[update_transforms] -->|object_binding.map device_id=self._device_id| L[wp.from_dlpack]
L -->|wp.launch device=self._device| M[sync_newton_transforms_kernel]
N[update_camera] -->|wp.zeros device=self._device| O[camera_transforms]
O -->|wp.launch device=self._device| P[create_camera_transforms_kernel]
P -->|camera_binding.map device_id=self._device_id| Q[wp.copy to binding]
R[_process_render_frame] -->|LdrColor.map device_id=self._device_id| S[extract_rgba_tiles\ndevice=self._device]
R -->|DepthSD.map device_id=self._device_id| T[extract_depth_tiles\ndevice=self._device]
R -->|DiffuseAlbedoSD.map device_id=self._device_id| U[extract_rgba_tiles albedo\ndevice=self._device]
R -->|SemanticSegmentation.map device_id=self._device_id| V[extract_rgba_tiles semantic\ndevice=self._device]
Reviews (2): Last reviewed commit: "Update kernel tests to use local device ..." | Re-trigger Greptile |
Route all OVRTX binding.map() calls through _device_id property so DLPack tensors are mapped on the correct CUDA device. Without this, update_transforms, update_camera, and render output mapping pin buffers to cuda:0 regardless of sim.device.
The DEVICE constant was removed from ovrtx_renderer_kernels in the previous commit. Define it locally in the test module instead.
|
@greptile retrigger |
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot - Incremental Review
Reviewed commits: 044d6b5 → 980cd22
Summary of Changes
This update simplifies the multi-GPU device handling fix in _process_render_frame():
Change Details
Removed explicit device_id from .map() calls:
frame.render_vars["LdrColor"].map()— removeddevice_id=self._device_idframe.render_vars[depth_var].map()— removeddevice_id=self._device_idframe.render_vars["DiffuseAlbedoSD"].map()— removeddevice_id=self._device_idframe.render_vars["SemanticSegmentation"].map()— removeddevice_id=self._device_id
Analysis
This is a refinement of the original multi-GPU fix. The binding.map() calls inside _process_render_frame() previously used hardcoded device_id=0. The initial fix added device_id=self._device_id, but this update removes the parameter entirely.
This simplification is likely intentional because:
- The frame render variables already exist on the correct device from the renderer initialization
- Omitting
device_idlets OVRTX use its default device inference - The explicit
device_idis still retained for_camera_binding.map()and_object_binding.map()inupdate_transforms()andupdate_camera()where it matters for data flow
Code Quality Assessment
The simplification is appropriate:
- Removes unnecessary parameter where device context is already established
- Keeps explicit device control where it matters (camera/object bindings)
- Cleaner code with no functional regression expected
Verdict
The changes look good. This is a sensible simplification of the multi-GPU fix. No blocking concerns.
Last reviewed SHA: 980cd22469e7043e7f261fa314774b78098983c5
|
@fatimaanes seeing some failures in CI tests - https://github.com/isaac-sim/IsaacLab/actions/runs/25935794523/job/76246703492?pr=5594 |
RenderVarOutput.map() only accepts (device, sync_stream) per the OVRTX API (ovrtx_map_output_description_t has device_type + sync_stream). AttributeBinding.map() accepts (device, device_id) per ovrtx_mapping_desc_t. The previous commit incorrectly added device_id to both call types. This reverts the 4 RenderVarOutput.map() calls to their original form while keeping device_id on the 2 AttributeBinding.map() calls. Co-Authored-By: Fatima Anes <fanes@nvidia.com>
Description
Fixes
OVRTXRenderercrash on multi-GPU systems whensim.deviceis notcuda:0.Root cause: A hardcoded
DEVICE = "cuda:0"constant inovrtx_renderer_kernels.pywas imported and used for all Warp kernel launches and buffer allocations. Additionally,AttributeBinding.map()calls useddevice_id=0, pinning attribute mapping to GPU 0 regardless of the simulation device.Fix:
DEVICEconstant and useself._device(set fromCameraRenderSpec.device) for all Warp operations (11 locations)_device_idproperty to extract the CUDA device index from the device stringdevice_id=self._device_idtoAttributeBinding.map()calls (2 locations: object binding and camera binding)Note on
RenderVarOutput.map()calls: These remain unchanged (device=Device.CUDAonly) because the OVRTX C API for render output mapping (ovrtx_map_output_description_t) does not accept adevice_idparameter — the output is inherently mapped on whichever GPU OVRTX rendered on.Total: 13 hardcoded GPU-0 references fixed (11 Warp + 2 AttributeBinding).
This is the same bug class fixed for
NewtonRendererin #5019 — OVRTX was not updated at that time.Type of change
Checklist
pre-commitchecks with./isaaclab.sh --formatCONTRIBUTORS.mdor my organization to theCONTRIBUTORS.mdlist