OMPE-88188: Enable frame stacking for newton+newton_renderer by default#5232
OMPE-88188: Enable frame stacking for newton+newton_renderer by default#5232jmart-nv wants to merge 10 commits into
Conversation
kellyguo11
left a comment
There was a problem hiding this comment.
Review Summary
Good two-part PR that addresses a real sim2real gap: (1) disabling temporal AA (DLSS) by default so RL policies don't learn from renderer-specific temporal artifacts, and (2) adding frame stacking for Newton to explicitly provide temporal information that Newton's energy-conserving integrator needs.
The implementation is solid — ring buffer approach for frame stacking is clean, the MultiBackendTiledCameraCfg preset pattern is a nice API improvement over the previous renderer_cfg=MultiBackendRendererCfg() approach, and the test coverage is thorough (12+ new tests covering frame stacking, ring buffer correctness, reset behavior, and AA mode warnings).
A few performance and correctness issues worth addressing below.
RTX uses DLSS anti-aliasing by default. DLSS uses temporal frame blending that encodes motion information into a single RGB observation, causing camera-based RL to learn from temporal artifacts that don't exist with real cameras in the same way. This has been fixed by changing the default AA mode to FXAA, which is slower than DLSS, but more accurately simulates a real camera. Also added a runtime warning when DLSS, DLAA or TAA are enabled, as these all introduce temporal artifacts. Added new unit tests to verify expected behavior. Also tested with full cartpole-camera training run.
In the previous commit, DLSS anti-aliasing was disabled by default since it provides implicit temporal data that does not accurately reflect how real cameras work. However, newton's energy-conserving physics solver requires temporal velocity data in order to compensate for the lack of damping. This commit provides explicit temporal information via 2-frame stacking by default for newton-based camera tasks. This allows newton to provide the damping it needs to converge at the same rate as physx. This adds 36% GPU memory overhead, but the wall clock overhead is negligible. The default for physx is still stack size = 1 (disabled) since physx has implicit damping built-in via its TGS solver. Implementation: `CameraCfg` now has a `frame_stack` field that controls a ring buffer of previous frames. These are concatenated to the channel dimension and the observation_space is auto-adjusted in `DirectRLEnv`. The new `MultiBackendCameraCfg` subclasses `CameraCfg` and pairs the `MultiBackendRendererCfg` with a `frame_stack` preset that defaults to 2 when `presets=newton` is used. `PhysicsCfg` gains a `requires_temporal_camera_data` capability flag (default False; True on `NewtonCfg`). When a backend that needs temporal data is paired with a camera whose `frame_stack <= 1`, a runtime warning is emitted from `sim_launcher.py`. This avoids hard-coding Newton-specific checks in the general API. Frame stacking uses a CPU-side flag to gate the per-step ring buffer init, avoiding a GPU->CPU sync from `.any()` on the steady-state path. Updated existing tasks (cartpole, shadow_hand, dexsuite kuka_allegro) to use `MultiBackendCameraCfg` and added 12 new tests.
Frame stacking now applies only to Newton+Warp by default. Other physics+renderer backends default to no stacking, but if it's detected that no temporal data is present (eg. using RTX with FXAA instead of DLSS) it will log a warning. Custom frame stack can be specified by overriding `frame_stack` on command line. Added new unit tests for capability-based defaults, preset parsing, cli override and warnings.
- Added sim.step() call before the first camera update to prime RTX via the visualizer pump path and ensure the first frame is not black. - Replaced assertion that frame pixels differ (incorrectly testing renderer behavior) with a better assertion that the newest slot in the ring buffer is properly updated.
- Replaced `torch.cat` + `.copy_(ordered)` with a per-slot `narrow().copy_()` loop. Eliminates the per-frame tensor allocation. - For multi-camera tasks, `direct_rl_env.py` collects every camera-like field with `frame_stack > 1` and raises if values disagree, rather than silently picking the first. Three new unit tests cover the single-camera, matching, and conflicting cases. - Moved stdlib `import logging` to top of test file. - Replaced three `sys.argv` try/finally blocks with `mock.patch.object(sys, "argv", ...)` context managers.
acd6bfe to
a798263
Compare
Greptile SummaryThis PR adds an explicit frame-stacking ring buffer to
Confidence Score: 4/5Safe to merge; ring-buffer logic, reset isolation, sentinel handling, and the warning predicate matrix are all correct, with one minor documentation inaccuracy in camera_cfg.py. The ring-buffer implementation deliberately avoids per-step allocations via narrow()+copy_(), but the docstring still says 'O(frame_stack) allocations' — directly contradicting the inline code comment at the call site. All functional logic is sound: history initialization on reset fills all slots with the first post-reset frame, partial resets isolate the correct env indices, the stacked output is built oldest-to-newest as expected, and the _apply_frame_stack_policies sentinel design cleanly separates user-supplied values from policy-driven defaults. source/isaaclab/isaaclab/sensors/camera/camera_cfg.py has the documentation inaccuracy. The rest of the changed files look correct. Important Files Changed
Sequence DiagramsequenceDiagram
participant CLI as CLI / resolve_task_config
participant LS as launch_simulation
participant AFP as _apply_frame_stack_policies
participant DRLE as DirectRLEnv.__init__
participant Cam as Camera._create_buffers
CLI->>LS: "env_cfg (frame_stack=0 sentinel)"
LS->>AFP: "walk cfg tree, set frame_stack=2 where newton+warp policy fires"
AFP-->>LS: frame_stack resolved on MultiBackendCameraCfg nodes
LS->>LS: _scan_config for needs_temporal / provides_temporal / has_unstacked_camera
alt needs_temporal AND NOT provides_temporal AND has_unstacked_camera
LS->>LS: logger.warning(frame_stack recommendation)
end
LS->>DRLE: env_cfg with frame_stack already resolved
DRLE->>DRLE: "scan cfg.__dataclass_fields__ for frame_stack > 1"
DRLE->>DRLE: "auto-adjust observation_space (H,W,C) to (H,W,C*frame_stack)"
DRLE->>Cam: "Camera(cfg) with frame_stack=2"
Cam->>Cam: alloc _frame_history shape (stack, N, H, W, C)
Cam->>Cam: "alloc _stacked_output shape (N, H, W, C*stack)"
Cam->>Cam: "_data.output = _stacked_output, renderer writes to _single_frame_output"
loop each step
Cam->>Cam: renderer writes new frame to _single_frame_output
Cam->>Cam: history[idx].copy_(single), then narrow+copy_ to _stacked_output
end
Reviews (1): Last reviewed commit: "OMPE-88188: Fix precommit errors." | Re-trigger Greptile |
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: jmart-nv <jemartinez@nvidia.com>
| # observation_space implies a single camera drives the policy obs; we collect | ||
| # every camera-like field with ``frame_stack > 1`` and verify they all share | ||
| # the same value before adjusting, rather than silently picking the first. | ||
| if isinstance(self.cfg.observation_space, (list, tuple)) and len(self.cfg.observation_space) == 3: |
There was a problem hiding this comment.
I don't think this should belong in the base class. the stacking mechanism should be on a per-task implementation. I would add this specifically for the cartpole environment
| class_type: type[PhysicsManager] | Any = MISSING | ||
| """The physics manager class to use. Must be set by subclasses.""" | ||
|
|
||
| requires_temporal_camera_data: bool = False |
There was a problem hiding this comment.
I think this probably also doesn't belong in the physics manager cfg. there's no reason the physics manager should care about camera data
| self._frame[env_ids] = 0 | ||
| # Reset frame-stack history for these envs; next _update_buffers_impl fills all | ||
| # history slots with the first post-reset frame so the policy never sees stale frames. | ||
| if self._frame_stack_size > 1: |
There was a problem hiding this comment.
I would recommend handling the frame stacking logic in the cartpole environment directly
| return isinstance(node, PhysicsCfg) and type(node).__name__ in ("NewtonCfg", "OvPhysxCfg") | ||
|
|
||
|
|
||
| def _physics_needs_temporal_camera_data(node) -> bool: |
There was a problem hiding this comment.
these are probably also not necessary if we move the logic into the task class directly?
|
Good Job figuring this out @jmart-nv I 100% agree with the motivation and surprised by your finding. Implementation wise, similar to Kelly's advise, I’d prefer this stay task-local for now. The current PR makes frame stacking a framework-level concern across the camera sensor, base env, physics config, renderer capabilities, presets, and launcher. That feels too broad for behavior currently motivated by specific Newton camera tasks. In particular, I don’t think DirectRLEnv should auto-adjust observation_space from camera config fields. The task owns the observation layout, so it should explicitly stack frames and declare the resulting observation shape. If this becomes common across tasks, we can promote a shared helper later with clearer boundaries. For example, for CartpoleCameraEnv, I’d rather see something like a task config field: and then handle the history directly in CartpoleCameraEnv._get_observations(): Then _get_observations() can remain explicit: Reset handling can also stay task-local by clearing or reinitializing the history for env_ids in _reset_idx(). That keeps the observation contract, memory cost, reset behavior, and Newton-specific policy choice visible in the environment that actually needs it. I’d also document the Newton/Warp renderer class subtlety directly: unlike RTX modes with temporal AA, the Warp renderer does not inject prior-frame information into the current image. Camera-control tasks that need velocity-like visual cues should therefore add explicit temporal observations, such as task-local frame stacking, rather than relying on renderer-specific artifacts. Feel free to give my comment to agent and and have it work on next iteration x) |
|
Closing this PR in favor of per-task frame stacking |
# Description This enables frame stacking for newton+warp by default in the cartpole camera presets task. **Newton Frame Stacking:** _Provides explicit temporal data to newton._ RTX uses DLSS anti-aliasing by default, which provides implicit temporal data. The newton_renderer does not provide temporal data. However, newton's energy-conserving physics solver requires temporal velocity data in order to compensate for the lack of damping, which causes convergence problems when paired with newton_renderer. This commit provides explicit temporal information via 2-frame stacking by default for cartpole-camera when using newton+newton_renderer. This allows newton to provide the damping it needs to converge at the same rate as physx. This adds 36% GPU memory overhead, but the wall clock overhead is negligible. The default for all other physics/renderer backends is still stack size = 1 (disabled) since physx has implicit damping built-in via its TGS solver, and RTX provides temporal data implicitly. **Implementation:** For manager-based envs, a new `stacked_image` term is added to the MDP observations - tasks can opt into frame stacking by adding the `stacked_image` term to their observation cfg and setting `frame_stack` to a value > 1. The cartpole camera presets direct env now implements frame stacking using `CircularBuffer` from `isaaclab.utils.buffers` directly in `_get_observations`. Added new unit tests for the MDP term (mocked + `ObservationManager` E2E) and cartpole integration, and updated the documentation with a note about newton's dependency on temporal data. _Note: This is a task-local re-implementation of the closed [PR #5232](#5232 ## Type of change - Bug fix (non-breaking change which fixes an issue) ## Checklist - [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [x] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there --------- Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Description
This enables frame stacking for newton+warp by default.
Previously this PR also included changes to RTX's default AA, which have been reverted. This was intended to remove the implicit temporal information introduced by DLSS and TAA. Instead, we now focus on explicitly adding temporal data via frame stacking where needed.
Newton Frame Stacking: Provides explicit temporal data to newton.
RTX uses DLSS anti-aliasing by default, which provides implicit temporal data. The newton_renderer does not provide temporal data. However, newton's energy-conserving physics solver requires temporal velocity data in order to compensate for the lack of damping, which causes convergence problems when paired with newton_renderer.
This commit provides explicit temporal information via 2-frame stacking by default for camera tasks that use newton+newton_renderer. This allows newton to provide the damping it needs to converge at the same rate as physx. This adds 36% GPU memory overhead, but the wall clock overhead is negligible.
The default for all other physics/renderer backends is still stack size = 1 (disabled) since physx has implicit damping built-in via its TGS solver, and RTX provides temporal data implicitly.
Implementation:
CameraCfg now has a
frame_stackfield that controls a ring buffer of previous frames. These are concatenated to the channel dimension automatically inDirectRLEnv. The newMultiBackendCameraCfgwraps theMultiBackendRendererCfgto provide defaults for the different presets, driven by nested config classes and capability flags. Warnings will be logged if it detects that a physics backend that needs explicit temporal data is paired with a renderer that does not provide it.Updated existing tasks to use the new MultiBackendCameraCfg and added 65 new unit tests.
Type of change
Screenshots
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there