OMPE-88188: Enable frame stacking for newton+newton_renderer by default by jmart-nv · Pull Request #5232 · isaac-sim/IsaacLab

jmart-nv · 2026-04-10T18:33:05Z

Description

This enables frame stacking for newton+warp by default.

Previously this PR also included changes to RTX's default AA, which have been reverted. This was intended to remove the implicit temporal information introduced by DLSS and TAA. Instead, we now focus on explicitly adding temporal data via frame stacking where needed.

Newton Frame Stacking: Provides explicit temporal data to newton.

RTX uses DLSS anti-aliasing by default, which provides implicit temporal data. The newton_renderer does not provide temporal data. However, newton's energy-conserving physics solver requires temporal velocity data in order to compensate for the lack of damping, which causes convergence problems when paired with newton_renderer.

This commit provides explicit temporal information via 2-frame stacking by default for camera tasks that use newton+newton_renderer. This allows newton to provide the damping it needs to converge at the same rate as physx. This adds 36% GPU memory overhead, but the wall clock overhead is negligible.

The default for all other physics/renderer backends is still stack size = 1 (disabled) since physx has implicit damping built-in via its TGS solver, and RTX provides temporal data implicitly.

Implementation:

CameraCfg now has a frame_stack field that controls a ring buffer of previous frames. These are concatenated to the channel dimension automatically in DirectRLEnv. The new MultiBackendCameraCfg wraps the MultiBackendRendererCfg to provide defaults for the different presets, driven by nested config classes and capability flags. Warnings will be logged if it detects that a physics backend that needs explicit temporal data is paired with a renderer that does not provide it.

Updated existing tasks to use the new MultiBackendCameraCfg and added 65 new unit tests.

Type of change

Bug fix (non-breaking change which fixes an issue)

Screenshots

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings (N/A - new warnings are intentional)
I have added tests that prove my fix is effective or that my feature works
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to the CONTRIBUTORS.md or my name already exists there

kellyguo11

Review Summary

Good two-part PR that addresses a real sim2real gap: (1) disabling temporal AA (DLSS) by default so RL policies don't learn from renderer-specific temporal artifacts, and (2) adding frame stacking for Newton to explicitly provide temporal information that Newton's energy-conserving integrator needs.

The implementation is solid — ring buffer approach for frame stacking is clean, the MultiBackendTiledCameraCfg preset pattern is a nice API improvement over the previous renderer_cfg=MultiBackendRendererCfg() approach, and the test coverage is thorough (12+ new tests covering frame stacking, ring buffer correctness, reset behavior, and AA mode warnings).

A few performance and correctness issues worth addressing below.

RTX uses DLSS anti-aliasing by default. DLSS uses temporal frame blending that encodes motion information into a single RGB observation, causing camera-based RL to learn from temporal artifacts that don't exist with real cameras in the same way. This has been fixed by changing the default AA mode to FXAA, which is slower than DLSS, but more accurately simulates a real camera. Also added a runtime warning when DLSS, DLAA or TAA are enabled, as these all introduce temporal artifacts. Added new unit tests to verify expected behavior. Also tested with full cartpole-camera training run.

In the previous commit, DLSS anti-aliasing was disabled by default since it provides implicit temporal data that does not accurately reflect how real cameras work. However, newton's energy-conserving physics solver requires temporal velocity data in order to compensate for the lack of damping. This commit provides explicit temporal information via 2-frame stacking by default for newton-based camera tasks. This allows newton to provide the damping it needs to converge at the same rate as physx. This adds 36% GPU memory overhead, but the wall clock overhead is negligible. The default for physx is still stack size = 1 (disabled) since physx has implicit damping built-in via its TGS solver. Implementation: `CameraCfg` now has a `frame_stack` field that controls a ring buffer of previous frames. These are concatenated to the channel dimension and the observation_space is auto-adjusted in `DirectRLEnv`. The new `MultiBackendCameraCfg` subclasses `CameraCfg` and pairs the `MultiBackendRendererCfg` with a `frame_stack` preset that defaults to 2 when `presets=newton` is used. `PhysicsCfg` gains a `requires_temporal_camera_data` capability flag (default False; True on `NewtonCfg`). When a backend that needs temporal data is paired with a camera whose `frame_stack <= 1`, a runtime warning is emitted from `sim_launcher.py`. This avoids hard-coding Newton-specific checks in the general API. Frame stacking uses a CPU-side flag to gate the per-step ring buffer init, avoiding a GPU->CPU sync from `.any()` on the steady-state path. Updated existing tasks (cartpole, shadow_hand, dexsuite kuka_allegro) to use `MultiBackendCameraCfg` and added 12 new tests.

Frame stacking now applies only to Newton+Warp by default. Other physics+renderer backends default to no stacking, but if it's detected that no temporal data is present (eg. using RTX with FXAA instead of DLSS) it will log a warning. Custom frame stack can be specified by overriding `frame_stack` on command line. Added new unit tests for capability-based defaults, preset parsing, cli override and warnings.

- Added sim.step() call before the first camera update to prime RTX via the visualizer pump path and ensure the first frame is not black. - Replaced assertion that frame pixels differ (incorrectly testing renderer behavior) with a better assertion that the newest slot in the ring buffer is properly updated.

- Replaced `torch.cat` + `.copy_(ordered)` with a per-slot `narrow().copy_()` loop. Eliminates the per-frame tensor allocation. - For multi-camera tasks, `direct_rl_env.py` collects every camera-like field with `frame_stack > 1` and raises if values disagree, rather than silently picking the first. Three new unit tests cover the single-camera, matching, and conflicting cases. - Moved stdlib `import logging` to top of test file. - Replaced three `sys.argv` try/finally blocks with `mock.patch.object(sys, "argv", ...)` context managers.

greptile-apps · 2026-05-06T21:23:28Z

Greptile Summary

This PR adds an explicit frame-stacking ring buffer to CameraCfg/Camera and wires a two-level preset policy (MultiBackendCameraCfg + FrameStackPolicyCfg) that automatically enables frame_stack=2 for the Newton+Warp backend combination, while emitting a warning when physics requires temporal data and no stacking or temporal-AA renderer is present.

Frame-stacking in Camera: _create_buffers pre-allocates _frame_history and _stacked_output once; each step advances the ring buffer using narrow()+copy_() with no per-step allocations, and resets correctly preserve per-env history isolation.
Policy infrastructure: MultiBackendCameraCfg carries a sentinel frame_stack=0; launch_simulation calls _apply_frame_stack_policies to resolve the nested FrameStackPolicyCfg tree before scanning for the temporal-data warning, ensuring post-policy state is what the warning reflects.
DirectRLEnv obs-space adjustment: A shallow scan of cfg.__dataclass_fields__ auto-multiplies the channel dimension for a 3-tuple observation_space, raising ValueError on conflicting stacks across multiple cameras.

Confidence Score: 4/5

Safe to merge; ring-buffer logic, reset isolation, sentinel handling, and the warning predicate matrix are all correct, with one minor documentation inaccuracy in camera_cfg.py.

The ring-buffer implementation deliberately avoids per-step allocations via narrow()+copy_(), but the docstring still says 'O(frame_stack) allocations' — directly contradicting the inline code comment at the call site. All functional logic is sound: history initialization on reset fills all slots with the first post-reset frame, partial resets isolate the correct env indices, the stacked output is built oldest-to-newest as expected, and the _apply_frame_stack_policies sentinel design cleanly separates user-supplied values from policy-driven defaults.

source/isaaclab/isaaclab/sensors/camera/camera_cfg.py has the documentation inaccuracy. The rest of the changed files look correct.

Important Files Changed

Filename	Overview
source/isaaclab/isaaclab/sensors/camera/camera.py	Core ring-buffer implementation for frame stacking: correct initialization, reset, and per-step copy logic using narrow() views to avoid allocations
source/isaaclab/isaaclab/sensors/camera/camera_cfg.py	Adds frame_stack field with default=1; docstring claims O(frame_stack) allocations per step but implementation uses allocation-free narrow()+copy_()
source/isaaclab/isaaclab/envs/direct_rl_env.py	Adds obs-space auto-adjustment for frame stacking with conflict detection; shallow one-level scan is consistent with the 3-tuple single-camera convention
source/isaaclab_tasks/isaaclab_tasks/utils/sim_launcher.py	Adds _apply_frame_stack_policies, _resolve_frame_stack_policy, and the warning logic; traversal and sentinel handling look correct
source/isaaclab_tasks/isaaclab_tasks/utils/presets.py	Adds MultiBackendCameraCfg with sentinel frame_stack=0 and nested FrameStackPolicyCfg tree; two-level PresetCfg design for AND-conditioned physics×renderer selection is well-structured
source/isaaclab/test/sensors/test_camera_frame_stack.py	835-line test suite covering ring-buffer shape, init, reset, partial reset, renderer capability flags, warning matrix, policy truth table, CLI overrides, and end-to-end propagation
source/isaaclab_newton/isaaclab_newton/renderers/newton_warp_renderer_cfg.py	Adds provides_temporal_camera_data override returning False for pure-rasterization Warp renderer
source/isaaclab_physx/isaaclab_physx/renderers/isaac_rtx_renderer_cfg.py	Adds provides_temporal_camera_data returning True for DLSS/DLAA/TAA and False for FXAA/Off; correctly falls back to True when antialiasing_mode is None (DLSS default)

Sequence Diagram

sequenceDiagram
    participant CLI as CLI / resolve_task_config
    participant LS as launch_simulation
    participant AFP as _apply_frame_stack_policies
    participant DRLE as DirectRLEnv.__init__
    participant Cam as Camera._create_buffers

    CLI->>LS: "env_cfg (frame_stack=0 sentinel)"
    LS->>AFP: "walk cfg tree, set frame_stack=2 where newton+warp policy fires"
    AFP-->>LS: frame_stack resolved on MultiBackendCameraCfg nodes
    LS->>LS: _scan_config for needs_temporal / provides_temporal / has_unstacked_camera
    alt needs_temporal AND NOT provides_temporal AND has_unstacked_camera
        LS->>LS: logger.warning(frame_stack recommendation)
    end
    LS->>DRLE: env_cfg with frame_stack already resolved
    DRLE->>DRLE: "scan cfg.__dataclass_fields__ for frame_stack > 1"
    DRLE->>DRLE: "auto-adjust observation_space (H,W,C) to (H,W,C*frame_stack)"
    DRLE->>Cam: "Camera(cfg) with frame_stack=2"
    Cam->>Cam: alloc _frame_history shape (stack, N, H, W, C)
    Cam->>Cam: "alloc _stacked_output shape (N, H, W, C*stack)"
    Cam->>Cam: "_data.output = _stacked_output, renderer writes to _single_frame_output"
    loop each step
        Cam->>Cam: renderer writes new frame to _single_frame_output
        Cam->>Cam: history[idx].copy_(single), then narrow+copy_ to _stacked_output
    end

_{Reviews (1): Last reviewed commit: "OMPE-88188: Fix precommit errors." | Re-trigger Greptile}

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: jmart-nv <jemartinez@nvidia.com>

kellyguo11 · 2026-05-08T03:43:37Z

+        # observation_space implies a single camera drives the policy obs; we collect
+        # every camera-like field with ``frame_stack > 1`` and verify they all share
+        # the same value before adjusting, rather than silently picking the first.
+        if isinstance(self.cfg.observation_space, (list, tuple)) and len(self.cfg.observation_space) == 3:


I don't think this should belong in the base class. the stacking mechanism should be on a per-task implementation. I would add this specifically for the cartpole environment

kellyguo11 · 2026-05-08T03:44:27Z

    class_type: type[PhysicsManager] | Any = MISSING
    """The physics manager class to use. Must be set by subclasses."""
+
+    requires_temporal_camera_data: bool = False


I think this probably also doesn't belong in the physics manager cfg. there's no reason the physics manager should care about camera data

kellyguo11 · 2026-05-08T03:45:25Z

        self._frame[env_ids] = 0
+        # Reset frame-stack history for these envs; next _update_buffers_impl fills all
+        # history slots with the first post-reset frame so the policy never sees stale frames.
+        if self._frame_stack_size > 1:


I would recommend handling the frame stacking logic in the cartpole environment directly

kellyguo11 · 2026-05-08T03:46:28Z

    return isinstance(node, PhysicsCfg) and type(node).__name__ in ("NewtonCfg", "OvPhysxCfg")


+def _physics_needs_temporal_camera_data(node) -> bool:


these are probably also not necessary if we move the logic into the task class directly?

ooctipus · 2026-05-08T04:09:52Z

Good Job figuring this out @jmart-nv I 100% agree with the motivation and surprised by your finding. Implementation wise, similar to Kelly's advise, I’d prefer this stay task-local for now. The current PR makes frame stacking a framework-level concern across the camera sensor, base env, physics config, renderer capabilities, presets, and launcher. That feels too broad for behavior currently motivated by specific Newton camera tasks.

In particular, I don’t think DirectRLEnv should auto-adjust observation_space from camera config fields. The task owns the observation layout, so it should explicitly stack frames and declare the resulting observation shape. If this becomes common across tasks, we can promote a shared helper later with clearer boundaries.

For example, for CartpoleCameraEnv, I’d rather see something like a task config field:

camera_frame_stack: int = 1
observation_space = [tiled_camera.height, tiled_camera.width, 3 * camera_frame_stack]

and then handle the history directly in CartpoleCameraEnv._get_observations():

def _stack_camera_obs(self, camera_data: torch.Tensor) -> torch.Tensor:
    frame_stack = self.cfg.camera_frame_stack
    if frame_stack <= 1:
        return camera_data
    if self._camera_frame_history is None:
        self._camera_frame_history = camera_data.unsqueeze(0).repeat(frame_stack, 1, 1, 1, 1)
        self._camera_frame_stack_idx = 0
    self._camera_frame_history[self._camera_frame_stack_idx].copy_(camera_data)
    stacked = torch.cat(
        [
            self._camera_frame_history[(self._camera_frame_stack_idx + 1 + i) % frame_stack]
            for i in range(frame_stack)
        ],
        dim=-1,
    )
    self._camera_frame_stack_idx = (self._camera_frame_stack_idx + 1) % frame_stack
    return stacked

Then _get_observations() can remain explicit:

camera_data = self._tiled_camera.data.output[data_type] / 255.0
camera_data -= torch.mean(camera_data, dim=(1, 2), keepdim=True)
camera_data = self._stack_camera_obs(camera_data)
return {"policy": camera_data.clone()}

Reset handling can also stay task-local by clearing or reinitializing the history for env_ids in _reset_idx(). That keeps the observation contract, memory cost, reset behavior, and Newton-specific policy choice visible in the environment that actually needs it.

I’d also document the Newton/Warp renderer class subtlety directly: unlike RTX modes with temporal AA, the Warp renderer does not inject prior-frame information into the current image. Camera-control tasks that need velocity-like visual cues should therefore add explicit temporal observations, such as task-local frame stacking, rather than relying on renderer-specific artifacts.

Feel free to give my comment to agent and and have it work on next iteration x)

jmart-nv · 2026-05-08T16:26:05Z

Closing this PR in favor of per-task frame stacking

# Description This enables frame stacking for newton+warp by default in the cartpole camera presets task. **Newton Frame Stacking:** _Provides explicit temporal data to newton._ RTX uses DLSS anti-aliasing by default, which provides implicit temporal data. The newton_renderer does not provide temporal data. However, newton's energy-conserving physics solver requires temporal velocity data in order to compensate for the lack of damping, which causes convergence problems when paired with newton_renderer. This commit provides explicit temporal information via 2-frame stacking by default for cartpole-camera when using newton+newton_renderer. This allows newton to provide the damping it needs to converge at the same rate as physx. This adds 36% GPU memory overhead, but the wall clock overhead is negligible. The default for all other physics/renderer backends is still stack size = 1 (disabled) since physx has implicit damping built-in via its TGS solver, and RTX provides temporal data implicitly. **Implementation:** For manager-based envs, a new `stacked_image` term is added to the MDP observations - tasks can opt into frame stacking by adding the `stacked_image` term to their observation cfg and setting `frame_stack` to a value > 1. The cartpole camera presets direct env now implements frame stacking using `CircularBuffer` from `isaaclab.utils.buffers` directly in `_get_observations`. Added new unit tests for the MDP term (mocked + `ObservationManager` E2E) and cartpole integration, and updated the documentation with a note about newton's dependency on temporal data. _Note: This is a task-local re-implementation of the closed [PR #5232](#5232 ## Type of change - Bug fix (non-breaking change which fixes an issue) ## Checklist - [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [x] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there --------- Co-authored-by: Kelly Guo <kellyg@nvidia.com>

github-actions Bot added the isaac-lab Related to Isaac Lab team label Apr 10, 2026

kellyguo11 reviewed Apr 10, 2026

View reviewed changes

fatimaanes self-requested a review April 13, 2026 19:46

jmart-nv added 8 commits May 6, 2026 14:19

OMPE-88188: Revert FXAA default and temporal AA warning

3ea31db

OMPE-88188: Rename newton preset to newton_mjwarp (PR isaac-sim#5488)

e7247ac

OMPE-88188: Changelog updates

a798263

jmart-nv force-pushed the jmart/temporal-aa branch from acd6bfe to a798263 Compare May 6, 2026 20:14

jmart-nv changed the title ~~[DRAFT] OMPE-88188: Default explicit temporal data instead of TAA~~ [DRAFT] OMPE-88188: Enable frame stacking for newton+newton_renderer by default May 6, 2026

OMPE-88188: Fix precommit errors.

7453dce

jmart-nv changed the title ~~[DRAFT] OMPE-88188: Enable frame stacking for newton+newton_renderer by default~~ OMPE-88188: Enable frame stacking for newton+newton_renderer by default May 6, 2026

jmart-nv marked this pull request as ready for review May 6, 2026 21:17

jmart-nv requested review from Mayankm96, matthewtrepte, ossamaAhmed and pascal-roth as code owners May 6, 2026 21:17

greptile-apps Bot reviewed May 6, 2026

View reviewed changes

Comment thread source/isaaclab/isaaclab/sensors/camera/camera_cfg.py

Update source/isaaclab/isaaclab/sensors/camera/camera_cfg.py

1dbb38f

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: jmart-nv <jemartinez@nvidia.com>

kellyguo11 reviewed May 8, 2026

View reviewed changes

jmart-nv closed this May 8, 2026

jmart-nv mentioned this pull request May 11, 2026

Add frame stacking support for explicit temporal info #5574

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OMPE-88188: Enable frame stacking for newton+newton_renderer by default#5232

OMPE-88188: Enable frame stacking for newton+newton_renderer by default#5232
jmart-nv wants to merge 10 commits into
isaac-sim:developfrom
jmart-nv:jmart/temporal-aa

jmart-nv commented Apr 10, 2026 •

edited

Loading

Uh oh!

kellyguo11 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot commented May 6, 2026

Uh oh!

Uh oh!

kellyguo11 May 8, 2026

Uh oh!

kellyguo11 May 8, 2026

Uh oh!

kellyguo11 May 8, 2026

Uh oh!

kellyguo11 May 8, 2026

Uh oh!

ooctipus commented May 8, 2026

Uh oh!

jmart-nv commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		return isinstance(node, PhysicsCfg) and type(node).__name__ in ("NewtonCfg", "OvPhysxCfg")


		def _physics_needs_temporal_camera_data(node) -> bool:

Conversation

jmart-nv commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Screenshots

Checklist

Uh oh!

kellyguo11 left a comment

Choose a reason for hiding this comment

Review Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

greptile-apps Bot commented May 6, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

kellyguo11 May 8, 2026

Choose a reason for hiding this comment

Uh oh!

kellyguo11 May 8, 2026

Choose a reason for hiding this comment

Uh oh!

kellyguo11 May 8, 2026

Choose a reason for hiding this comment

Uh oh!

kellyguo11 May 8, 2026

Choose a reason for hiding this comment

Uh oh!

ooctipus commented May 8, 2026

Uh oh!

jmart-nv commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jmart-nv commented Apr 10, 2026 •

edited

Loading