Skip to content

OMPE-88188: Enable frame stacking for newton+newton_renderer by default#5232

Closed
jmart-nv wants to merge 10 commits into
isaac-sim:developfrom
jmart-nv:jmart/temporal-aa
Closed

OMPE-88188: Enable frame stacking for newton+newton_renderer by default#5232
jmart-nv wants to merge 10 commits into
isaac-sim:developfrom
jmart-nv:jmart/temporal-aa

Conversation

@jmart-nv
Copy link
Copy Markdown

@jmart-nv jmart-nv commented Apr 10, 2026

Description

This enables frame stacking for newton+warp by default.

Previously this PR also included changes to RTX's default AA, which have been reverted. This was intended to remove the implicit temporal information introduced by DLSS and TAA. Instead, we now focus on explicitly adding temporal data via frame stacking where needed.

Newton Frame Stacking: Provides explicit temporal data to newton.

RTX uses DLSS anti-aliasing by default, which provides implicit temporal data. The newton_renderer does not provide temporal data. However, newton's energy-conserving physics solver requires temporal velocity data in order to compensate for the lack of damping, which causes convergence problems when paired with newton_renderer.

This commit provides explicit temporal information via 2-frame stacking by default for camera tasks that use newton+newton_renderer. This allows newton to provide the damping it needs to converge at the same rate as physx. This adds 36% GPU memory overhead, but the wall clock overhead is negligible.

The default for all other physics/renderer backends is still stack size = 1 (disabled) since physx has implicit damping built-in via its TGS solver, and RTX provides temporal data implicitly.

Implementation:

CameraCfg now has a frame_stack field that controls a ring buffer of previous frames. These are concatenated to the channel dimension automatically in DirectRLEnv. The new MultiBackendCameraCfg wraps the MultiBackendRendererCfg to provide defaults for the different presets, driven by nested config classes and capability flags. Warnings will be logged if it detects that a physics backend that needs explicit temporal data is paired with a renderer that does not provide it.

Updated existing tasks to use the new MultiBackendCameraCfg and added 65 new unit tests.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Screenshots

profiling_charts convergence_final frame_comparison_grid

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings (N/A - new warnings are intentional)
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label Apr 10, 2026
Copy link
Copy Markdown
Contributor

@kellyguo11 kellyguo11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

Good two-part PR that addresses a real sim2real gap: (1) disabling temporal AA (DLSS) by default so RL policies don't learn from renderer-specific temporal artifacts, and (2) adding frame stacking for Newton to explicitly provide temporal information that Newton's energy-conserving integrator needs.

The implementation is solid — ring buffer approach for frame stacking is clean, the MultiBackendTiledCameraCfg preset pattern is a nice API improvement over the previous renderer_cfg=MultiBackendRendererCfg() approach, and the test coverage is thorough (12+ new tests covering frame stacking, ring buffer correctness, reset behavior, and AA mode warnings).

A few performance and correctness issues worth addressing below.

Comment thread source/isaaclab/isaaclab/sensors/camera/tiled_camera.py Outdated
Comment thread source/isaaclab/isaaclab/sensors/camera/tiled_camera.py Outdated
Comment thread source/isaaclab/isaaclab/envs/direct_rl_env.py Outdated
Comment thread source/isaaclab_tasks/isaaclab_tasks/utils/sim_launcher.py Outdated
Comment thread source/isaaclab/isaaclab/sim/simulation_cfg.py Outdated
Comment thread source/isaaclab/test/sensors/test_tiled_camera_frame_stack.py Outdated
Comment thread source/isaaclab/test/sensors/test_tiled_camera_frame_stack.py Outdated
Comment thread source/isaaclab_tasks/isaaclab_tasks/utils/presets.py Outdated
@fatimaanes fatimaanes self-requested a review April 13, 2026 19:46
jmart-nv added 8 commits May 6, 2026 14:19
RTX uses DLSS anti-aliasing by default. DLSS uses temporal frame blending that encodes motion information into a single RGB observation, causing camera-based RL to learn from temporal artifacts that don't exist with real cameras in the same way.

This has been fixed by changing the default AA mode to FXAA, which is slower than DLSS, but more accurately simulates a real camera. Also added a runtime warning when DLSS, DLAA or TAA are enabled, as these all introduce temporal artifacts.

Added new unit tests to verify expected behavior. Also tested with full cartpole-camera training run.
In the previous commit, DLSS anti-aliasing was disabled by default since it provides implicit temporal data that does not accurately reflect how real cameras work. However, newton's energy-conserving physics solver requires temporal velocity data in order to compensate for the lack of damping.

This commit provides explicit temporal information via 2-frame stacking by default for newton-based camera tasks. This allows newton to provide the damping it needs to converge at the same rate as physx. This adds 36% GPU memory overhead, but the wall clock overhead is negligible.

The default for physx is still stack size = 1 (disabled) since physx has implicit damping built-in via its TGS solver.

Implementation:

`CameraCfg` now has a `frame_stack` field that controls a ring buffer of previous frames. These are concatenated to the channel dimension and the observation_space is auto-adjusted in `DirectRLEnv`. The new `MultiBackendCameraCfg` subclasses `CameraCfg` and pairs the `MultiBackendRendererCfg` with a `frame_stack` preset that defaults to 2 when `presets=newton` is used.

`PhysicsCfg` gains a `requires_temporal_camera_data` capability flag (default False; True on `NewtonCfg`). When a backend that needs temporal data is paired with a camera whose `frame_stack <= 1`, a runtime warning is emitted from `sim_launcher.py`. This avoids hard-coding Newton-specific checks in the general API.

Frame stacking uses a CPU-side flag to gate the per-step ring buffer init, avoiding a GPU->CPU sync from `.any()` on the steady-state path.

Updated existing tasks (cartpole, shadow_hand, dexsuite kuka_allegro) to use `MultiBackendCameraCfg` and added 12 new tests.
Frame stacking now applies only to Newton+Warp by default. Other physics+renderer backends default to no stacking, but if it's detected that no temporal data is present (eg. using RTX with FXAA instead of DLSS) it will log a warning. Custom frame stack can be specified by overriding `frame_stack` on command line.

Added new unit tests for capability-based defaults, preset parsing, cli override and warnings.
- Added sim.step() call before the first camera update to prime RTX via the visualizer pump path and ensure the first frame is not black.
- Replaced assertion that frame pixels differ (incorrectly testing renderer behavior) with a better assertion that the newest slot in the ring buffer is properly updated.
- Replaced `torch.cat` + `.copy_(ordered)` with a per-slot `narrow().copy_()` loop. Eliminates the per-frame tensor allocation.
- For multi-camera tasks, `direct_rl_env.py` collects every camera-like field with `frame_stack > 1` and raises if values disagree, rather than silently picking the first. Three new unit tests cover the single-camera, matching, and conflicting cases.
- Moved stdlib `import logging` to top of test file.
- Replaced three `sys.argv` try/finally blocks with `mock.patch.object(sys, "argv", ...)` context managers.
@jmart-nv jmart-nv force-pushed the jmart/temporal-aa branch from acd6bfe to a798263 Compare May 6, 2026 20:14
@jmart-nv jmart-nv changed the title [DRAFT] OMPE-88188: Default explicit temporal data instead of TAA [DRAFT] OMPE-88188: Enable frame stacking for newton+newton_renderer by default May 6, 2026
@jmart-nv jmart-nv changed the title [DRAFT] OMPE-88188: Enable frame stacking for newton+newton_renderer by default OMPE-88188: Enable frame stacking for newton+newton_renderer by default May 6, 2026
@jmart-nv jmart-nv marked this pull request as ready for review May 6, 2026 21:17
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR adds an explicit frame-stacking ring buffer to CameraCfg/Camera and wires a two-level preset policy (MultiBackendCameraCfg + FrameStackPolicyCfg) that automatically enables frame_stack=2 for the Newton+Warp backend combination, while emitting a warning when physics requires temporal data and no stacking or temporal-AA renderer is present.

  • Frame-stacking in Camera: _create_buffers pre-allocates _frame_history and _stacked_output once; each step advances the ring buffer using narrow()+copy_() with no per-step allocations, and resets correctly preserve per-env history isolation.
  • Policy infrastructure: MultiBackendCameraCfg carries a sentinel frame_stack=0; launch_simulation calls _apply_frame_stack_policies to resolve the nested FrameStackPolicyCfg tree before scanning for the temporal-data warning, ensuring post-policy state is what the warning reflects.
  • DirectRLEnv obs-space adjustment: A shallow scan of cfg.__dataclass_fields__ auto-multiplies the channel dimension for a 3-tuple observation_space, raising ValueError on conflicting stacks across multiple cameras.

Confidence Score: 4/5

Safe to merge; ring-buffer logic, reset isolation, sentinel handling, and the warning predicate matrix are all correct, with one minor documentation inaccuracy in camera_cfg.py.

The ring-buffer implementation deliberately avoids per-step allocations via narrow()+copy_(), but the docstring still says 'O(frame_stack) allocations' — directly contradicting the inline code comment at the call site. All functional logic is sound: history initialization on reset fills all slots with the first post-reset frame, partial resets isolate the correct env indices, the stacked output is built oldest-to-newest as expected, and the _apply_frame_stack_policies sentinel design cleanly separates user-supplied values from policy-driven defaults.

source/isaaclab/isaaclab/sensors/camera/camera_cfg.py has the documentation inaccuracy. The rest of the changed files look correct.

Important Files Changed

Filename Overview
source/isaaclab/isaaclab/sensors/camera/camera.py Core ring-buffer implementation for frame stacking: correct initialization, reset, and per-step copy logic using narrow() views to avoid allocations
source/isaaclab/isaaclab/sensors/camera/camera_cfg.py Adds frame_stack field with default=1; docstring claims O(frame_stack) allocations per step but implementation uses allocation-free narrow()+copy_()
source/isaaclab/isaaclab/envs/direct_rl_env.py Adds obs-space auto-adjustment for frame stacking with conflict detection; shallow one-level scan is consistent with the 3-tuple single-camera convention
source/isaaclab_tasks/isaaclab_tasks/utils/sim_launcher.py Adds _apply_frame_stack_policies, _resolve_frame_stack_policy, and the warning logic; traversal and sentinel handling look correct
source/isaaclab_tasks/isaaclab_tasks/utils/presets.py Adds MultiBackendCameraCfg with sentinel frame_stack=0 and nested FrameStackPolicyCfg tree; two-level PresetCfg design for AND-conditioned physics×renderer selection is well-structured
source/isaaclab/test/sensors/test_camera_frame_stack.py 835-line test suite covering ring-buffer shape, init, reset, partial reset, renderer capability flags, warning matrix, policy truth table, CLI overrides, and end-to-end propagation
source/isaaclab_newton/isaaclab_newton/renderers/newton_warp_renderer_cfg.py Adds provides_temporal_camera_data override returning False for pure-rasterization Warp renderer
source/isaaclab_physx/isaaclab_physx/renderers/isaac_rtx_renderer_cfg.py Adds provides_temporal_camera_data returning True for DLSS/DLAA/TAA and False for FXAA/Off; correctly falls back to True when antialiasing_mode is None (DLSS default)

Sequence Diagram

sequenceDiagram
    participant CLI as CLI / resolve_task_config
    participant LS as launch_simulation
    participant AFP as _apply_frame_stack_policies
    participant DRLE as DirectRLEnv.__init__
    participant Cam as Camera._create_buffers

    CLI->>LS: "env_cfg (frame_stack=0 sentinel)"
    LS->>AFP: "walk cfg tree, set frame_stack=2 where newton+warp policy fires"
    AFP-->>LS: frame_stack resolved on MultiBackendCameraCfg nodes
    LS->>LS: _scan_config for needs_temporal / provides_temporal / has_unstacked_camera
    alt needs_temporal AND NOT provides_temporal AND has_unstacked_camera
        LS->>LS: logger.warning(frame_stack recommendation)
    end
    LS->>DRLE: env_cfg with frame_stack already resolved
    DRLE->>DRLE: "scan cfg.__dataclass_fields__ for frame_stack > 1"
    DRLE->>DRLE: "auto-adjust observation_space (H,W,C) to (H,W,C*frame_stack)"
    DRLE->>Cam: "Camera(cfg) with frame_stack=2"
    Cam->>Cam: alloc _frame_history shape (stack, N, H, W, C)
    Cam->>Cam: "alloc _stacked_output shape (N, H, W, C*stack)"
    Cam->>Cam: "_data.output = _stacked_output, renderer writes to _single_frame_output"
    loop each step
        Cam->>Cam: renderer writes new frame to _single_frame_output
        Cam->>Cam: history[idx].copy_(single), then narrow+copy_ to _stacked_output
    end
Loading

Reviews (1): Last reviewed commit: "OMPE-88188: Fix precommit errors." | Re-trigger Greptile

Comment thread source/isaaclab/isaaclab/sensors/camera/camera_cfg.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: jmart-nv <jemartinez@nvidia.com>
# observation_space implies a single camera drives the policy obs; we collect
# every camera-like field with ``frame_stack > 1`` and verify they all share
# the same value before adjusting, rather than silently picking the first.
if isinstance(self.cfg.observation_space, (list, tuple)) and len(self.cfg.observation_space) == 3:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should belong in the base class. the stacking mechanism should be on a per-task implementation. I would add this specifically for the cartpole environment

class_type: type[PhysicsManager] | Any = MISSING
"""The physics manager class to use. Must be set by subclasses."""

requires_temporal_camera_data: bool = False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this probably also doesn't belong in the physics manager cfg. there's no reason the physics manager should care about camera data

self._frame[env_ids] = 0
# Reset frame-stack history for these envs; next _update_buffers_impl fills all
# history slots with the first post-reset frame so the policy never sees stale frames.
if self._frame_stack_size > 1:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend handling the frame stacking logic in the cartpole environment directly

return isinstance(node, PhysicsCfg) and type(node).__name__ in ("NewtonCfg", "OvPhysxCfg")


def _physics_needs_temporal_camera_data(node) -> bool:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are probably also not necessary if we move the logic into the task class directly?

@ooctipus
Copy link
Copy Markdown
Collaborator

ooctipus commented May 8, 2026

Good Job figuring this out @jmart-nv I 100% agree with the motivation and surprised by your finding. Implementation wise, similar to Kelly's advise, I’d prefer this stay task-local for now. The current PR makes frame stacking a framework-level concern across the camera sensor, base env, physics config, renderer capabilities, presets, and launcher. That feels too broad for behavior currently motivated by specific Newton camera tasks.

In particular, I don’t think DirectRLEnv should auto-adjust observation_space from camera config fields. The task owns the observation layout, so it should explicitly stack frames and declare the resulting observation shape. If this becomes common across tasks, we can promote a shared helper later with clearer boundaries.

For example, for CartpoleCameraEnv, I’d rather see something like a task config field:

camera_frame_stack: int = 1
observation_space = [tiled_camera.height, tiled_camera.width, 3 * camera_frame_stack]

and then handle the history directly in CartpoleCameraEnv._get_observations():

def _stack_camera_obs(self, camera_data: torch.Tensor) -> torch.Tensor:
    frame_stack = self.cfg.camera_frame_stack
    if frame_stack <= 1:
        return camera_data
    if self._camera_frame_history is None:
        self._camera_frame_history = camera_data.unsqueeze(0).repeat(frame_stack, 1, 1, 1, 1)
        self._camera_frame_stack_idx = 0
    self._camera_frame_history[self._camera_frame_stack_idx].copy_(camera_data)
    stacked = torch.cat(
        [
            self._camera_frame_history[(self._camera_frame_stack_idx + 1 + i) % frame_stack]
            for i in range(frame_stack)
        ],
        dim=-1,
    )
    self._camera_frame_stack_idx = (self._camera_frame_stack_idx + 1) % frame_stack
    return stacked

Then _get_observations() can remain explicit:

camera_data = self._tiled_camera.data.output[data_type] / 255.0
camera_data -= torch.mean(camera_data, dim=(1, 2), keepdim=True)
camera_data = self._stack_camera_obs(camera_data)
return {"policy": camera_data.clone()}

Reset handling can also stay task-local by clearing or reinitializing the history for env_ids in _reset_idx(). That keeps the observation contract, memory cost, reset behavior, and Newton-specific policy choice visible in the environment that actually needs it.

I’d also document the Newton/Warp renderer class subtlety directly: unlike RTX modes with temporal AA, the Warp renderer does not inject prior-frame information into the current image. Camera-control tasks that need velocity-like visual cues should therefore add explicit temporal observations, such as task-local frame stacking, rather than relying on renderer-specific artifacts.

Feel free to give my comment to agent and and have it work on next iteration x)

@jmart-nv jmart-nv closed this May 8, 2026
@jmart-nv
Copy link
Copy Markdown
Author

jmart-nv commented May 8, 2026

Closing this PR in favor of per-task frame stacking

kellyguo11 added a commit that referenced this pull request May 16, 2026
# Description

This enables frame stacking for newton+warp by default in the cartpole
camera presets task.

**Newton Frame Stacking:** _Provides explicit temporal data to newton._

RTX uses DLSS anti-aliasing by default, which provides implicit temporal
data. The newton_renderer does not provide temporal data. However,
newton's energy-conserving physics solver requires temporal velocity
data in order to compensate for the lack of damping, which causes
convergence problems when paired with newton_renderer.

This commit provides explicit temporal information via 2-frame stacking
by default for cartpole-camera when using newton+newton_renderer. This
allows newton to provide the damping it needs to converge at the same
rate as physx. This adds 36% GPU memory overhead, but the wall clock
overhead is negligible.

The default for all other physics/renderer backends is still stack size
= 1 (disabled) since physx has implicit damping built-in via its TGS
solver, and RTX provides temporal data implicitly.

**Implementation:**

For manager-based envs, a new `stacked_image` term is added to the MDP
observations - tasks can opt into frame stacking by adding the
`stacked_image` term to their observation cfg and setting `frame_stack`
to a value > 1.

The cartpole camera presets direct env now implements frame stacking
using `CircularBuffer` from `isaaclab.utils.buffers` directly in
`_get_observations`.

Added new unit tests for the MDP term (mocked + `ObservationManager`
E2E) and cartpole integration, and updated the documentation with a note
about newton's dependency on temporal data.

_Note: This is a task-local re-implementation of the closed [PR
#5232](#5232

## Type of change

- Bug fix (non-breaking change which fixes an issue)

## Checklist

- [x] I have read and understood the [contribution
guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------

Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants