[Bug Report] Non-reproducible training results in vision-based tasks with identical seeds #3505

@twkang43

Describe the bug

When training RL agents in IsaacLab, vision-based environments produce different training outcomes across runs, even with a fixed random seed. State-based environments, in contrast, are exactly reproducible under the same conditions.

This issue was confirmed by running five separate tests with identical settings on each of the following three official IsaacLab environments:

  • Isaac-Cartpole-v0 (state-based): Reproducible
  • Isaac-Cartpole-RGB-v0 (vision-based): Not reproducible
  • Isaac-Cartpole-RGB-ResNet18-v0 (vision-based): Not reproducible

The non-determinism appears to be introduced by the vision processing pipeline, since that is the key difference between the reproducible and non-reproducible environments. However, I have not investigated this in depth, and further analysis is needed to identify the root cause.

The attached WandB logs show the reward curves from the repeated training runs. The curves for the vision-based environments diverge significantly, even though all experimental settings, including the random seed, were identical for each run.
Run names: state-{i} is Isaac-Cartpole-v0, rgb-{i} is Isaac-Cartpole-RGB-v0, and resnet-{i} is Isaac-Cartpole-RGB-ResNet18-v0.

[Three WandB reward-curve plots; images not preserved]

Steps to reproduce

  1. Run the state-based environment five times with a fixed seed:
    python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-v0 --headless --seed 42 --max_iteration 100
    
  2. Run the vision-based environment five times with the same seed:
    python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-RGB-v0 --enable_cameras --headless --seed 42 --max_iteration 100
    
  3. Run the ResNet18 feature-based vision environment five times with the same seed:
    python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-RGB-ResNet18-v0 --enable_cameras --headless --seed 42 --max_iteration 100
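
The three steps above can be scripted so that every run uses exactly the same arguments. The following is a minimal sketch, not part of IsaacLab: the helper names `build_command` and `run_all` are hypothetical, and the flags are copied verbatim from the commands in this report. It assumes it is launched from the repository root with the same `python` used above.

```python
import subprocess

TRAIN_SCRIPT = "scripts/reinforcement_learning/rl_games/train.py"

# (task name, whether --enable_cameras is passed) for the three environments in this report.
TASKS = [
    ("Isaac-Cartpole-v0", False),
    ("Isaac-Cartpole-RGB-v0", True),
    ("Isaac-Cartpole-RGB-ResNet18-v0", True),
]


def build_command(task, enable_cameras, seed=42, max_iteration=100):
    """Return the argv list for one training run, mirroring the commands above."""
    cmd = ["python", TRAIN_SCRIPT, "--task", task]
    if enable_cameras:
        cmd.append("--enable_cameras")
    cmd += ["--headless", "--seed", str(seed), "--max_iteration", str(max_iteration)]
    return cmd


def run_all(repeats=5, dry_run=True):
    """Collect every command `repeats` times; execute them only if dry_run is False."""
    commands = []
    for task, enable_cameras in TASKS:
        for _ in range(repeats):
            cmd = build_command(task, enable_cameras)
            commands.append(cmd)
            if not dry_run:
                # Runs sequentially; raises CalledProcessError if a run fails.
                subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Print the 15 commands without launching anything.
    for cmd in run_all(dry_run=True):
        print(" ".join(cmd))
```

Calling `run_all(dry_run=False)` would launch the 15 runs one after another; the dry run is the default so the commands can be inspected first.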
    

All hyperparameters and environment settings not specified in the CLI arguments default to the values defined in the code.

System Info

  • Commit: f20d74c
  • Isaac Sim Version: 4.5
  • OS: Ubuntu 22.04
  • GPU: RTX A6000
  • CUDA: 12.9
  • GPU Driver: 575.64.03

Additional context

A note on the logs: For some runs, WandB logging halted before the experiment's completion, despite all runs being executed for an identical number of steps. This does not impact the overall analysis. For the reproducible environment (Isaac-Cartpole-v0), training curves were perfectly identical until the earliest halt. For the non-reproducible environments, the curves had already diverged long before any logging stopped.
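
The comparison described in this note, checking curves only up to the earliest logging halt, can be sketched as follows. This is an illustrative helper rather than the script used for the plots: `first_divergence` and `compare_runs` are hypothetical names, and each run is assumed to be available as a plain list of logged reward values (for example, exported from WandB).

```python
def first_divergence(run_a, run_b, tol=0.0):
    """Return the first logged step where two reward curves differ by more than tol.

    Only the steps present in both runs are compared (zip stops at the shorter
    run), so a run whose logging halted early does not count as a divergence.
    Returns None if the curves agree over their common length.
    """
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if abs(a - b) > tol:
            return step
    return None


def compare_runs(runs, tol=0.0):
    """Compare every run against the first one; map run name -> first divergent step."""
    names = list(runs)
    reference = runs[names[0]]
    return {name: first_divergence(reference, runs[name], tol) for name in names[1:]}


if __name__ == "__main__":
    # Placeholder values for illustration only; these are NOT taken from the logs.
    example = {
        "run-0": [1.0, 2.0, 3.0, 4.0],
        "run-1": [1.0, 2.0, 3.0],       # halted early, identical so far
        "run-2": [1.0, 2.5, 3.0, 4.0],  # differs at step 1
    }
    print(compare_runs(example))
```

With `tol=0.0` the check is for bit-identical logged values, which matches the "perfectly identical" claim for the state-based environment; a small positive `tol` would only be appropriate if the logs were rounded on export.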

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have checked that the issue is not in running Isaac Sim itself and is related to the repo

Acceptance Criteria

  • Verify whether the vision feature pipeline introduces non-determinism.
  • Identify fixes or configurations to achieve reproducibility across both state-based and vision-based environments.
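
One way to approach the first criterion is to check whether the camera observations themselves differ between two runs with the same seed, before any learning takes place. The sketch below only covers the comparison: `frame_digest` and `first_mismatched_frame` are hypothetical helpers, and it assumes each observation has already been dumped to raw bytes (for a CPU tensor, something like `.numpy().tobytes()`). How to capture those observations inside IsaacLab is not shown here.

```python
import hashlib


def frame_digest(frame_bytes):
    """Return a SHA-256 hex digest of one observation's raw bytes."""
    return hashlib.sha256(frame_bytes).hexdigest()


def first_mismatched_frame(frames_a, frames_b):
    """Return the index of the first frame whose bytes differ between two runs.

    Frames are compared pairwise over the common length of the two sequences.
    Returns None if every compared frame is byte-identical.
    """
    for index, (fa, fb) in enumerate(zip(frames_a, frames_b)):
        if frame_digest(fa) != frame_digest(fb):
            return index
    return None


if __name__ == "__main__":
    # Placeholder byte strings standing in for dumped camera frames.
    run_a = [b"\x00\x01\x02", b"\x03\x04\x05"]
    run_b = [b"\x00\x01\x02", b"\x03\x04\x06"]
    print(first_mismatched_frame(run_a, run_b))
```

If the first mismatch appears at the very first step, the rendering path is a likely source; if the frames only start to differ later, the cause could equally be upstream (for example, diverging actions), so this check narrows the search rather than settling it.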

Labels: bug, isaac-sim
