[Bug Report] Non-reproducible training results in vision-based tasks with identical seeds

### Describe the bug

When training RL agents in IsaacLab, vision-based environments produce different results across runs even when a fixed random seed is used. In contrast, state-based environments are perfectly reproducible under the same conditions.

This issue was confirmed by running five separate tests with identical settings on each of the following three official IsaacLab environments:
- ```Isaac-Cartpole-v0``` (state-based): **Reproducible**
- ```Isaac-Cartpole-RGB-v0``` (vision-based): **Not reproducible**
- ```Isaac-Cartpole-RGB-ResNet18-v0``` (vision-based): **Not reproducible**

The non-determinism appears to come from the vision processing pipeline, since that is the key difference between the reproducible and non-reproducible environments. However, I have not investigated this in depth, so further analysis is needed to identify the root cause.
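
A possible first diagnostic is to rule out the PyTorch side before looking at the renderer. The sketch below was not part of the tests in this report; it only enables PyTorch's standard determinism switches before training starts. `configure_determinism` is a hypothetical helper name, not an IsaacLab API, and these switches would not remove any non-determinism that originates in the rendering path.

```python
import os
import random


def configure_determinism(seed: int) -> dict:
    """Hypothetical helper: enable PyTorch's determinism switches (not an IsaacLab API)."""
    applied = {}

    # cuBLAS reads this variable on first use, so set it before any CUDA work.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    applied["cublas_workspace"] = True

    random.seed(seed)
    applied["python_random"] = True

    try:
        import torch
    except ImportError:
        # torch is optional here only so the sketch also runs outside the training env.
        applied["torch"] = False
        return applied

    torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    torch.backends.cudnn.benchmark = False  # the autotuner may pick different kernels per run
    torch.backends.cudnn.deterministic = True
    # warn_only=True reports ops without a deterministic kernel instead of raising.
    torch.use_deterministic_algorithms(True, warn_only=True)
    applied["torch"] = True
    return applied
```

If the RGB runs still diverge with these switches on and no warning names a non-deterministic op, the rendering path becomes the more likely suspect.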

The WandB logs below show the reward curves from several training runs. The curves for the vision-based environments diverge significantly, even though all experimental settings, including the random seed, were identical for each run.
(```state-{i}: Isaac-Cartpole-v0```, ```rgb-{i}: Isaac-Cartpole-RGB-v0```, ```resnet-{i}: Isaac-Cartpole-RGB-ResNet18-v0```)

<img width="1420" height="318" alt="Image" src="https://github.com/user-attachments/assets/31e47f67-69e9-41de-b5f4-e70b6231a1ef" />

<img width="1420" height="318" alt="Image" src="https://github.com/user-attachments/assets/ca81e76c-9ff9-464e-b69c-6f7f330a5168" />

<img width="1420" height="318" alt="Image" src="https://github.com/user-attachments/assets/524e6bf6-3298-4637-b3f2-58d53a470232" />

### Steps to reproduce

1. Run the state-based environment five times with a fixed seed:
    ```
    python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-v0 --headless --seed 42 --max_iteration 100
    ```
2. Run the vision-based environment five times with the same seed:
    ```
    python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-RGB-v0 --enable_cameras --headless --seed 42 --max_iteration 100
    ```
3. Run the vision-based environment with ResNet18 features five times with the same seed:
    ```
    python scripts/reinforcement_learning/rl_games/train.py --task Isaac-Cartpole-RGB-ResNet18-v0 --enable_cameras --headless --seed 42 --max_iteration 100
    ```

All hyperparameters and environment settings not specified in the CLI arguments default to the values defined in the code.

### System Info

- Commit: f20d74c59d3e20fc822c4e4c5bf8535a48c5aa0b
- Isaac Sim Version: 4.5
- OS: Ubuntu 22.04
- GPU: RTX A6000
- CUDA: 12.9
- GPU Driver: 575.64.03

### Additional context

**A note on the logs**: For some runs, WandB logging halted before the experiment's completion, despite all runs being executed for an identical number of steps. This does not impact the overall analysis. For the reproducible environment (```Isaac-Cartpole-v0```), training curves were perfectly identical until the earliest halt. For the non-reproducible environments, the curves had already diverged long before any logging stopped.
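
The comparison described above (identical until the earliest halt) can be sketched as follows. This is a minimal illustration, not the script used for this report: `first_divergence` is a hypothetical helper, and the reward values are made up. It compares only the prefix shared by all runs, so a log that stopped early is not counted as a divergence.

```python
from typing import Optional, Sequence


def first_divergence(runs: Sequence[Sequence[float]], tol: float = 0.0) -> Optional[int]:
    """Return the first index where any run differs from run 0, or None.

    Only the prefix shared by all runs is compared, so a log that halted
    early does not count as a divergence.
    """
    if len(runs) < 2:
        return None
    shared = min(len(r) for r in runs)
    reference = runs[0]
    for step in range(shared):
        if any(abs(r[step] - reference[step]) > tol for r in runs[1:]):
            return step
    return None


# Made-up reward values, for illustration only.
state_runs = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]]
rgb_runs = [[1.0, 2.0, 3.0], [1.0, 2.5, 2.9], [1.0, 2.0, 3.1]]

print(first_divergence(state_runs))  # None
print(first_divergence(rgb_runs))    # 1
```

With `tol=0.0` the check is for exact equality, which matches the "perfectly identical" claim for the state-based runs.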

### Checklist

- [x] I have checked that there is no similar issue in the repo (**required**)
- [x] I have checked that the issue is not in running Isaac Sim itself and is related to the repo

### Acceptance Criteria

- [ ] Verify whether the vision feature pipeline introduces non-determinism.
- [ ] Identify fixes or configurations to achieve reproducibility across both state-based and vision-based environments.
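
One way to approach the first criterion is to log a hash of each observation per step in two runs with the same seed and find the first step where the hashes differ. The sketch below is an assumption about how that check could look, not an existing IsaacLab feature; `fingerprint` and `first_mismatch` are hypothetical helpers, and the byte strings stand in for real frame buffers.

```python
import hashlib
from typing import List, Optional


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of a raw buffer (e.g. the bytes of one camera frame)."""
    return hashlib.sha256(data).hexdigest()


def first_mismatch(run_a: List[str], run_b: List[str]) -> Optional[int]:
    """Index of the first step whose fingerprints differ, or None if the shared prefix matches."""
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return step
    return None


# Stand-in byte buffers; in practice these would be the observation bytes per step.
frames_a = [b"frame0", b"frame1", b"frame2"]
frames_b = [b"frame0", b"frame1", b"frameX"]

log_a = [fingerprint(f) for f in frames_a]
log_b = [fingerprint(f) for f in frames_b]
print(first_mismatch(log_a, log_b))  # 2
```

For a PyTorch tensor, the bytes could be obtained with `tensor.cpu().numpy().tobytes()`. If the RGB frames already differ at the first step, before any policy update, that would point at rendering; if the frames match but the ResNet18 features differ, that would point at the feature extractor.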
