Hi, I noticed a potential parameter order mismatch when calling reward_fn in the training logic.
In `ssrl/brax/training/agents/ssrl/networks.py#L133` the current implementation is:

```python
reward = c.reward_fn(obs, obs_next, jp.mean(us), action)
```
However, the `reward_fn` definition in `go1_go_fast.py` shows the expected parameter order as:

```python
def reward_fn(obs_next, obs, us, action):
```
The `obs` and `obs_next` arguments should therefore be swapped in the call.
Additionally, I think the reward should be computed from the clean next observation (`obs_next_mean`) rather than the noisy one (`obs_next`).
Proposed fix:

```python
reward = c.reward_fn(obs_next_mean, obs, jp.mean(us), action)
```
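To illustrate why this kind of bug is easy to miss: Python accepts positional arguments in any order, so a swapped `(obs, obs_next)` call runs without error and silently negates any "progress" term. Here is a minimal, hypothetical sketch (the toy `reward_fn` body below is illustrative only, not the actual `go1_go_fast.py` implementation; only the signature matches):

```python
# Hypothetical reward with the same signature as in go1_go_fast.py.
# The toy reward depends on the *next* observation, so order matters.
def reward_fn(obs_next, obs, us, action):
    # Forward progress (obs_next - obs) minus a small control cost.
    return (obs_next - obs) - 0.1 * us * action

obs, obs_next, us, action = 1.0, 2.0, 0.5, 0.2

# Swapped positional call (the bug): progress comes out negated.
buggy = reward_fn(obs, obs_next, us, action)

# Correct positional order:
fixed = reward_fn(obs_next, obs, us, action)

# Keyword arguments make the intent explicit and order-proof:
safe = reward_fn(obs_next=obs_next, obs=obs, us=us, action=action)

print(buggy, fixed, safe)
```

Passing the observations by keyword at the call site would make this class of mismatch impossible regardless of the definition's parameter order.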