4 changes: 2 additions & 2 deletions README.md
@@ -20,7 +20,7 @@ Our code is built on top of the [batch_rl](https://github.com/google-research/ba

To run experiments in the paper, you will have to specify the size of an individual replay buffer for the purpose of being able to use 1% and 10% data. This is specified in line 53 in `batch_rl/fixed_replay/replay_memory/fixed_replay_memory.py`. For 1%, set `args[2]=1000` and for 10% set `args[2] = 10000`. Depending upon the availability of RAM, you may be able to raise the value of `num_buffers` from 10 to 50 (we were able to do this for 1% datasets) and then change this value in: `self._load_replay_buffers(num_buffers=<>)`.

- Now, to run CQL, use the follwing command:
+ Now, to run CQL, use the following command:

```
python -um batch_rl.fixed_replay.train \
@@ -43,7 +43,7 @@ python examples/cql_mujoco_new.py --env=<d4rl-mujoco-env-with-version e.g. hoppe
--min_q_weight=(5.0 or 10.0) --gpu=<gpu-id> --min_q_version=3
```

- In terms of parameters, we have found `min_q_weight=5.0` or `min_q_weight=10.0` along with `policy_lr=1e-4` or `policy_lr=3e-4` to work reasonably fine for the Gym MuJoCo tasks. These parameters are slightly different from the paper (which will be updated soon) due to differences in the D4RL datasets. For sample performance numbers (final numbers to be updated soon), hopper-medium acheives ~3000 return, and hopper-medium-exprt obtains ~1300 return at the end of 500k gradient steps. To run `CQL(\rho)` [i.e. without the importance sampling], set `min_q_version=2`.
+ In terms of parameters, we have found `min_q_weight=5.0` or `min_q_weight=10.0` along with `policy_lr=1e-4` or `policy_lr=3e-4` to work reasonably fine for the Gym MuJoCo tasks. These parameters are slightly different from the paper (which will be updated soon) due to differences in the D4RL datasets. For sample performance numbers (final numbers to be updated soon), hopper-medium achieves ~3000 return, and hopper-medium-exprt obtains ~1300 return at the end of 500k gradient steps. To run `CQL(\rho)` [i.e. without the importance sampling], set `min_q_version=2`.
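
Reviewer note: the paragraph above names two values each for `min_q_weight` and `policy_lr`, which gives four combinations to try. A minimal sketch of enumerating them follows. The helper name `mujoco_configs` and the constant names are hypothetical and not part of the repository, and how each value reaches `examples/cql_mujoco_new.py` (beyond the `--min_q_weight` flag shown in the command) is left to the reader.

```python
from itertools import product

# Values quoted in the README paragraph; the names below are illustrative only.
MIN_Q_WEIGHTS = (5.0, 10.0)
POLICY_LRS = (1e-4, 3e-4)


def mujoco_configs():
    """Return every (min_q_weight, policy_lr) pairing as a dict."""
    return [
        {"min_q_weight": weight, "policy_lr": lr}
        for weight, lr in product(MIN_Q_WEIGHTS, POLICY_LRS)
    ]


if __name__ == "__main__":
    # Print one line per combination so each can be launched by hand.
    for cfg in mujoco_configs():
        print(cfg)
```

Each dictionary corresponds to one run of the Gym MuJoCo command; the README does not say which of the four combinations suits which task.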

For Ant-Maze tasks, please run:
```