Recurrent DQN families with a new interface #436

Merged
muupan merged 57 commits into chainer:master from muupan:recurrent-dqn on Aug 9, 2019
Conversation

@muupan (Member) commented Apr 7, 2019

Merge #431 before this PR.

This PR resolves #112

  • Use the new StatelessRecurrent interface to support recurrent models in DQN variants.
  • Add examples/ale/train_drqn_ale.py as a solid example of recurrent DQN.
    • Remove options for recurrent DQN from other examples for simplicity.
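The core idea behind the recurrent DQN variants is to replay subsequences of transitions, rather than independent transitions, so that the recurrent state can be unrolled during each update. A minimal pure-Python sketch of that sampling step (a hypothetical helper for illustration, not ChainerRL's actual replay buffer API):

```python
import random

def sample_subsequences(episodes, n_sequences, max_len, rng=random):
    """Sample subsequences of transitions for one recurrent update.

    episodes: list of episodes, each a non-empty list of transitions.
    Returns n_sequences contiguous subsequences, each at most max_len long.
    Hypothetical helper illustrating the idea; not ChainerRL's actual API.
    """
    batch = []
    for _ in range(n_sequences):
        ep = rng.choice([e for e in episodes if e])
        start = rng.randrange(len(ep))  # random starting transition
        batch.append(ep[start:start + max_len])
    return batch
```

With, say, n_sequences=32 and max_len=10, this mirrors the "32 subsequences of up to 10 steps" batches used in the experiments below.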

TODO

  • Evaluate recurrent DQN on flickered Atari
  • Compare computational efficiency against the old recurrent interface.
  • Fix all the affected agents that inherit from DQN
  • Support CategoricalDoubleDQN

@muupan muupan changed the title [WIP] Recurrent DQN with a new interface [WIP] Recurrent DQN families with a new interface Apr 8, 2019
muupan added 3 commits April 8, 2019 13:37
since it does not make much sense to use this without recurrent models.
supporting recurrent models in these examples can be future work.
@prabhatnagarajan (Contributor) left a comment

Can you add checkboxes in the main ChainerRL repo README for the models that are now supported as recurrent?

@muupan (Member Author) commented May 6, 2019

I performed experiments to validate recurrent DQN on flickering Pong with single-frame observations and flicker probability p=0.5.

  • DoubleDQN 1-frame flicker: examples/ale/train_drqn_ale.py --flicker --no-frame-stack --env PongNoFrameskip-v4
    • not recurrent, batch size 32
  • DoubleDQN 1-frame flicker recurrent: examples/ale/train_drqn_ale.py --flicker --no-frame-stack --recurrent --env PongNoFrameskip-v4
    • recurrent, each batch consists of 32 subsequences of up to 10 steps
  • DoubleDQN 1-frame flicker recurrent small: examples/ale/train_drqn_ale.py --flicker --no-frame-stack --recurrent --batch-size 4 --episodic-update-len 8 --env PongNoFrameskip-v4
    • recurrent, each batch consists of 4 subsequences of up to 8 steps
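In the flickered setting, each observation is independently blanked with probability p, so a single frame is insufficient and the agent needs memory (recurrence or frame stacking) to estimate state. A minimal sketch of such an observation wrapper (hypothetical; not the actual wrapper used by train_drqn_ale.py):

```python
import random

class FlickerObservation:
    """Blank out each observation independently with probability p.

    Mimics the flickering Atari setting (here p=0.5): the agent sometimes
    receives an all-zero frame in place of the true one. Hypothetical
    sketch for illustration only.
    """

    def __init__(self, p=0.5, rng=random):
        self.p = p
        self.rng = rng

    def __call__(self, obs):
        if self.rng.random() < self.p:
            return [0] * len(obs)  # blanked frame with the same shape
        return obs
```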

Each configuration is evaluated over 3 trials with different random seeds.

[Figure: learning curves for the three configurations]

As you can see from the elapsed column, the recurrent configuration is ~3x slower than the non-recurrent one, while recurrent small is ~2x slower.

DoubleDQN 1-frame flicker (non-recurrent):

| steps | episodes | elapsed (s) | mean | median | stdev | max | min | average_q | average_loss | n_updates |
|---|---|---|---|---|---|---|---|---|---|---|
| 250354 | 274 | 1364.1 | -20.97 | -21.0 | 0.17 | -20.0 | -21.0 | -0.155 | 0.0107 | 50089 |
| 500592 | 558 | 2847.8 | -21.00 | -21.0 | 0.00 | -21.0 | -21.0 | -0.338 | 0.0106 | 112648 |
| 750113 | 849 | 4324.8 | -20.99 | -21.0 | 0.08 | -20.0 | -21.0 | -0.518 | 0.0114 | 175029 |
| 1000676 | 1151 | 5812.4 | -21.00 | -21.0 | 0.00 | -21.0 | -21.0 | -0.667 | 0.0119 | 237669 |

DoubleDQN 1-frame flicker recurrent:

| steps | episodes | elapsed (s) | mean | median | stdev | max | min | average_q | average_loss | n_updates |
|---|---|---|---|---|---|---|---|---|---|---|
| 250750 | 271 | 3796.6 | -20.42 | -20.0 | 0.62 | -18.0 | -21.0 | -0.161 | 0.0057 | 50047 |
| 500779 | 532 | 8020.8 | -20.63 | -21.0 | 0.49 | -20.0 | -21.0 | -0.322 | 0.0040 | 112554 |
| 751002 | 767 | 12225.8 | -20.05 | -20.0 | 0.82 | -18.0 | -21.0 | -0.343 | 0.0047 | 175110 |
| 1000710 | 962 | 16429.6 | -18.55 | -19.0 | 1.54 | -14.0 | -21.0 | -0.353 | 0.0043 | 237537 |

DoubleDQN 1-frame flicker recurrent small:

| steps | episodes | elapsed (s) | mean | median | stdev | max | min | average_q | average_loss | n_updates |
|---|---|---|---|---|---|---|---|---|---|---|
| 250325 | 276 | 2906.1 | -21.00 | -21.0 | 0.00 | -21.0 | -21.0 | -0.156 | 0.0117 | 49941 |
| 500477 | 551 | 6064.1 | -20.91 | -21.0 | 0.29 | -20.0 | -21.0 | -0.358 | 0.0070 | 112479 |
| 750165 | 828 | 9194.0 | -20.72 | -21.0 | 0.53 | -19.0 | -21.0 | -0.469 | 0.0054 | 174901 |
| 1000720 | 1090 | 12292.4 | -20.02 | -20.0 | 1.05 | -16.0 | -21.0 | -0.508 | 0.0046 | 237539 |
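For reference, the ~3x and ~2x slowdowns can be recovered directly from the elapsed columns at roughly 1M steps:

```python
# Elapsed seconds at ~1M steps, taken from the tables above.
elapsed_nonrecurrent = 5812.4
elapsed_recurrent = 16429.6
elapsed_recurrent_small = 12292.4

slowdown = elapsed_recurrent / elapsed_nonrecurrent
slowdown_small = elapsed_recurrent_small / elapsed_nonrecurrent
print(round(slowdown, 2), round(slowdown_small, 2))  # ~2.83 and ~2.11
```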

@muupan muupan requested a review from prabhatnagarajan July 18, 2019 07:17
@muupan muupan requested a review from prabhatnagarajan August 1, 2019 01:58
@prabhatnagarajan (Contributor) left a comment

Almost done. Can you make these small changes?

muupan and others added 3 commits August 9, 2019 17:16
Co-Authored-By: Prabhat Nagarajan <prabhat.nagarajan@gmail.com>
Co-Authored-By: Prabhat Nagarajan <prabhat.nagarajan@gmail.com>
@muupan muupan requested a review from prabhatnagarajan August 9, 2019 10:26
@prabhatnagarajan (Contributor) left a comment

Looks good. Please take a look at the comments for minor improvements.

```python
rbuf = replay_buffer.EpisodicReplayBuffer(10 ** 6)
else:
    # Q-network without LSTM
    q_func = chainer.Sequential(
```
@prabhatnagarajan (Contributor) left a comment

@muupan (Member Author) replied:

Partly because it is consistent with the recurrent version, and partly because I think it is easier to understand, and thus better as an example.

```python
        args.final_exploration_frames,
        lambda: np.random.randint(n_actions))

opt = chainer.optimizers.Adam(1e-4, eps=1e-4)
```
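For context, the explorer referenced in the snippet above presumably anneals epsilon linearly over the first final_exploration_frames steps and then holds it constant. A sketch of that schedule (a hypothetical standalone function; the argument names only mirror those in the snippet):

```python
def linear_decay_epsilon(step, start_eps, end_eps, final_exploration_frames):
    """Linearly anneal epsilon from start_eps to end_eps over the first
    final_exploration_frames steps, then hold it at end_eps.

    Hypothetical sketch of a linear-decay epsilon-greedy schedule; not
    ChainerRL's actual explorer implementation.
    """
    if step >= final_exploration_frames:
        return end_eps
    frac = step / final_exploration_frames
    return start_eps + frac * (end_eps - start_eps)
```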
@prabhatnagarajan (Contributor) left a comment

I'm assuming this should be fine. But why did you use Adam?

@muupan (Member Author) replied:

It is just that Adam seems to be preferred in the recent literature. I don't intend to reproduce any particular paper in this example.

muupan and others added 2 commits August 9, 2019 21:07
Co-Authored-By: Prabhat Nagarajan <prabhat.nagarajan@gmail.com>
@muupan muupan merged commit 36aa37c into chainer:master Aug 9, 2019
@muupan muupan deleted the recurrent-dqn branch August 9, 2019 13:22
@muupan muupan added this to the v0.8 milestone Feb 6, 2020


Successfully merging this pull request may close these issues.

Partial batch computation for recurrent models with replay

2 participants