Add PPOTrainer Unit Test V2 #526

Closed
igfox wants to merge 1 commit into facebookresearch:master from igfox:export-D30342897

igfox commented Aug 16, 2021

Summary:
Adds a dedicated unit test for PPOTrainer. Additionally:

  • Fixes a bug with fully connected value net
  • Fixes some bugs in PPO training around using value net
  • Adds possible_action_mask to DuelingQNetwork
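
To illustrate the last bullet, here is a minimal sketch of how an action mask can be applied to dueling Q-values. It is not taken from this PR's diff: the function names are hypothetical, the code uses plain Python lists rather than tensors, and masking with negative infinity is an assumption (the real `DuelingQNetwork` may use a different fill value or a different point in the forward pass).

```python
import math
from typing import List


def dueling_q_values(
    value: float,
    advantages: List[float],
    possible_action_mask: List[int],
) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a), then mask impossible actions.

    Uses the standard dueling aggregation Q = V + A - mean(A).
    Masked-out actions (mask == 0) get -inf so a greedy argmax never picks them.
    """
    mean_advantage = sum(advantages) / len(advantages)
    q_values = [value + a - mean_advantage for a in advantages]
    return [
        q if allowed else -math.inf
        for q, allowed in zip(q_values, possible_action_mask)
    ]


def greedy_action(q_values: List[float]) -> int:
    """Return the index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Action 1 has the highest raw Q-value but is marked as not possible.
q = dueling_q_values(value=1.0, advantages=[0.5, 2.0, -1.0], possible_action_mask=[1, 0, 1])
best = greedy_action(q)
```

With the mask applied, the greedy choice falls back to the best action that is still allowed.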

Note: this is a continuation of D30114686 (8d00eb1), which I reverted after it caused some CircleCI failures.

Differential Revision: D30342897
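
For readers unfamiliar with how a value net enters PPO training, the sketch below shows the generic clipped surrogate loss with a value baseline. It is an illustration under stated assumptions, not the code from `reagent/training/ppo_trainer.py`: the function names, the `clip` default of 0.2, and the use of `return - value` as the advantage are all hypothetical choices made for this example.

```python
import math
from typing import List


def ppo_clipped_loss(
    new_log_probs: List[float],
    old_log_probs: List[float],
    returns: List[float],
    values: List[float],
    clip: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate loss over a batch of transitions.

    The value net's prediction is used as a baseline:
    advantage = return - value (an assumption for this sketch).
    """
    losses = []
    for new_lp, old_lp, ret, val in zip(new_log_probs, old_log_probs, returns, values):
        advantage = ret - val
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = max(min(ratio, 1.0 + clip), 1.0 - clip)
        # Take the more pessimistic of the two objectives, negate to get a loss.
        losses.append(-min(ratio * advantage, clipped_ratio * advantage))
    return sum(losses) / len(losses)


def value_loss(returns: List[float], values: List[float]) -> float:
    """Mean squared error between observed returns and value predictions."""
    return sum((r - v) ** 2 for r, v in zip(returns, values)) / len(returns)


# Unchanged policy: ratio is 1, so the loss is minus the mean advantage.
same_policy = ppo_clipped_loss([-1.0, -2.0], [-1.0, -2.0], [1.0, 3.0], [0.5, 2.0])

# Large policy change with positive advantage: the ratio is clipped at 1 + clip.
clipped = ppo_clipped_loss([0.0], [-1.0], [2.0], [1.0])
```

A unit test for a trainer can pin down cases like these two, where the expected loss follows directly from the formula.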

@facebook-github-bot
This pull request was exported from Phabricator. Differential Revision: D30342897

igfox added a commit to igfox/ReAgent that referenced this pull request Aug 16, 2021
Reviewed By: czxttkl

Differential Revision: D30342897

fbshipit-source-id: fbc78b7b023c1d5bb61bf7fc782c45833f4ca071
igfox force-pushed the export-D30342897 branch from 1b43149 to 359b7fc on August 16, 2021 21:24
fbshipit-source-id: ea396ea54bc239a4b7aee3d32d91c9a65106e994
igfox force-pushed the export-D30342897 branch from 359b7fc to b15add4 on August 17, 2021 15:23

codecov-commenter commented Aug 17, 2021

Codecov Report

Merging #526 (b15add4) into master (39ea5bd) will increase coverage by 0.20%.
The diff coverage is 98.81%.

@@            Coverage Diff             @@
##           master     #526      +/-   ##
==========================================
+ Coverage   84.90%   85.11%   +0.20%     
==========================================
  Files         330      331       +1     
  Lines       19315    19438     +123     
  Branches       44       44              
==========================================
+ Hits        16399    16544     +145     
+ Misses       2890     2868      -22     
  Partials       26       26              
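
The percentages in the diff above can be reproduced from the raw counts. The sketch below assumes coverage is computed as hits divided by total lines, with total lines equal to hits plus misses plus partials; that assumption matches the Lines row in both columns. The function name is hypothetical.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Line coverage as a percentage: hits / (hits + misses + partials)."""
    total_lines = hits + misses + partials
    return 100.0 * hits / total_lines


# Counts copied from the coverage diff for master and for this PR.
master_coverage = coverage_percent(hits=16399, misses=2890, partials=26)
pr_coverage = coverage_percent(hits=16544, misses=2868, partials=26)
delta = pr_coverage - master_coverage

print(f"master: {master_coverage:.2f}%  PR: {pr_coverage:.2f}%")
```

The unrounded difference between the two values is roughly 0.209 percentage points, which the report displays as +0.20%.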
Impacted Files                                        Coverage Δ
reagent/core/types.py                                 87.26% <75.00%> (+0.79%) ⬆️
reagent/models/dueling_q_network.py                   96.15% <87.50%> (-0.79%) ⬇️
reagent/model_managers/policy_gradient/ppo.py         88.52% <100.00%> (-0.37%) ⬇️
reagent/models/dqn.py                                 82.35% <100.00%> (-7.31%) ⬇️
reagent/models/fully_connected_network.py             91.17% <100.00%> (+3.42%) ⬆️
reagent/net_builder/value/fully_connected.py          100.00% <100.00%> (ø)
reagent/test/net_builder/test_value_net_builder.py    100.00% <100.00%> (ø)
reagent/test/training/test_ppo.py                     100.00% <100.00%> (ø)
reagent/training/ppo_trainer.py                       99.07% <100.00%> (+14.38%) ⬆️
reagent/training/sac_trainer.py                       80.23% <100.00%> (ø)
... and 4 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@facebook-github-bot
This pull request has been merged in 9b25610.

xuruiyang pushed a commit that referenced this pull request Sep 20, 2025
fbshipit-source-id: 9be5e86d234619e97e476e46556a4dee07e3b734