add support for distributed Offline Eval #708

Closed
alexnikulkov wants to merge 1 commit into facebookresearch:main from alexnikulkov:export-D42407669

Conversation

@alexnikulkov
Contributor

Summary:
Adding support for distributed Offline Eval. This requires maintaining local buffers in each trainer instance and syncing them across all trainers periodically (see the sketch after this summary). The sync happens under one of two conditions:

  1. When the "critical" weight of data has been consumed (this threshold will be set approximately equal to the size of a 1-hour partition)
  2. At the end of the training epoch (if any data has been consumed since the last sync)

Also, updating the FREE pipeline to remove the restriction on the number of nodes for Offline Eval runs.
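For illustration, here is a minimal sketch of the local-buffer/periodic-sync scheme described above, assuming one process per trainer instance in a PyTorch `torch.distributed` process group. All names (`LocalEvalBuffer`, `sync_threshold`, `add_batch`) are hypothetical and not taken from the ReAgent codebase; the sketch also assumes all trainers cross the threshold at the same step, since `all_reduce` is a collective call.

```python
import torch
import torch.distributed as dist


class LocalEvalBuffer:
    """Per-trainer eval buffer, periodically summed across all trainers."""

    def __init__(self, sync_threshold: float):
        # Weight of data consumed locally since the last sync.
        self.sum_weight = torch.zeros(1)
        # Weighted sum of rewards accumulated locally since the last sync.
        self.sum_reward = torch.zeros(1)
        # Condition 1 threshold: roughly the weight of a 1-hour data partition.
        self.sync_threshold = sync_threshold

    def add_batch(self, rewards: torch.Tensor, weights: torch.Tensor) -> None:
        self.sum_weight += weights.sum()
        self.sum_reward += (rewards * weights).sum()
        # Condition 1: the "critical" weight of data has been consumed.
        if self.sum_weight.item() >= self.sync_threshold:
            self.sync()

    def on_epoch_end(self) -> None:
        # Condition 2: end of epoch, flush anything consumed since the last sync.
        if self.sum_weight.item() > 0:
            self.sync()

    def sync(self) -> None:
        # Sum the local buffers across all trainer instances, then reset them.
        if dist.is_available() and dist.is_initialized():
            dist.all_reduce(self.sum_weight, op=dist.ReduceOp.SUM)
            dist.all_reduce(self.sum_reward, op=dist.ReduceOp.SUM)
        # ...fold the globally summed totals into the running eval metrics...
        self.sum_weight.zero_()
        self.sum_reward.zero_()
```

Syncing raw weighted sums (rather than per-trainer averages) keeps the aggregated estimate exact even when data is sharded unevenly across trainers.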

Differential Revision: D42407669

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D42407669

alexnikulkov pushed a commit to alexnikulkov/ReAgent that referenced this pull request Jan 10, 2023
Summary:
Pull Request resolved: facebookresearch#708 (same summary as the PR description above)

Differential Revision: D42407669

fbshipit-source-id: b48ce0fee5f3b8155cb0189e51988986c169d08f
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D42407669

Summary:
Pull Request resolved: facebookresearch#708 (same summary as the PR description above)

Differential Revision: D42407669

fbshipit-source-id: 634c94a594bedbd98d175d0c41371a717bab0306
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D42407669

@codecov-commenter

Codecov Report

Base: 87.72% // Head: 87.73% // Increases project coverage by +0.01% 🎉

Coverage data is based on head (5aac164) compared to base (517a67f).
Patch coverage: 92.45% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #708      +/-   ##
==========================================
+ Coverage   87.72%   87.73%   +0.01%     
==========================================
  Files         373      373              
  Lines       24042    24078      +36     
  Branches       44       44              
==========================================
+ Hits        21091    21125      +34     
- Misses       2925     2927       +2     
  Partials       26       26              
| Impacted Files | Coverage Δ |
|---|---|
| reagent/training/cb/base_trainer.py | 83.33% <60.00%> (-1.12%) ⬇️ |
| reagent/evaluation/cb/base_evaluator.py | 92.98% <93.75%> (-0.50%) ⬇️ |
| reagent/evaluation/cb/policy_evaluator.py | 97.22% <95.00%> (+1.38%) ⬆️ |
| reagent/test/evaluation/cb/test_integration.py | 100.00% <100.00%> (ø) |
| ...eagent/test/evaluation/cb/test_policy_evaluator.py | 97.61% <100.00%> (+0.32%) ⬆️ |
| reagent/gym/tests/test_gym.py | 95.93% <0.00%> (-0.82%) ⬇️ |
| reagent/core/utils.py | 87.23% <0.00%> (+2.12%) ⬆️ |



@facebook-github-bot

This pull request has been merged in 89519d7.

xuruiyang pushed a commit that referenced this pull request Sep 20, 2025
Summary:
Pull Request resolved: #708 (same summary as the PR description above)

Differential Revision: D42407669

fbshipit-source-id: ce436b42b1bb01f3688c6f1f80c52a3d66a47b22