Skip to content

[fix] oss dict load fix#383

Merged
blefaudeux merged 2 commits intomasterfrom
oss_dict_load_fix
Feb 12, 2021
Merged

[fix] oss dict load fix#383
blefaudeux merged 2 commits intomasterfrom
oss_dict_load_fix

Conversation

@blefaudeux
Copy link
Copy Markdown
Contributor

@blefaudeux blefaudeux commented Feb 12, 2021

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  • Did you read the contributor guideline?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fix for an issue that Pyspeech was seeing, it looks like I broke the logic in the last updates of #310
The unit test did not catch that because the model was symmetrical enough, tryign to find a better test
The bug detection and fix suggestion is from Weiyi Zheng @zhengwy888

Fixes #380

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 12, 2021
@blefaudeux blefaudeux requested review from anj-s, joshim5 and min-xu-ai and removed request for min-xu-ai February 12, 2021 16:40
Comment thread fairscale/optim/oss.py
if self.param_to_rank[param] != self.rank:
state_dict["state"][key] = None

if key in self.index_to_param:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test was definitely wrong, I should have seen that..

Comment thread fairscale/optim/oss.py

# NOTE: PyTorch 1.5 does not index linearly but with the id(params) at saving time
# we work around that here by using the fact that the params are ordered as in the param_groups
pytorch15_index_redirect = {k: i for i, k in enumerate(state_dict["state"].keys())}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is useless post pytorch 1.5

@blefaudeux
Copy link
Copy Markdown
Contributor Author

context: Planning for a follow up PR to find a unit test which exposes the previous bug (guess is that the model used was too symmetrical), but trying to fix master ASAP in the meantime

Copy link
Copy Markdown
Contributor

@min-xu-ai min-xu-ai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@blefaudeux blefaudeux merged commit 8be9d93 into master Feb 12, 2021
blefaudeux added a commit that referenced this pull request Feb 12, 2021
blefaudeux added a commit that referenced this pull request Feb 12, 2021
facebook-github-bot pushed a commit to facebookresearch/fairseq that referenced this pull request Feb 12, 2021
Summary:
OSS removed the 'partition' key in their state dict to accommodate for changing partition size. This requires an update on the fairseq side to not look into the parameter partition, just broadcast everything, and let the optimizer on each rank decides which parameters are relevant.

This diff also needs D26419095 to function completely, and blefaudeux has made fixes upstream in facebookresearch/fairscale#383

Reviewed By: myleott

Differential Revision: D26382917

fbshipit-source-id: 95af1022be59e88814748acaee36a1a350f7dc5b
harkash pushed a commit to harkash/fairseq that referenced this pull request Feb 23, 2021
Summary:
OSS removed the 'partition' key in their state dict to accommodate for changing partition size. This requires an update on the fairseq side to not look into the parameter partition, just broadcast everything, and let the optimizer on each rank decides which parameters are relevant.

This diff also needs D26419095 to function completely, and blefaudeux has made fixes upstream in facebookresearch/fairscale#383

Reviewed By: myleott

Differential Revision: D26382917

fbshipit-source-id: 95af1022be59e88814748acaee36a1a350f7dc5b
jinyiyang-jhu pushed a commit to jinyiyang-jhu/fairseq-jyang that referenced this pull request Feb 26, 2021
Summary:
OSS removed the 'partition' key in their state dict to accommodate for changing partition size. This requires an update on the fairseq side to not look into the parameter partition, just broadcast everything, and let the optimizer on each rank decides which parameters are relevant.

This diff also needs D26419095 to function completely, and blefaudeux has made fixes upstream in facebookresearch/fairscale#383

Reviewed By: myleott

Differential Revision: D26382917

fbshipit-source-id: 95af1022be59e88814748acaee36a1a350f7dc5b
caltia pushed a commit to caltia/fairseq that referenced this pull request Jul 8, 2025
Summary:
OSS removed the 'partition' key in their state dict to accommodate for changing partition size. This requires an update on the fairseq side to not look into the parameter partition, just broadcast everything, and let the optimizer on each rank decides which parameters are relevant.

This diff also needs D26419095 to function completely, and blefaudeux has made fixes upstream in facebookresearch/fairscale#383

Reviewed By: myleott

Differential Revision: D26382917

fbshipit-source-id: 95af1022be59e88814748acaee36a1a350f7dc5b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[OSS] When loading a state, the insertion order for the params may not match the params_groups

3 participants