From 06ad3a6ecb30e8e75284601c39c2cda0d40b2820 Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Tue, 4 May 2021 13:14:30 -0700
Subject: [PATCH 1/6] document resume randomness

---
 docs/source/main_classes/trainer.rst | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/docs/source/main_classes/trainer.rst b/docs/source/main_classes/trainer.rst
index b0401750f159..70d29efcb882 100644
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -119,6 +119,19 @@ TFTrainingArguments
     :members:
 
 
+Randomness
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When resuming from a checkpoint generated by :class:`~transformers.Trainer` all efforts are made to restore the
+`python`, `numpy` and `pytorch` RNG states to the same states as they were at the moment of saving that checkpoint,
+which should make the non-stop training as close as possible to "stop and resume" style of training.
+
+However, due to various default non-deterministic pytorch settings this might not fully work. If you want full
+determinism please refer to `this document `__. As explained in the document, that
+some of those settings that make things determinstic (.e.g., `torch.backends.cudnn.deterministic`) may slow things
+down, therefore you can enable those if you need to.
+
+
 Trainer Integrations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From 3ac32b726fd0fb4731daced5b823e2d7a82c88a9 Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Tue, 4 May 2021 13:18:50 -0700
Subject: [PATCH 2/6] fix link

---
 docs/source/main_classes/trainer.rst | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/source/main_classes/trainer.rst b/docs/source/main_classes/trainer.rst
index 70d29efcb882..a6008e42cace 100644
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -127,9 +127,10 @@ When resuming from a checkpoint generated by :class:`~transformers.Trainer` all
 which should make the non-stop training as close as possible to "stop and resume" style of training.
 
 However, due to various default non-deterministic pytorch settings this might not fully work. If you want full
-determinism please refer to `this document `__. As explained in the document, that
-some of those settings that make things determinstic (.e.g., `torch.backends.cudnn.deterministic`) may slow things
-down, therefore you can enable those if you need to.
+determinism please refer to `Controlling sources of randomness
+`__. As explained in the document, that some of those settings
+that make things determinstic (.e.g., `torch.backends.cudnn.deterministic`) may slow things down, therefore you can
+enable those if you need to.
 
 
 Trainer Integrations

From b62c99654d49f23b0c799b7ee58d3e351a94e569 Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Tue, 4 May 2021 13:20:25 -0700
Subject: [PATCH 3/6] reword

---
 docs/source/main_classes/trainer.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/main_classes/trainer.rst b/docs/source/main_classes/trainer.rst
index a6008e42cace..dee2bc1c67b9 100644
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -129,8 +129,8 @@ which should make the non-stop training as close as possible to "stop and resume
 However, due to various default non-deterministic pytorch settings this might not fully work. If you want full
 determinism please refer to `Controlling sources of randomness
 `__. As explained in the document, that some of those settings
-that make things determinstic (.e.g., `torch.backends.cudnn.deterministic`) may slow things down, therefore you can
-enable those if you need to.
+that make things determinstic (.e.g., `torch.backends.cudnn.deterministic`) may slow things down, therefore this can't
+be done by default, but you can enable those yourself if needed.
 
 
 Trainer Integrations

From 9d930feb69c4391216745a7d3c5adee61a4dd300 Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Tue, 4 May 2021 13:20:49 -0700
Subject: [PATCH 4/6] fix

---
 docs/source/main_classes/trainer.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/main_classes/trainer.rst b/docs/source/main_classes/trainer.rst
index dee2bc1c67b9..21a143bdc3cd 100644
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -129,7 +129,7 @@ which should make the non-stop training as close as possible to "stop and resume
 However, due to various default non-deterministic pytorch settings this might not fully work. If you want full
 determinism please refer to `Controlling sources of randomness
 `__. As explained in the document, that some of those settings
-that make things determinstic (.e.g., `torch.backends.cudnn.deterministic`) may slow things down, therefore this can't
+that make things determinstic (.e.g., ``torch.backends.cudnn.deterministic``) may slow things down, therefore this can't
 be done by default, but you can enable those yourself if needed.

From b3965f672230bc4ebef293c096943ef4cfcf456d Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Tue, 4 May 2021 13:21:52 -0700
Subject: [PATCH 5/6] reword

---
 docs/source/main_classes/trainer.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/main_classes/trainer.rst b/docs/source/main_classes/trainer.rst
index 21a143bdc3cd..fd404e2a3470 100644
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -124,7 +124,7 @@ Randomness
 
 When resuming from a checkpoint generated by :class:`~transformers.Trainer` all efforts are made to restore the
 `python`, `numpy` and `pytorch` RNG states to the same states as they were at the moment of saving that checkpoint,
-which should make the non-stop training as close as possible to "stop and resume" style of training.
+which should make the "stop and resume" style of training as close as possible to non-stop training.
 
 However, due to various default non-deterministic pytorch settings this might not fully work. If you want full
 determinism please refer to `Controlling sources of randomness

From 919a5806ad3357e05353ee3237ac67b4e94fff9a Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Tue, 4 May 2021 13:33:00 -0700
Subject: [PATCH 6/6] style

---
 docs/source/main_classes/trainer.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/main_classes/trainer.rst b/docs/source/main_classes/trainer.rst
index fd404e2a3470..9fc88a658a33 100644
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -129,8 +129,8 @@ which should make the "stop and resume" style of training as close as possible t
 However, due to various default non-deterministic pytorch settings this might not fully work. If you want full
 determinism please refer to `Controlling sources of randomness
 `__. As explained in the document, that some of those settings
-that make things determinstic (.e.g., ``torch.backends.cudnn.deterministic``) may slow things down, therefore this
-can't be done by default, but you can enable those yourself if needed.
+that make things deterministic (e.g., ``torch.backends.cudnn.deterministic``) may slow things down, therefore this
+can't be done by default, but you can enable those yourself if needed.
 
 
 Trainer Integrations
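
Note: the RNG save/restore behaviour these patches document can be sketched with just the Python standard library. This is a simplified illustration of the idea, not the actual `Trainer` checkpoint code — the real implementation also snapshots the `numpy` and `pytorch` generator states alongside Python's:

```python
import random

# Simulate the start of a training run: consume some randomness...
random.seed(42)
warmup = [random.random() for _ in range(3)]

# ...then "checkpoint" by snapshotting the RNG state.
rng_state = random.getstate()  # what a checkpoint would save

# Path A: keep training without stopping.
non_stop = [random.random() for _ in range(3)]

# Path B: stop, and on resume restore the saved RNG state first.
random.setstate(rng_state)
resumed = [random.random() for _ in range(3)]

# With the state restored, "stop and resume" replays the exact
# random stream that non-stop training would have produced.
assert resumed == non_stop
```

Without the `setstate` call, path B would continue from whatever state the interpreter happened to be in, and the two streams would diverge — which is exactly the mismatch the checkpoint restore is meant to avoid.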