diff --git a/docs/source/main_classes/trainer.rst b/docs/source/main_classes/trainer.rst
index b0401750f159..9fc88a658a33 100644
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -119,6 +119,20 @@ TFTrainingArguments
     :members:
 
 
+Randomness
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+When resuming from a checkpoint generated by :class:`~transformers.Trainer`, every effort is made to restore the
+`python`, `numpy` and `pytorch` RNG states to the values they had at the moment the checkpoint was saved, which
+should make the "stop and resume" style of training as close as possible to non-stop training.
+
+However, due to various non-deterministic pytorch defaults, this might not fully work. If you want full
+determinism please refer to `Controlling sources of randomness
+<https://pytorch.org/docs/stable/notes/randomness.html>`__. As explained there, some of the settings that make
+things deterministic (e.g., ``torch.backends.cudnn.deterministic``) may slow things down, so they can't be
+enabled by default, but you can enable them yourself if needed.
+
+
 Trainer Integrations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
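The save-and-restore idea the new section describes can be sketched with the stdlib `random` module alone; this is a hypothetical minimal illustration of the mechanism, not the actual `Trainer` checkpoint code (which also handles the `numpy` and `pytorch` RNGs):

```python
import random

# Sketch of the "stop and resume" RNG idea: capture the RNG state at
# checkpoint time, then restore it on resume so the draws after resuming
# match the draws an uninterrupted run would have produced.

random.seed(42)
_ = [random.random() for _ in range(3)]  # training steps before the checkpoint

checkpoint_rng_state = random.getstate()  # saved alongside the checkpoint

uninterrupted = [random.random() for _ in range(3)]  # the non-stop run

random.setstate(checkpoint_rng_state)  # restore on resume
resumed = [random.random() for _ in range(3)]  # the stop-and-resume run

assert resumed == uninterrupted  # identical draws after restoring the state
```

The real checkpoint additionally stores `numpy.random.get_state()` and `torch.get_rng_state()` style snapshots; the restore-on-resume pattern is the same for each RNG.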