Environment info
- transformers version: 4.6.0
- Platform: -
- Python version: 3.8
- PyTorch version (GPU?): 3.7
- Tensorflow version (GPU?): -
- Using GPU in script?: -
- Using distributed or parallel set-up in script?: -
Who can help
@sgugger
Information
Hi
I am observing that resuming from a checkpoint does not reproduce the same results. I searched, and as mentioned here #11323 (comment), the Trainer currently does not save the random states, so they cannot be restored on resume, which is important for reproducibility. Could you store this information in self.state and restore the random states in the Trainer on resume? That would be great.
Thanks
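A minimal sketch of what saving/restoring RNG state at a checkpoint could look like, using only the Python standard library. The helper names are hypothetical; a real fix in the Trainer would also need to capture the numpy, torch, and CUDA generator states (e.g. np.random.get_state(), torch.get_rng_state(), torch.cuda.get_rng_state_all()) and handle distributed workers.

```python
import pickle
import random


def save_rng_state(path):
    """Serialize the Python RNG state alongside a checkpoint.

    Hypothetical helper: a complete version would also collect the
    numpy / torch / CUDA states into the same dict before dumping.
    """
    with open(path, "wb") as f:
        pickle.dump(random.getstate(), f)


def load_rng_state(path):
    """Restore the Python RNG state when resuming from a checkpoint."""
    with open(path, "rb") as f:
        random.setstate(pickle.load(f))
```

With this in place, draws made after restoring the state match the draws that would have happened had training never been interrupted.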
Expected behavior
After resuming, one should get exactly the same results as training the model without interruption.