CausalMaskExperiments

Made with github.com/Dolmachi. We trained a transformer both with and without a causal mask and got the same quality, but training without the mask costs far more time and GPU memory. As is well known, generative pre-trained transformer-like architectures are trained with a causal mask; however, there is no hard restriction forbidding training without it. Of course, doing so leads to much higher time and memory consumption.
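For reference, the causal mask simply forbids each position from attending to future positions. Below is a minimal NumPy sketch of single-head attention with an optional causal mask; it is illustrative only and is not the repo's actual implementation (that lives in train.py):

```python
import numpy as np

def attention(q, k, v, causal=True):
    """Single-head scaled dot-product attention with an optional causal mask."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (T, T) attention logits
    if causal:
        # Forbid attending to the future: set the upper triangle to -inf
        allowed = np.tril(np.ones((T, T), dtype=bool))
        scores = np.where(allowed, scores, -np.inf)
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((4, 8))
out = attention(q, k, v, causal=True)
```

With `causal=True`, position 0 can only attend to itself, so the first output row equals `v[0]`; with `causal=False`, every position mixes information from the whole sequence.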

We modified the batch processing for the models without a mask (see the get_loss_without_mask method in train.py), then trained the two models and compared them. The results:

                  With mask | Without mask
Loss (CE):            4.227 | 4.231
Time (min):              20 | 559
GPU usage (MB):        3147 | 22855

The quality of generation was not noticeably different either. You can find more details about the experiment in the .docx file (note that it is written in Russian).
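The cost gap above is expected. With the mask, one forward pass over a length-T sequence yields all T next-token losses at once; without it, each hidden state already sees the whole sequence, so each prediction needs its own forward pass over its prefix. Assuming that per-prefix scheme (the repo's exact batching in get_loss_without_mask may differ), the number of token positions processed grows from T to T(T+1)/2. A rough sketch of that arithmetic:

```python
def tokens_processed(T, masked):
    """Token positions pushed through the model to obtain all T next-token losses."""
    if masked:
        return T              # one pass over the full sequence
    return T * (T + 1) // 2   # one pass per prefix of length 1..T

# For a context of 256 tokens the unmasked scheme processes
# (T + 1) / 2 = 128.5 times more token positions per sequence.
ratio = tokens_processed(256, masked=False) / tokens_processed(256, masked=True)
print(ratio)
```

This quadratic blow-up is consistent in spirit with the measured gaps in time and GPU memory, though the exact ratio depends on batching and sequence length.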
