CausalMaskExperiments

Made with github.com/Dolmachi. We trained a transformer with and without a causal mask. Both runs reached the same quality, but training without the mask took far more time and GPU memory. GPT-like (generative pre-trained transformer) architectures are normally trained with a causal mask, yet nothing strictly prevents training without one. It just costs much more time and memory.
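To make the role of the mask concrete, here is a minimal pure-Python sketch of single-head attention. It is an illustration only, not the code from network.py: the function `attention` and the toy input `x` are hypothetical, and the equivalence it shows holds for one attention layer. With the mask, one pass yields a valid output for every position. Without it, each prefix needs its own pass and only the last row of that pass is usable.

```python
import math

def attention(q, k, v, causal):
    """Single-head scaled dot-product attention over lists of vectors."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # With a causal mask, position i may only attend to positions 0..i.
        last = i + 1 if causal else len(k)
        scores = [sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
                  for j in range(last)]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j][c] for j, w in enumerate(weights)) / total
                    for c in range(len(v[0]))])
    return out

# Toy sequence of four 2-dimensional token vectors (made-up values).
x = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [1.5, -1.0]]

# One masked pass gives an output for every position at once.
masked = attention(x, x, x, causal=True)

# Without a mask, run one pass per prefix and keep only its last row.
unmasked_last = [
    attention(x[:t + 1], x[:t + 1], x[:t + 1], causal=False)[-1]
    for t in range(len(x))
]
```

Row `t` of `masked` matches `unmasked_last[t]`, so the single masked pass replaces `len(x)` separate unmasked passes in this one-layer setting.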

We modified the batch processing for the model without a mask (see the get_loss_without_mask method in train.py). We then trained two models, one with the mask and one without, and compared them. The results:

| Metric | With mask | Without mask |
| --- | --- | --- |
| Loss (CE) | 4.227 | 4.231 |
| Time (min) | 20 | 559 |
| GPU usage (MB) | 3147 | 22855 |
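The size of the gap is easier to see as ratios. The short snippet below only redoes the arithmetic on the numbers reported above. The dictionary keys are illustrative names, not identifiers from the repository.

```python
# Reported results as (with mask, without mask).
results = {
    "loss_ce": (4.227, 4.231),
    "time_min": (20, 559),
    "gpu_mb": (3147, 22855),
}

with_mask, without_mask = results["loss_ce"]
loss_gap = round(without_mask - with_mask, 3)       # absolute difference in CE loss

with_mask, without_mask = results["time_min"]
time_ratio = round(without_mask / with_mask, 2)     # how many times slower

with_mask, without_mask = results["gpu_mb"]
memory_ratio = round(without_mask / with_mask, 2)   # how many times more GPU memory

print(f"loss gap: {loss_gap}, time: x{time_ratio}, memory: x{memory_ratio}")
```

In other words, the run without a mask was about 28 times slower and used about 7.3 times more GPU memory, while the loss differed by only 0.004.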

The generation quality of the two models is not noticeably different. More details about the experiment are in the Causal Mask.docx file (written in Russian).
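One way to picture why mask-free training is so much more expensive is sketched below. This is an assumption about the general approach, not a copy of get_loss_without_mask: without a mask, every target token needs its own forward pass over its prefix, so the number of processed positions grows quadratically with sequence length. The helper names `prefix_examples` and `positions_processed` are hypothetical.

```python
def prefix_examples(tokens):
    """Split one sequence into (prefix, target) pairs, one per predicted token."""
    return [(tokens[:t], tokens[t]) for t in range(1, len(tokens))]

def positions_processed(seq_len, use_mask):
    """Count input positions fed to the model for one sequence."""
    n_targets = seq_len - 1
    if use_mask:
        # A single masked pass over the inputs covers every target.
        return n_targets
    # Without a mask: prefixes of length 1, 2, ..., n_targets, one pass each.
    return n_targets * (n_targets + 1) // 2

pairs = prefix_examples([5, 8, 2, 9])
# pairs == [([5], 8), ([5, 8], 2), ([5, 8, 2], 9)]

cost_with_mask = positions_processed(4, use_mask=True)      # 3
cost_without_mask = positions_processed(4, use_mask=False)  # 6
```

For a sequence of 128 tokens the same count gives 127 positions with the mask and 8128 without it, which is one plausible source of the extra time and memory.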

