
Fix eval batch size, add Dockerfile, improve logging, remove unused code, freeze requirements #1

Merged
itayhubara merged 3 commits into itayhubara:llama_v2_finetuning from michal2409:llama_v2_finetuning
Feb 26, 2024
Conversation

@michal2409

No description provided.

@itayhubara
Owner

itayhubara commented Feb 25, 2024

OK, most of it looks great. Thank you; it was exactly what I planned to do today (post-submission). I have a few questions before merging:

  1. What is the difference between paged_adamw and adamw_torch? (see the sketch below)
  2. You changed the default value of LoRA alpha to 32; in your experiments you used 16, right?
  3. It seems like you kept the monkey patch on training_steps; I thought that after fixing eval_steps it is no longer required.
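For reference on question 1, both optimizers are selected through the `optim` field of Hugging Face `TrainingArguments`; the paged variant comes from bitsandbytes and can page optimizer state out of GPU memory under pressure. A minimal sketch, assuming the 32-bit paged variant; the exact optim string and all other values here are illustrative, not taken from this repo:

```python
from transformers import TrainingArguments

# "paged_adamw_32bit" uses the bitsandbytes paged AdamW, which can spill
# optimizer state to CPU when GPU memory runs low; "adamw_torch" is the
# plain PyTorch AdamW implementation.
training_args = TrainingArguments(
    output_dir="./results",        # illustrative output path
    per_device_eval_batch_size=1,  # illustrative value
    optim="paged_adamw_32bit",     # alternative: "adamw_torch"
)
```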

@michal2409
Author

  1. The paged version is more memory-efficient, but regular adamw works fine.
  2. I thought we decided to use alpha=32 (see the config sketch below).
     (screenshot attached)
  3. Right, I will remove it.
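For context on item 2, `lora_alpha` is set in the PEFT `LoraConfig`, and the effective scaling applied to the adapter update is `lora_alpha / r`. A minimal sketch assuming the r=16, alpha=32 combination discussed above; the target modules and dropout are illustrative, not taken from this repo:

```python
from peft import LoraConfig

# With r=16 and lora_alpha=32 the LoRA update is scaled by alpha / r = 2.0.
lora_config = LoraConfig(
    r=16,                                 # LoRA rank (illustrative)
    lora_alpha=32,                        # default changed to 32 per the discussion above
    target_modules=["q_proj", "v_proj"],  # illustrative target modules
    lora_dropout=0.1,                     # illustrative dropout
    task_type="CAUSAL_LM",
)
```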

@itayhubara
Owner

OK, I understand; we can keep alpha as another hyperparameter. I'll merge.

itayhubara merged commit a102e34 into itayhubara:llama_v2_finetuning on Feb 26, 2024