Conversation
Signed-off-by: Sami Jaghouar <sami.jaghouar@gmail.com>
ModelName: TypeAlias = Literal["debugmodel", "150M", "1B", "Qwen32B", "Qwen1.5B", "Qwen7B"]
ModelType: TypeAlias = LlamaForCausalLM | Qwen2ForCausalLM
Hrmm, actually, would transformers.modeling_utils.PreTrainedModel have worked? It could mean we don't need to keep adding to this in the future.
Oh, but they're removing the GenerationMixin soon. Hrmm.
> Hrmm, actually, would transformers.modeling_utils.PreTrainedModel have worked? It could mean we don't need to keep adding to this in the future.
Hmm, but my goal is to be able to control-click into the Llama and Qwen code easily. Moreover, I don't think PreTrainedModel is precise enough. For example, our apply_fsdp code relies on the fact that model.model.layers exists. This is true for both LlamaForCausalLM and Qwen2ForCausalLM, but probably not for other pretrained models.
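For context, a minimal sketch of the kind of code this is about, assuming a hypothetical apply_fsdp shape (the wrapping logic here is a placeholder, not the actual implementation). The point is the attribute chain: both LlamaForCausalLM and Qwen2ForCausalLM expose model.model.layers, so a type checker accepts it under the narrow union, whereas a bare PreTrainedModel annotation carries no such attribute.

```python
from typing import Literal, TypeAlias

from transformers import LlamaForCausalLM, Qwen2ForCausalLM

ModelName: TypeAlias = Literal["debugmodel", "150M", "1B", "Qwen32B", "Qwen1.5B", "Qwen7B"]
ModelType: TypeAlias = LlamaForCausalLM | Qwen2ForCausalLM


def apply_fsdp(model: ModelType) -> None:
    # Hypothetical sketch: the real apply_fsdp would wrap each block with FSDP.
    # Under the ModelType union the type checker knows model.model.layers
    # exists; annotating `model: PreTrainedModel` instead would flag this
    # attribute access as unknown.
    for layer in model.model.layers:
        ...  # wrap `layer` with FSDP here
```

The union also preserves the editor ergonomics mentioned above: control-clicking ModelType jumps straight to the concrete Llama/Qwen classes rather than the generic base class.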
* fix wandb
* add robust eval
* add eval to orch
* fix nccl ready
* deepdive: separate, explicitly named caches for train and online eval (#18)
* delete cache deepdive
* add 105 ckpt interval
* update deepdive cache
* fix eval
* fix eval

Co-authored-by: sami jaghouar <sami@primeintellect.ai>
Co-authored-by: Sebastian Müller <sebastian@primeintellect.ai>
Co-authored-by: Mika Senghaas <mail@mikasenghaas.de>
No description provided.