
Add type hint model #18

Merged
samsja merged 2 commits into main from add-type-hint-model
Mar 1, 2025

Conversation


@samsja samsja commented Feb 28, 2025

No description provided.

Signed-off-by: Sami Jaghouar <sami.jaghouar@gmail.com>
Signed-off-by: Sami Jaghouar <sami.jaghouar@gmail.com>
@samsja samsja requested review from Jackmin801 and apaz-cli and removed request for apaz-cli February 28, 2025 23:51
@samsja samsja merged commit f44d483 into main Mar 1, 2025
Comment thread on src/zeroband/models.py
)

ModelName: TypeAlias = Literal["debugmodel", "150M", "1B", "Qwen32B", "Qwen1.5B", "Qwen7B"]
ModelType: TypeAlias = LlamaForCausalLM | Qwen2ForCausalLM
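
For context, a minimal sketch (not part of this diff) of how these aliases might be consumed downstream; the get_model stub and its signature are illustrative only, not taken from the PR:

```python
from typing import Literal, TypeAlias

from transformers import LlamaForCausalLM, Qwen2ForCausalLM

ModelName: TypeAlias = Literal["debugmodel", "150M", "1B", "Qwen32B", "Qwen1.5B", "Qwen7B"]
ModelType: TypeAlias = LlamaForCausalLM | Qwen2ForCausalLM


def get_model(name: ModelName) -> ModelType:
    """Illustrative stub: map a config name to a concrete model instance."""
    raise NotImplementedError
```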
Member:
Hrmm, actually, would transformers.modeling_utils.PreTrainedModel have worked? Could make it so that we don't need to keep adding to this in the future.

Member:

Oh, but they're removing the GenerationMixin soon. Hrmm.

Member Author:

> Hrmm, actually, would transformers.modeling_utils.PreTrainedModel have worked? Could make it so that we don't need to keep adding to this in the future.

Hmm, but my goal is to be able to ctrl-click into the Llama and Qwen code easily. Moreover, I don't think that PreTrainedModel is precise enough. For example, in our apply_fsdp code we rely on the fact that model.model.layers exists. This is true for both LlamaForCausalLM and Qwen2ForCausalLM, but probably not for other pretrained models.
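
To illustrate that point, a sketch (the real apply_fsdp code is not shown in this PR, so the helper below is hypothetical): annotating with the narrow ModelType union tells the type checker that model.model.layers exists, which a bare PreTrainedModel annotation would not guarantee.

```python
# Sketch only -- the repo's actual apply_fsdp may differ.
def apply_fsdp_sketch(model: ModelType) -> None:
    # Both LlamaForCausalLM and Qwen2ForCausalLM expose .model.layers,
    # so this attribute access type-checks; with a PreTrainedModel
    # annotation it would not be guaranteed to exist.
    for layer in model.model.layers:
        ...  # wrap each transformer block (e.g. with FSDP)
```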

samsja pushed a commit that referenced this pull request Nov 12, 2025
samsja pushed a commit that referenced this pull request Dec 4, 2025
samsja added a commit that referenced this pull request Mar 30, 2026
* fix wandb

* add robust eval

* add eval to orch

* fix nccl ready

* deepdive: separate, explicitly named caches for train and online eval (#18)

* delete cache deepdive

* add 105 ckpt interval

* update deepdeive cache

* fix eval

* fix eval

---------

Co-authored-by: sami jaghouar <sami@primeintellect.ai>
Co-authored-by: Sebastian Müller <sebastian@primeintellect.ai>
Co-authored-by: Mika Senghaas <mail@mikasenghaas.de>
