[CI] Fix copies #42487
Conversation
[For maintainers] Suggested jobs to run (before merge): run-slow: nanochat
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
self.o_proj = nn.Linear(
    config.num_attention_heads * self.head_dim, config.hidden_size, bias=config.attention_bias
)
self.rotary_fn = apply_rotary_pos_emb
```
If you are sure this should be here, fine (I understand the sync), but does `self.rotary_fn = apply_rotary_pos_emb` make sense for this model? If so, good; if not, maybe it should not be a sync and the modular file should be adjusted.
Afaik, it's due to kernels needing this, but let's double-check @MekkCyber
I think we wanted it outside, not using `self.xxxx`.
Looks like it's not possible? Otherwise, we need to update all models. But it's getting late, let's resolve this next week with a fresh mind 😄
ArthurZucker left a comment
ty tho, let's move it out!
If possible
Hi @ArthurZucker, as per this comment: the current design for …
Right, my bad!
The previous PR was merged without syncing with main --> a new model landed --> the copies went out of sync.