Cohere: update RoPE structure #33408
This is copy/paste from llama
Aside from the line highlighted with a comment, this is copy/paste from llama
@LysandreJik, a PR like this one will be opened for a few more modern models. Since part of the changes consists of having a global view of the model to update the
LysandreJik left a comment:
This looks clean, nice to reuse the llama code
No need to update the import structure for now!
What does this PR do?
This PR propagates the updates to the RoPE structure to `cohere`: the logic for RoPE was abstracted into a separate module for `llama3.1` (#32135). Using the new structure, a model has access to all RoPE scaling strategies. While touching the modeling code, I've taken the liberty to:
- re-enable the `# Copied from` statements, which were disabled in previous PRs;

✅ all slow tests passing
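To illustrate what "a model has access to all RoPE scaling strategies" means, here is a minimal sketch of a shared registry that maps a `rope_type` key in the config to an init function returning the inverse frequencies. All names here (`ROPE_INIT_FUNCTIONS`, `compute_rope_parameters`, the `(head_dim, base, rope_scaling)` signature) are hypothetical simplifications for illustration, not the actual transformers API, and only two strategies (default and linear scaling) are shown.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical signature: (head_dim, base, rope_scaling) -> (inv_freq, attention_factor)
RopeInitFn = Callable[[int, float, dict], Tuple[List[float], float]]


def _default_rope(head_dim: int, base: float, rope_scaling: dict) -> Tuple[List[float], float]:
    # Standard RoPE: one inverse frequency per pair of dimensions.
    inv_freq = [1.0 / (base ** (i / head_dim)) for i in range(0, head_dim, 2)]
    return inv_freq, 1.0


def _linear_rope(head_dim: int, base: float, rope_scaling: dict) -> Tuple[List[float], float]:
    # Linear scaling (position interpolation): divide every frequency by `factor`.
    inv_freq, attention_factor = _default_rope(head_dim, base, rope_scaling)
    factor = rope_scaling["factor"]
    return [f / factor for f in inv_freq], attention_factor


# Shared registry: any model that dispatches through it gets every strategy for free.
ROPE_INIT_FUNCTIONS: Dict[str, RopeInitFn] = {
    "default": _default_rope,
    "linear": _linear_rope,
}


def compute_rope_parameters(
    head_dim: int, base: float, rope_scaling: Optional[dict] = None
) -> Tuple[List[float], float]:
    # A missing scaling config falls back to plain RoPE.
    rope_scaling = rope_scaling or {}
    rope_type = rope_scaling.get("rope_type", "default")
    return ROPE_INIT_FUNCTIONS[rope_type](head_dim, base, rope_scaling)


inv_freq, attention_factor = compute_rope_parameters(8, 10000.0)
scaled, _ = compute_rope_parameters(8, 10000.0, {"rope_type": "linear", "factor": 2.0})
```

The design point is that the model's rotary embedding module no longer hard-codes a single frequency computation; adding a new scaling strategy means adding one registry entry rather than editing every model.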
Note: #31999 was originally opened to migrate all modern RoPE models into the upgraded structure. However, while working on `cohere`, I noticed that there may be important implementation differences in RoPE. As such, I'll be opening multiple PRs, batching similar RoPE implementations together.
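One example of such an implementation difference, as we understand the two modeling files: Llama-style RoPE pairs dimension `i` with dimension `i + d/2` (the "rotate half" layout), while Cohere pairs adjacent dimensions `2i` and `2i + 1` (an interleaved layout). The sketch below uses pure Python with hypothetical helper names to show that both layouts are valid rotations but produce different outputs for the same input, which is why the two cannot simply share a `# Copied from` RoPE implementation.

```python
import math
from typing import List


def rope_half_split(x: List[float], inv_freq: List[float], pos: float) -> List[float]:
    # Llama-style layout: dimension i is rotated together with dimension i + d/2.
    half = len(x) // 2
    out = list(x)
    for i in range(half):
        theta = pos * inv_freq[i]
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = b * c + a * s
    return out


def rope_interleaved(x: List[float], inv_freq: List[float], pos: float) -> List[float]:
    # Interleaved layout: dimension 2i is rotated together with dimension 2i + 1.
    out = list(x)
    for i in range(len(x) // 2):
        theta = pos * inv_freq[i]
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = b * c + a * s
    return out


x = [1.0, 2.0, 3.0, 4.0]
inv_freq = [1.0, 0.01]
half_out = rope_half_split(x, inv_freq, 1.0)
inter_out = rope_interleaved(x, inv_freq, 1.0)
```

The two layouts are related by a permutation of the head dimensions (evens first, then odds), so they are mathematically equivalent up to reordering weights, but numerically different when applied to the same checkpoint.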