The simplest model, with rotary embeddings. Don't necessarily train to 300B tokens to compare.