Train baseline + rotary embeddings #42

@TevenLeScao

Description

Train the simplest baseline model with rotary embeddings added. The run doesn't necessarily have to go to the full 300B tokens to allow a comparison.
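For reference, a minimal sketch of what rotary embeddings do to a single query or key vector. This is not the project's training code: the function names `rotary_embed` and `dot`, the `base=10000.0` default, and the interleaved pairing of dimensions are assumptions made for illustration (some implementations pair the first and second halves of the vector instead).

```python
import math

def rotary_embed(x, position, base=10000.0):
    """Rotate each consecutive pair (x[i], x[i+1]) by a position-dependent angle.

    Hypothetical helper for illustration; assumes an even head dimension
    and interleaved pairing of dimensions.
    """
    dim = len(x)
    if dim % 2 != 0:
        raise ValueError("head dimension must be even")
    out = []
    for i in range(0, dim, 2):
        # Pair i/2 uses frequency base^(-i/dim); the angle grows linearly with position.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out

def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(p * q for p, q in zip(a, b))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# The score depends only on the relative offset (here 2), not on absolute positions.
score_a = dot(rotary_embed(q, 5), rotary_embed(k, 3))
score_b = dot(rotary_embed(q, 12), rotary_embed(k, 10))
```

Because each pair is only rotated, vector norms are unchanged, and the query-key score is a function of the position difference alone. That relative-position property is why no separate learned position table is needed in this sketch.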

Metadata

Labels

arch&scale: Architecture and Scaling Modeling Group

