Train baseline + rotary embeddings #42

Closed

Train the simplest (baseline) model with rotary embeddings. For the comparison, it isn't necessary to train all the way to 300B tokens.
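For reference, rotary embeddings rotate pairs of query/key features by a position-dependent angle, so the attention score between two positions depends only on their offset. Below is a minimal NumPy sketch. It assumes a single head with inputs of shape `(seq_len, dim)`, the commonly used base of 10000, and the "rotate half" feature pairing (some codebases interleave adjacent features instead). `rotary_embedding` is a hypothetical helper, not code from this repository.

```python
import numpy as np

def rotary_embedding(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to x of shape (seq_len, dim), dim even.

    Assumption: "rotate half" pairing, i.e. feature i is paired with
    feature i + dim // 2 (not the interleaved 2i / 2i+1 layout).
    """
    seq_len, dim = x.shape
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # One rotation frequency per feature pair: base^(-2i/dim), i in [0, half)
    inv_freq = base ** (-np.arange(half) * 2.0 / dim)
    # Rotation angle for every (position, pair): shape (seq_len, half)
    angles = np.outer(np.arange(seq_len), inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Standard 2D rotation of each (x1[i], x2[i]) pair by its angle
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Usage: rotate queries (and, identically, keys) before the attention dot product
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))
q_rot = rotary_embedding(q)
```

Because the same rotation is applied to both queries and keys, the score between positions m and n depends only on m - n, and position 0 is left unchanged.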