Thinker

The trained computer

We want to train a model that performs numeric computation such as 1 + 2 = 3. What does computation require?

an input
a reusable compute unit -> a repeated transformer block
memory -> concatenated embeddings accessed via cross-attention
an algorithm that produces the desired output

In our case, the compute unit and the algorithm are merged in the model.
The memory consists of the concatenated intermediate latent states.
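A minimal sketch of this loop in plain Python, for illustration only: the same block is applied at every step (the reusable compute unit), and each intermediate latent is appended to a growing memory that later steps can read. The names `run_thinker` and `toy_block` are hypothetical, and `toy_block` is a trivial stand-in for a real transformer block, not the code in toy_model.py.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# A block maps (latent, inputs, memory) -> new latent.
Block = Callable[[Vector, List[Vector], List[Vector]], Vector]


def run_thinker(inputs: List[Vector], latent: Vector, block: Block,
                num_steps: int) -> Tuple[Vector, List[Vector]]:
    """Apply the same block num_steps times; memory is the list of past latents."""
    memory: List[Vector] = []
    for _ in range(num_steps):
        # Weight sharing: the identical block is reused at every step.
        latent = block(latent, inputs, memory)
        # Memory grows by concatenating each intermediate latent state.
        memory.append(latent)
    return latent, memory


def toy_block(latent: Vector, inputs: List[Vector], memory: List[Vector]) -> Vector:
    """Hypothetical stand-in for a transformer block: add the mean input
    and half the mean of the memory to the current latent."""
    dim = len(latent)

    def mean(vectors: List[Vector]) -> Vector:
        if not vectors:
            return [0.0] * dim
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    x, m = mean(inputs), mean(memory)
    return [latent[i] + x[i] + 0.5 * m[i] for i in range(dim)]


final, memory = run_thinker([[1.0, 2.0]], [0.0, 0.0], toy_block, num_steps=3)
print(len(memory), final)  # 3 [4.375, 8.75]
```

The number of steps is a free parameter here, which is what lets the same block spend more computation on harder inputs.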

The algorithms we chose to learn are, from easiest to hardest:

copy input to output, with some variations
addition
multiplication
number factorisation
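A minimal sketch of (prompt, target) generators for these tasks, ordered from easy to hard as a curriculum. The string formats (`"12+7="`, `"12=2*2*3"`), the `reverse` variation of copying, and the names `make_sample`, `prime_factors` and `TASKS` are assumptions for illustration, not the contents of data/numbers.py.

```python
import random
from typing import List, Tuple

# Hypothetical curriculum order, easiest first.
TASKS = ["copy", "reverse", "add", "mul", "factor"]


def prime_factors(n: int) -> List[int]:
    """Trial-division factorisation, smallest factor first."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def make_sample(task: str, rng: random.Random, max_value: int = 999) -> Tuple[str, str]:
    """Return a (prompt, target) pair; the string format is an assumption."""
    a, b = rng.randint(0, max_value), rng.randint(0, max_value)
    if task == "copy":
        return f"{a}", f"{a}"
    if task == "reverse":  # one possible 'variation' of copying (assumption)
        return f"{a}", f"{a}"[::-1]
    if task == "add":
        return f"{a}+{b}=", f"{a + b}"
    if task == "mul":
        return f"{a}*{b}=", f"{a * b}"
    if task == "factor":
        n = max(a, 2)  # factorisation needs n >= 2
        return f"{n}=", "*".join(str(p) for p in prime_factors(n))
    raise ValueError(f"unknown task: {task}")


rng = random.Random(0)
for task in TASKS:
    print(task, make_sample(task, rng))
```

Keeping every task as plain strings means one tokenizer and one model can be trained on the whole curriculum.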

Task 3 (multiplication) in particular will help test how the method handles variable complexity. Since the last task (factorisation) relies heavily on memory to reduce computation, we will observe how the model uses the memory it is given.
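To make the complexity and memory points concrete outside the model, here is a plain trial-division sketch for factorisation (our illustration, not the project's code; all function names are hypothetical). Two four-digit inputs need very different numbers of steps, and remembering previously found primes cuts the work for the hard case.

```python
import math
from typing import List, Tuple


def factor_naive(n: int) -> Tuple[List[int], int]:
    """Trial division by every d >= 2; returns (factors, number of modulo checks)."""
    factors, steps, d = [], 0, 2
    while d * d <= n:
        steps += 1
        if n % d == 0:
            factors.append(d)
            n //= d
        else:
            d += 1
    if n > 1:
        factors.append(n)
    return factors, steps


def primes_up_to(limit: int) -> List[int]:
    """Sieve of Eratosthenes: plays the role of a 'memory' of known primes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def factor_with_memory(n: int, primes: List[int]) -> Tuple[List[int], int]:
    """Trial division by remembered primes only.

    Assumes `primes` contains every prime <= isqrt(n).
    """
    factors, steps, i = [], 0, 0
    while i < len(primes) and primes[i] * primes[i] <= n:
        steps += 1
        if n % primes[i] == 0:
            factors.append(primes[i])
            n //= primes[i]
        else:
            i += 1
    if n > 1:
        factors.append(n)
    return factors, steps


n = 89 * 97  # 8633
print(factor_naive(1024))  # ([2, 2, 2, 2, 2, 2, 2, 2, 2, 2], 9)
print(factor_naive(n))     # ([89, 97], 88)
print(factor_with_memory(n, primes_up_to(math.isqrt(n))))  # ([89, 97], 24)
```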

Check out:

toy_model.py
ongoing experiment log

Depending on the observed results, we could reuse the same approach on language modeling tasks, following the original ideas.

About the model
The model is a latent-based transformer that reads through cross-attention (similar to Perceiver):

layer weight sharing, so the same block acts as a reusable compute unit
a hidden latent vector that carries information between steps
cross-attention on the input
cross-attention on past latents (wider information passing)
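A minimal sketch of one application of the shared block, in plain Python for illustration: a single latent vector queries the input and then the past latents through scaled dot-product cross-attention, and both results are added residually. Learned projections, multiple heads, normalization and the feed-forward layer (such as the SwiGLU and RMSNorm layers listed in core/layers.py) are omitted; the function names are hypothetical and this is not the code in toy_model.py.

```python
import math
from typing import List

Vector = List[float]


def cross_attention(query: Vector, context: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention of one query over a context.

    Keys and values are the context vectors themselves: the learned Q/K/V
    projections of a real layer are omitted to keep the sketch short.
    """
    if not context:  # e.g. no past latents yet at the first step
        return [0.0] * len(query)
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in context]
    top = max(scores)  # subtract the max score for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * value[i] for w, value in zip(weights, context)) / total
        for i in range(len(query))
    ]


def shared_block_step(latent: Vector, inputs: List[Vector],
                      past_latents: List[Vector]) -> Vector:
    """One application of the shared block (hypothetical, heavily simplified):
    residual cross-attention on the input and on the past latents."""
    from_input = cross_attention(latent, inputs)
    from_memory = cross_attention(latent, past_latents)
    return [x + a + b for x, a, b in zip(latent, from_input, from_memory)]


# A zero query attends uniformly, so this averages the two inputs.
print(shared_block_step([0.0, 0.0], [[1.0, 0.0], [3.0, 2.0]], []))  # [2.0, 1.0]
```

Because the latent is the only query, the cost of each step grows with the input length and the memory size, not with the square of either.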

Here's a visual:

Here's a draft of the initial idea:

Project Structure

thinker/
├── core/                       # 🧠 Core architecture and tools
│   ├── models.py               # Main model configurations
│   ├── toy_model.py            # Primary ToyThinker model
│   ├── layers.py               # Basic model layers (SwiGLU, RMSNorm, FlexDecoderLayer, RoPE)
│   └── utils.py                # Core utilities (e.g. CfgNode)
├── data/                       # 🗃️ Datasets and curriculum logic
│   └── numbers.py              # Generative/Curriculum datasets
├── scripts/                    # 🚀 Entrypoint scripts for execution
│   ├── train.py                # Setup for automated (15 min budget) autoresearch training
│   ├── th1nker_runner.py       # Standard runner
│   ├── run_lightning.py        # Lightning-based runner
│   ├── generate_embeddings.py  # Utility scripts
│   └── visualize.py            # Log visualization
├── notebooks/                  # 📓 Rapid prototyping and Colab entrypoint
│   └── Th1nker_runner.ipynb
├── dev_notes/                  # 📝 Development notes, DB structures, past experiments
│   ├── ideas/                  
│   └── experiment.log.md
├── docs/                       # 📚 Documentation
│   └── ToDo.md
├── inspirations/               # 💡 External references (Autoresearch, AdderBoard)
└── program.md                  # 🤖 System instructions for automated research agents

Similar ideas:

Looped Transformers - paper - x_post - code
