Learning by back-propagating

A replication and interpretability study of the paper: 'Learning representations by back-propagating errors' (1986, Rumelhart, Hinton, and Williams).

Structure

This repo replicates the neural network model, the training on the two tasks, and the interpretability work of the original paper. It contains:

  • datasets.py: Contains the datasets for the mirror symmetry vector task and the family trees task.
  • model.py: Implements the simple neural network architecture using NumPy.
  • train.py: Trains a model on either of 2 tasks.
  • interp.py: Interprets trained models by inspecting weights and activations.

Training a simple neural network

Since the paper used very small networks and the data is heavily imbalanced (the mirror symmetry task has only 8 positive examples versus 56 negative), these models can easily get stuck in local minima.

There are two tasks specified in the paper:

uv run train.py --task vector_sym

trains a 3-layer neural network on the mirror symmetry vector task. The network must detect whether a 6-digit binary vector is symmetric, i.e., whether its second half is the mirror image of its first half. For example, 100100 is asymmetric but 100001 is symmetric.
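For illustration, a vector is symmetric exactly when it equals its own reversal; the 8-versus-56 class split follows directly, since the 2^3 choices for the first half fully determine the mirrored second half. A minimal sketch (not the repo's datasets.py):

```python
import numpy as np

def is_symmetric(v):
    """Return True if the vector equals its own reversal."""
    v = np.asarray(v)
    return bool(np.array_equal(v, v[::-1]))

# Enumerate all 2**6 = 64 binary vectors of length 6.
vectors = [np.array([(i >> b) & 1 for b in range(6)]) for i in range(64)]
labels = [is_symmetric(v) for v in vectors]
assert sum(labels) == 8  # only 8 of the 64 vectors are symmetric
```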

uv run train.py --task family_trees

trains a 5-layer neural network on the family trees relationship task. The network must learn relationships between people in a family tree: given a triplet person1/relationship/person2, it must learn to predict the second person from the first person and the relationship. It is trained on two isomorphic family trees (one Italian, one English).
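A common way to feed such triplets to a network (and the one used in the original paper) is to one-hot encode person1 and the relationship as the input, with the output a distribution over candidate person2 values. A hedged sketch with a hypothetical miniature vocabulary, not the repo's actual encoding:

```python
import numpy as np

# Hypothetical miniature vocabulary; the original paper uses 24 people
# (12 per tree) and 12 relationship types.
people = ["Christopher", "Penelope", "Arthur"]
relations = ["father", "mother", "brother"]

def one_hot(index, size):
    v = np.zeros(size)
    v[index] = 1.0
    return v

def encode(person1, relationship):
    """Concatenate one-hot codes for (person1, relationship).
    The network then predicts a distribution over person2."""
    return np.concatenate([
        one_hot(people.index(person1), len(people)),
        one_hot(relations.index(relationship), len(relations)),
    ])

x = encode("Christopher", "father")  # input vector of length 3 + 3 = 6
```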

Interpretability

If you are interested in finding out what the model has actually learned, run uv run interp.py --task vector_sym or uv run interp.py --task family_trees.

Insights

  • The model trained on the vector task learns to detect symmetry using anti-symmetric weights (e.g., -3 and 3) that cancel out only when the input is perfectly mirrored. It also learns a doubling-weight pattern (e.g., 3, 6, 12) that ensures no asymmetric input can fool the hidden units. Since the task is simple and the network small, it is easy to completely reverse engineer the entire model.
  • The model trained on the family trees task is a bit more complex, and some of its weights are not interpretable. We can, however, find units that encode nationality (English vs. Italian) and units that (noisily) encode the generation a person belongs to.
  • The model trained on the family trees task poses problems similar to those modern neural networks pose when we try to interpret them: neurons that respond to multiple unrelated concepts, distributed representations, and a lack of clear concepts in some weights/activations make interpretability very difficult.
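The cancellation argument in the first insight can be checked exhaustively: with anti-symmetric, doubling weights, each half of the input encodes a binary number, and the weighted sum is zero exactly when the two halves mirror each other. A sketch using idealized weights 1, 2, 4 (the learned weights are roughly scaled, noisier versions of this pattern):

```python
import numpy as np
from itertools import product

# Anti-symmetric doubling weights: the first half contributes +1, +2, +4
# and the mirrored second half contributes -4, -2, -1. Because binary
# representations are unique, the sum is zero iff the halves mirror.
w = np.array([1.0, 2.0, 4.0, -4.0, -2.0, -1.0])

for bits in product([0, 1], repeat=6):
    x = np.array(bits, dtype=float)
    symmetric = np.array_equal(x, x[::-1])
    # The weighted sum vanishes exactly for symmetric inputs.
    assert (abs(w @ x) < 1e-9) == symmetric
```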
