Retrieval Augmented Causal Generation

DeepMind demonstrated in their recent [RETRO](https://arxiv.org/abs/2112.04426) paper that augmenting a language model's input with text retrieved from a corpus allows it to learn to copy relevant passages instead of storing them in its weights. This text retrieval is another solution to the problem mentioned in #8 and doesn't involve modifying the model. Instead, RETRO first retrieves similar text using BERT embeddings and then feeds that text into the cross-attention of their model together with the original prompt. This way, the decoder of their encoder-decoder model is aware of similar texts without storing them in its weights.\
We could implement a similar architecture without cross-attention (#44) by using only autoregressive language modelling and retrieving chunks with BERT (or our own) embeddings. It would even be possible to test this approach without retraining a model by simply retrieving relevant chunks and feeding them into the context of our model (instead of using padding tokens).\
This issue tracks the initial proof-of-concept, its benchmarks against the baseline and its overall progress.
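
The retraining-free test described above could look roughly like the following minimal sketch. Everything in it is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for BERT (or our own) embeddings, whitespace splitting stands in for the real tokenizer, and the names `retrieve`, `build_context` and `max_tokens` are hypothetical rather than taken from this repository.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a BERT embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(prompt, chunks, k=2):
    """Return the k chunks most similar to the prompt, best first."""
    query = embed(prompt)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]


def build_context(prompt, chunks, k=2, max_tokens=64):
    """Fill the free space in front of the prompt with retrieved chunks.

    Whitespace-separated words approximate tokens here (an assumption);
    a chunk is skipped if it no longer fits into the remaining budget.
    """
    budget = max_tokens - len(prompt.split())
    kept = []
    for chunk in retrieve(prompt, chunks, k):
        cost = len(chunk.split())
        if cost <= budget:
            kept.append(chunk)
            budget -= cost
    # Retrieved text comes first so the causal model can attend to it
    # while continuing the prompt.
    return "\n".join(kept + [prompt])


corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Bananas are rich in potassium.",
    "Paris is the capital of France.",
]
prompt = "Where is the Eiffel Tower located?"
context = build_context(prompt, corpus, k=2)
print(context)
```

The resulting string would be passed to the unmodified autoregressive model in place of a padded prompt, so the baseline comparison only changes what occupies the otherwise unused context.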

Retrieval Augmented Causal Generation #45
