BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
RoBERTa: A Robustly Optimized BERT Pretraining Approach
SpanBERT: Improving Pre-training by Representing and Predicting Spans
Improving Language Understanding by Generative Pre-Training
Language Models are Unsupervised Multitask Learners
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
T5 is going to be our new backbone
COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
sequence-level contrastive learning
Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder
auto-encoder for better doc representation
this one has experiments on MS MARCO, NQ, and MIND, all using standard/official settings
TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
Condenser: a Pre-training Architecture for Dense Retrieval
REALM: Retrieval-Augmented Language Model Pre-Training
DR for pretraining (in comparison to pretraining for DR)
Dense Passage Retrieval for Open-Domain Question Answering
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
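The ANCE paper above mines hard negatives from the model's own retrieval results. A minimal sketch of that idea (brute-force NumPy dot-product retrieval standing in for the ANN index the paper refreshes from recent checkpoints; function and variable names are illustrative, not the paper's code):

```python
import numpy as np

def mine_hard_negatives(query_emb, corpus_emb, positive_ids, k=2):
    """ANCE-style hard-negative mining sketch: for each query, rank all
    corpus documents by dot-product similarity (ANCE uses an ANN index
    rebuilt from a recent model checkpoint instead of brute force) and
    keep the top-k retrieved documents that are not the labeled positive."""
    scores = query_emb @ corpus_emb.T                  # (num_queries, num_docs)
    negatives = []
    for qi, pos in enumerate(positive_ids):
        ranked = np.argsort(-scores[qi])               # best-first doc ids
        negatives.append([int(d) for d in ranked if d != pos][:k])
    return negatives

# toy usage: 4 one-hot "documents", one query whose positive is doc 0
corpus = np.eye(4)
queries = np.array([[1.0, 0.9, 0.1, 0.0]])
negs = mine_hard_negatives(queries, corpus, positive_ids=[0], k=2)
```

The mined negatives then replace (or augment) random in-batch negatives in the contrastive loss.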
RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking
Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
Large Dual Encoders Are Generalizable Retrievers
T5-XL/XXL with a good combination of DR techniques
Muppet: Massive Multi-task Representations with Pre-Finetuning
a good overview of pre-finetuning
Text and Code Embeddings by Contrastive Pre-Training
OpenAI's sequence contrastive learning
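The note above refers to contrastive pre-training over (text, text) pairs. A minimal in-batch InfoNCE sketch of that objective (NumPy, illustrative names; a sketch of the general technique, not the paper's implementation):

```python
import numpy as np

def info_nce_loss(q, d, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss: each query's positive is the
    document at the same batch index; every other in-batch document acts
    as a negative. q, d: (batch, dim) L2-normalized embeddings."""
    scores = q @ d.T / temperature                     # scores[i, j] = sim(q_i, d_j)
    # row-wise log-softmax; the target class for row i is column i
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# toy usage: 4 queries paired with slightly perturbed copies as positives
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = q + 0.1 * rng.normal(size=(4, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
loss = info_nce_loss(q, d)
```

Matched pairs score lower loss than mismatched ones, which is what pushes paired sequences together in embedding space.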
Pre-training Tasks for Embedding-based Large-scale Retrieval
a study of ICT (Inverse Cloze Task); very hard to make it work in practice, though
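For reference, ICT builds pseudo query-passage pairs by holding out one sentence of a passage as the query. A minimal sketch (stdlib only; in some variants, e.g. ORQA, the sentence is left in the context a fraction of the time, which is omitted here):

```python
import random

def ict_pair(passage_sentences, rng=random):
    """Inverse Cloze Task pair: pick one sentence as the pseudo-query and
    treat the remaining sentences of the passage as its pseudo-relevant
    context. The model is trained to retrieve the context given the query."""
    i = rng.randrange(len(passage_sentences))
    query = passage_sentences[i]
    context = passage_sentences[:i] + passage_sentences[i + 1:]
    return query, context

# toy usage with a fixed seed
sents = ["Alice went home.", "It was raining.", "She forgot her umbrella."]
query, context = ict_pair(sents, random.Random(0))
```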
Taming Pretrained Transformers for Extreme Multi-label Text Classification
see the connection between eXtreme classification and dense retrieval
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks
continued pretraining on in-domain corpora
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Sankalan Pal Chowdhury, Adamos Solomou, Avinava Dubey, Mrinmaya Sachan. 2021. On Learning the Transformer Kernel
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap. 2019. Compressive Transformers for Long-Range Sequence Modelling
Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier. 2020. Efficient Content-Based Sparse Attention with Routing Transformers
Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever. 2019. Generating Long Sequences with Sparse Transformers
Iz Beltagy, Matthew E. Peters, Arman Cohan. 2020. Longformer: The Long-Document Transformer
Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang. 2020. ETC: Encoding Long and Structured Inputs in Transformers
Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 2020. Big Bird: Transformers for Longer Sequences
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. 2020. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler. 2020. Long Range Arena: A Benchmark for Efficient Transformers
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
UNITER: UNiversal Image-TExt Representation Learning
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
Learning Transferable Visual Models From Natural Language Supervision
VL-BEIT: Generative Vision-Language Pretraining
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Microsoft COCO Captions: Data Collection and Evaluation Server
ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
Collecting Highly Parallel Data for Paraphrase Evaluation
Movie Description
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
Bridging Video-text Retrieval with Multiple Choice Questions
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
WebQA: Multihop and Multimodal QA
MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text