BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
RoBERTa: A Robustly Optimized BERT Pretraining Approach
SpanBERT: Improving Pre-training by Representing and Predicting Spans
Improving Language Understanding by Generative Pre-Training
Language Models are Unsupervised Multitask Learners
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
T5 is going to be our new backbone
COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
sequence-level contrastive learning
Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder
auto-encoder for better doc representation
this one has experiments on MS MARCO, NQ, and MIND, all using standard/official settings
TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
Condenser: a Pre-training Architecture for Dense Retrieval
REALM: Retrieval-Augmented Language Model Pre-Training
DR for pretraining (in comparison to pretraining for DR)
Dense Passage Retrieval for Open-Domain Question Answering
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
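The ANCE paper above mines hard negatives from the model's own retrieval results. A minimal sketch of that idea (brute-force NumPy dot-product retrieval standing in for the ANN index the paper refreshes from recent checkpoints; function and variable names are illustrative, not the paper's code):

```python
import numpy as np

def mine_hard_negatives(query_emb, corpus_emb, positive_ids, k=2):
    """ANCE-style hard-negative mining sketch: for each query, rank all
    corpus documents by dot-product similarity (ANCE uses an ANN index
    rebuilt from a recent model checkpoint instead of brute force) and
    keep the top-k retrieved documents that are not the labeled positive."""
    scores = query_emb @ corpus_emb.T                  # (num_queries, num_docs)
    negatives = []
    for qi, pos in enumerate(positive_ids):
        ranked = np.argsort(-scores[qi])               # best-first doc ids
        negatives.append([int(d) for d in ranked if d != pos][:k])
    return negatives

# toy usage: 4 one-hot "documents", one query whose positive is doc 0
corpus = np.eye(4)
queries = np.array([[1.0, 0.9, 0.1, 0.0]])
negs = mine_hard_negatives(queries, corpus, positive_ids=[0], k=2)
```

The mined negatives then replace (or augment) random in-batch negatives in the contrastive loss.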
RocketQAv2: A Joint Training Method for Dense Passage Retrieval and Passage Re-ranking
Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
Large Dual Encoders Are Generalizable Retrievers
T5-XL/XXL with a good combination of DR techniques
Muppet: Massive Multi-task Representations with Pre-Finetuning
a good overview of pre-finetuning
Text and Code Embeddings by Contrastive Pre-Training
OpenAI's sequence contrastive learning
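The note above refers to contrastive pre-training over (text, text) pairs. A minimal in-batch InfoNCE sketch of that objective (NumPy, illustrative names; a sketch of the general technique, not the paper's implementation):

```python
import numpy as np

def info_nce_loss(q, d, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss: each query's positive is the
    document at the same batch index; every other in-batch document acts
    as a negative. q, d: (batch, dim) L2-normalized embeddings."""
    scores = q @ d.T / temperature                     # scores[i, j] = sim(q_i, d_j)
    # row-wise log-softmax; the target class for row i is column i
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# toy usage: 4 queries paired with slightly perturbed copies as positives
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = q + 0.1 * rng.normal(size=(4, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
loss = info_nce_loss(q, d)
```

Matched pairs score lower loss than mismatched ones, which is what pushes paired sequences together in embedding space.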
Pre-training Tasks for Embedding-based Large-scale Retrieval
a study of ICT (Inverse Cloze Task); very hard to make it work in practice, though
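For reference, ICT builds pseudo query-passage pairs by holding out one sentence of a passage as the query. A minimal sketch (stdlib only; in some variants, e.g. ORQA, the sentence is left in the context a fraction of the time, which is omitted here):

```python
import random

def ict_pair(passage_sentences, rng=random):
    """Inverse Cloze Task pair: pick one sentence as the pseudo-query and
    treat the remaining sentences of the passage as its pseudo-relevant
    context. The model is trained to retrieve the context given the query."""
    i = rng.randrange(len(passage_sentences))
    query = passage_sentences[i]
    context = passage_sentences[:i] + passage_sentences[i + 1:]
    return query, context

# toy usage with a fixed seed
sents = ["Alice went home.", "It was raining.", "She forgot her umbrella."]
query, context = ict_pair(sents, random.Random(0))
```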
Taming Pretrained Transformers for Extreme Multi-label Text Classification
see the connection between eXtreme classification and dense retrieval
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks
continued pretraining on in-domain corpora
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Sankalan Pal Chowdhury, Adamos Solomou, Avinava Dubey, Mrinmaya Sachan. 2021. On Learning the Transformer Kernel
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap. 2019. Compressive Transformers for Long-Range Sequence Modelling
Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier. 2020. Efficient Content-Based Sparse Attention with Routing Transformers
Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever. 2019. Generating Long Sequences with Sparse Transformers
Iz Beltagy, Matthew E. Peters, Arman Cohan. 2020. Longformer: The Long-Document Transformer
Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang. 2020. ETC: Encoding Long and Structured Inputs in Transformers
Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 2020. Big Bird: Transformers for Longer Sequences
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. 2020. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler. 2020. Long Range Arena: A Benchmark for Efficient Transformers
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
UNITER: UNiversal Image-TExt Representation Learning
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning
Learning Transferable Visual Models From Natural Language Supervision
VL-BEIT: Generative Vision-Language Pretraining
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
Microsoft COCO Captions: Data Collection and Evaluation Server
ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
Collecting Highly Parallel Data for Paraphrase Evaluation
Movie Description
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
Bridging Video-text Retrieval with Multiple Choice Questions
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
WebQA: Multihop and Multimodal QA
MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text