Awesome Transformer in Vision

A curated list of resources related to vision transformers. Feel free to open a pull request or an issue to add papers.

Awesome Surveys

Title	Venue	BibTeX
A Survey on Visual Transformer	ArXiv	Bib
Intriguing Properties of Vision Transformers	ArXiv	Code
CVPR 2021 Vision Transformer Papers (43 papers)	GitHub	--

Transformer in Vision

Task	Reg	Det	Seg	Trk	Other
Explanation	Image Recognition	Object Detection	Image Segmentation	Object Tracking	Other types

You can add a tag for any domain that contains several transformer-based works.

2021

(Please list entries in reverse chronological order, newest first.)

Title	Venue	Task	Code	BibTeX
Generative Video Transformer: Can Objects be the Words?	ICML2021	Cls	--	--
Tracking Instances as Queries	ArXiv	Seg	--	--
Instances as Queries	ArXiv	Seg	GitHub	--
OadTR: Online Action Detection with Transformers	CVPRW	Det	GitHub	--
An Empirical Study of Training Self-Supervised Vision Transformers	ArXiv	Other	--	--
End-to-end Temporal Action Detection with Transformer	ArXiv	Cls	GitHub	--
MlTr: Multi-label Classification with Transformer	ArXiv	Cls	GitHub	--
Delving Deep into the Generalization of Vision Transformers under Distribution Shifts	ArXiv	Other	--	--
Improved Transformer for High-Resolution GANs	ArXiv	Other	--	--
BEIT: BERT Pre-Training of Image Transformers	ArXiv	Cls	GitHub	--
XCiT: Cross-Covariance Image Transformers	ArXiv	Other	--	--
Semi-Autoregressive Transformer for Image Captioning	ArXiv	Other	--	--
Long-Short Temporal Contrastive Learning of Video Transformers	ArXiv	Other	--	--
Uformer: A General U-Shaped Transformer for Image Restoration	ArXiv	Other	GitHub	--
Video Super-Resolution Transformer	ArXiv	Other	GitHub	--
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification	ArXiv	Cls	GitHub	--
Semantic Correspondence with Transformers	ArXiv	Other	GitHub	--
Glance-and-Gaze Vision Transformer	ArXiv	Other	GitHub	--
Few-Shot Segmentation via Cycle-Consistent Transformer	ArXiv	Seg	--	--
Self-Supervised Learning with Swin Transformers	ArXiv	Other	GitHub	--
Visual Grounding with Transformers	ArXiv	Other	--	--
Associating Objects with Transformers for Video Object Segmentation	ArXiv	Seg	GitHub	--
When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations	ArXiv	Other	--	--
Anticipative Video Transformer	ArXiv	Other	GitHub	--
An Attention Free Transformer	ArXiv	Other	--	--
Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks	ArXiv	Other	GitHub	--
TransVOS: Video Object Segmentation with Transformers	ArXiv	Seg	--	--
You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection	ArXiv	Det	GitHub	--
ResT: An Efficient Transformer for Visual Recognition	ArXiv	Reg	GitHub	--
Not All Images are Worth 16x16 Words: Dynamic Vision Transformers with Adaptive Sequence Length	ArXiv	Other	--	--
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers	ArXiv	Seg	--	--
Aggregating Nested Transformers	ArXiv	Other	--	--
End-to-End Video Object Detection with Spatial-Temporal Transformers	ArXiv	Det	GitHub	--
HOTR: End-to-End Human-Object Interaction Detection with Transformers	CVPR2021	Other	GitHub	--
Line Segment Detection Using Transformers without Edges	CVPR2021	Other	--	--
Boosting Crowd Counting with Transformers	ArXiv	Other	--	--
Vision Transformers for Dense Prediction	ArXiv	Other	--	--
Points as Queries: Weakly Semi-supervised Object Detection by Points	ArXiv	Other	--	--
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet	ArXiv	Reg	GitHub	Bib @article{yuan2021tokens, title={Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet}, author={Yuan, Li and Chen, Yunpeng and Wang, Tao and Yu, Weihao and Shi, Yujun and Tay, Francis EH and Feng, Jiashi and Yan, Shuicheng}, journal={arXiv preprint arXiv:2101.11986}, year={2021} }
Bottleneck Transformers for Visual Recognition	ArXiv	Reg	GitHub	Bib @article{srinivas2021bottleneck, title={Bottleneck Transformers for Visual Recognition}, author={Srinivas, Aravind and Lin, Tsung-Yi and Parmar, Niki and Shlens, Jonathon and Abbeel, Pieter and Vaswani, Ashish}, journal={arXiv preprint arXiv:2101.11605}, year={2021} }
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation	ArXiv	Seg	---	Bib @article{duke2021sstvos, title={SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation}, author={Duke, Brendan and Ahmed, Abdalla and Wolf, Christian and Aarabi, Parham and Taylor, Graham W}, journal={arXiv preprint arXiv:2101.08833}, year={2021} }
TrackFormer: Multi-Object Tracking with Transformers	ArXiv	Trk	---	Bib @article{meinhardt2021trackformer, title={TrackFormer: Multi-Object Tracking with Transformers}, author={Meinhardt, Tim and Kirillov, Alexander and Leal-Taixe, Laura and Feichtenhofer, Christoph}, journal={arXiv preprint arXiv:2101.02702}, year={2021} }

2020

Title	Venue	Task	Code	BibTeX
End-to-End Video Instance Segmentation with Transformers	ArXiv	Seg	--	--
Training data-efficient image transformers & distillation through attention	ArXiv	Reg	GitHub	Bib @article{touvron2020training, title={Training data-efficient image transformers \& distillation through attention}, author={Touvron, Hugo and Cord, Matthieu and Douze, Matthijs and Massa, Francisco and Sablayrolles, Alexandre and J{\'e}gou, Herv{\'e}}, journal={arXiv preprint arXiv:2012.12877}, year={2020} }
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale	ICLR	Reg	GitHub	Bib @article{dosovitskiy2020image, title={An image is worth 16x16 words: Transformers for image recognition at scale}, author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and others}, journal={arXiv preprint arXiv:2010.11929}, year={2020} }
Toward Transformer-Based Object Detection	ArXiv	Det	---	Bib @article{beal2020toward, title={Toward Transformer-Based Object Detection}, author={Beal, Josh and Kim, Eric and Tzeng, Eric and Park, Dong Huk and Zhai, Andrew and Kislyuk, Dmitry}, journal={arXiv preprint arXiv:2012.09958}, year={2020} }
Rethinking Transformer-based Set Prediction for Object Detection	ArXiv	Det	---	Bib @article{sun2020rethinking, title={Rethinking Transformer-based Set Prediction for Object Detection}, author={Sun, Zhiqing and Cao, Shengcao and Yang, Yiming and Kitani, Kris}, journal={arXiv preprint arXiv:2011.10881}, year={2020} }
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers	ArXiv	Det	---	Bib @article{dai2020up, title={UP-DETR: Unsupervised Pre-training for Object Detection with Transformers}, author={Dai, Zhigang and Cai, Bolun and Lin, Yugeng and Chen, Junying}, journal={arXiv preprint arXiv:2011.09094}, year={2020} }
Deformable DETR: Deformable Transformers for End-to-End Object Detection	ArXiv	Det	GitHub	Bib @article{zhu2020deformable, title={Deformable DETR: Deformable Transformers for End-to-End Object Detection}, author={Zhu, Xizhou and Su, Weijie and Lu, Lewei and Li, Bin and Wang, Xiaogang and Dai, Jifeng}, journal={arXiv preprint arXiv:2010.04159}, year={2020} }
End-to-End Object Detection with Transformers	ECCV	Det	GitHub	Bib @inproceedings{carion2020end, title={End-to-End Object Detection with Transformers}, author={Carion, Nicolas and Massa, Francisco and Synnaeve, Gabriel and Usunier, Nicolas and Kirillov, Alexander and Zagoruyko, Sergey}, booktitle={European Conference on Computer Vision}, pages={213--229}, year={2020}, organization={Springer} }
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers	ArXiv	Seg	GitHub	Bib @article{zheng2020rethinking, title={Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers}, author={Zheng, Sixiao and Lu, Jiachen and Zhao, Hengshuang and Zhu, Xiatian and Luo, Zekun and Wang, Yabiao and Fu, Yanwei and Feng, Jianfeng and Xiang, Tao and Torr, Philip HS and others}, journal={arXiv preprint arXiv:2012.15840}, year={2020} }
MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers	ArXiv	Seg	---	Bib @article{wang2020max, title={MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers}, author={Wang, Huiyu and Zhu, Yukun and Adam, Hartwig and Yuille, Alan and Chen, Liang-Chieh}, journal={arXiv preprint arXiv:2012.00759}, year={2020} }
TransTrack: Multiple-Object Tracking with Transformer	ArXiv	Trk	GitHub	Bib @article{sun2020transtrack, title={TransTrack: Multiple-Object Tracking with Transformer}, author={Sun, Peize and Jiang, Yi and Zhang, Rufeng and Xie, Enze and Cao, Jinkun and Hu, Xinting and Kong, Tao and Yuan, Zehuan and Wang, Changhu and Luo, Ping}, journal={arXiv preprint arXiv:2012.15460}, year={2020} }

2012-2019

Title	Venue	Task	Code	BibTeX
Attention Is All You Need	NeurIPS'17	--	GitHub	Bib @inproceedings{vaswani2017attention, title={Attention is all you need}, author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia}, booktitle={Advances in neural information processing systems}, pages={5998--6008}, year={2017} }

Awesome vTransformer Libraries

To be added.
