Training data-efficient image transformers & distillation through attention January 15, 2021 https://arxiv.org/pdf/2012.12877