DeepNet: Scaling Transformers to 1,000 Layers March 01, 2022 https://arxiv.org/pdf/2203.00555