On Separate Normalization in Self-supervised Transformers January 01, 1000 https://arxiv.org/pdf/2309.12931