layout: paper
title: "White-Box Transformers via Sparse Rate Reduction"
date: 1 Jun 2023
categories: research
paper_url: https://arxiv.org/pdf/2306.01129
code_url:
summary: "This paper argues that the goal of representation learning is to compress and transform the data distribution toward a low-dimensional Gaussian mixture supported on incoherent subspaces, and that the quality of such a representation can be measured by a single objective called sparse rate reduction. It interprets transformers as iterative optimizers of this objective: each block performs one step of alternating optimization, in which multi-head self-attention compresses the token representations with an approximate gradient descent step on their coding rate, and the subsequent MLP sparsifies them. The result is a family of mathematically interpretable, transformer-like networks that compress and sparsify representations of large-scale data and achieve performance competitive with standard transformers on datasets such as ImageNet."
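
To make the two ingredients named in the summary concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes tokens are stored as the columns of a `d × n` matrix `Z`, uses the coding rate R(Z) = ½ log det(I + d/(n ε²) Z Zᵀ) as the compression measure, and sparsifies with a single non-negative ISTA step against a dictionary `D`. The function names and the default values of `eps`, `step`, and `lam` are illustrative assumptions, and the subspace-wise compression term handled by attention is omitted.

```python
import numpy as np


def coding_rate(Z, eps=0.5):
    """Coding rate R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z Z^T).

    Z has shape (d, n): n tokens of dimension d stored as columns.
    eps is the allowed quantization error (value chosen for illustration).
    """
    d, n = Z.shape
    gram = np.eye(d) + (d / (n * eps**2)) * (Z @ Z.T)
    # slogdet is numerically safer than log(det(...)); gram is positive definite.
    _, logdet = np.linalg.slogdet(gram)
    return 0.5 * logdet


def ista_step(Z, D, step=0.1, lam=0.1):
    """One non-negative ISTA step on lam * ||A||_1 + 1/2 * ||Z - D A||_F^2.

    The iterate is initialized at A = Z, so D must be square (d, d).
    step and lam are hypothetical hyperparameters for this sketch.
    """
    grad = D.T @ (D @ Z - Z)  # gradient of the quadratic term at A = Z
    # Gradient step followed by a shifted ReLU (non-negative soft threshold).
    return np.maximum(Z - step * grad - step * lam, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.standard_normal((8, 32))
    D = np.linalg.qr(rng.standard_normal((8, 8)))[0]  # random orthogonal dictionary
    A = ista_step(Z, D)
    print("coding rate of Z:", coding_rate(Z))
    print("fraction of zeros after ISTA step:", np.mean(A == 0.0))
```

With an identity dictionary the gradient term vanishes and the step reduces to a shifted ReLU, which is a convenient way to sanity-check the sparsification.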