Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization July 04, 2023 https://arxiv.org/pdf/2303.03108