Feb 8th, 2024 Summary
- Sharpness-aware minimization for efficiently improving generalization
- Sharpness-aware minimization leads to low-rank features
- [Wen22] How does sharpness-aware minimization minimize sharpness?
- [Wen23] Sharpness minimization algorithms do not only minimize sharpness to achieve better generalization
- Normalization layers are all that sharpness-aware minimization needs
- Same pre-training loss, better downstream: Implicit bias matters for language models
Original Paper
- Authors:
- Pierre Foret, Google Research
- Ariel Kleiner, Google Research
- Hossein Mobahi, Google Research
- Behnam Neyshabur, Blueshift, Alphabet
- Accepted at ICLR 2021
- Findings: The authors demonstrate through rigorous empirical study that SAM improves model generalization across a range of widely studied computer vision tasks and models. The use of SAM also provides robustness to label noise. The authors further elucidate the connection between loss sharpness and generalization through the lens provided by SAM.
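For orientation, the SAM update summarized above can be sketched in a few lines. This is a minimal illustration of the two-step procedure (perturb the weights along the normalized gradient, then descend using the gradient taken at the perturbed point), not the authors' implementation. The names `sam_step`, `toy_grad`, and `toy_loss`, the toy quadratic loss, and the values of `lr` and `rho` are all assumptions chosen for the example.

```python
import math


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM-style update on a list of floats (hypothetical helper, L2-norm variant)."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Zero gradient: no ascent direction is defined, so leave the weights unchanged.
        return list(w)
    # Ascent step: move a distance rho along the normalized gradient.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: apply the gradient evaluated at the perturbed point to the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_loss(w):
    # Assumed toy objective: steep along w[0], shallow along w[1].
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)


def toy_grad(w):
    # Analytic gradient of toy_loss.
    return [10.0 * w[0], w[1]]


w = [1.0, 1.0]
for _ in range(100):
    w = sam_step(w, toy_grad)
print(toy_loss(w))
```

Each step costs two gradient evaluations, which is the overhead that efficiency-oriented variants of SAM try to reduce. Because the perturbation always has length `rho`, the iterates on this toy problem settle into a small neighborhood of the minimum rather than converging to it exactly.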
Analysis

Curvature
Enhancement of SAM
Efficiency
Auto-Tuning
Domain Specific
- Authors:
- Rajhans Singh, Arizona State University
- Ankita Shukla, Arizona State University
- Pavan Turaga, Arizona State University
- Accepted at the CVPR 2023 Deep Learning for Geometric Computing workshop
- Findings: The authors introduce the Deep Geometric Moment (DGM) architecture, a deep-learning model that uses geometric moments to measure shape-related properties, and report four key benefits over existing models. First, it generates discriminative features for classification tasks by accounting for shape information through the proposed deep geometric moments. Second, it outperforms existing ResNet models on standard datasets without using any pooling layer or reducing the spatial dimension. Third, it gives easy access to interpretable features at any level through simple re-projection of the moments. Finally, it requires fine-tuning only the coordinate basis pipeline rather than retraining all model parameters.
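As background for the term "geometric moments" used above, the classical raw moment of an image is M_pq = sum over pixels of x^p * y^q * I(x, y). The sketch below computes it for a tiny binary image and derives the centroid from the low-order moments. It illustrates only the classical definition, not the DGM architecture itself; the function name `geometric_moment` and the sample image are assumptions made for the example.

```python
def geometric_moment(image, p, q):
    """Raw geometric moment M_pq of a 2D image given as a list of rows (hypothetical helper)."""
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(image)      # y is the row index
        for x, value in enumerate(row)      # x is the column index
    )


# Assumed sample: a 2x2 block of ones centered in a 4x4 image.
image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

m00 = geometric_moment(image, 0, 0)        # total mass (area for a binary image)
cx = geometric_moment(image, 1, 0) / m00   # centroid x-coordinate
cy = geometric_moment(image, 0, 1) / m00   # centroid y-coordinate
print(m00, cx, cy)
```

The zeroth-order moment gives the mass of the shape and the first-order moments give its centroid; higher-order moments capture spread and orientation, which is the kind of shape information the summary refers to.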
Unclassified