Hiroki Naganuma

Feb 8th, 2024 Summary

Original Paper

Sharpness-Aware Minimization for Efficiently Improving Generalization
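
For reference, below is a minimal sketch of the two-step SAM update described in the original paper: first perturb the weights along the current gradient direction to a point of (approximately) highest loss within a radius rho, then apply the descent step using the gradient taken at that perturbed point. The toy quadratic loss, and the rho and lr values, are illustrative assumptions, not settings from the paper.

```python
import numpy as np

def loss(w):
    return 0.5 * np.sum(w ** 2)  # toy objective L(w) = ||w||^2 / 2

def grad(w):
    return w                     # analytic gradient of the toy objective

def sam_step(w, rho=0.05, lr=0.1):
    g = grad(w)
    # Ascent step: move to the approximate worst-case point within radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Gradient evaluated at the perturbed weights.
    g_sharp = grad(w + eps)
    # Descent step applied to the original (unperturbed) weights.
    return w - lr * g_sharp

w = np.array([1.0, -2.0, 3.0])
for _ in range(100):
    w = sam_step(w)
print(loss(w))
```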

Analysis

Why Does Sharpness-Aware Minimization Generalize Better Than SGD?


Normalization Layers Are All That Sharpness-Aware Minimization Needs

How Does Sharpness-Aware Minimization Minimize Sharpness?

Sharpness-Aware Minimization Leads to Low-Rank Features

Rethinking Sharpness-Aware Minimization as Variational Inference

SAM as an Optimal Relaxation of Bayes

When Do Flat Minima Optimizers Work?

Curvature

On the Maximum Hessian Eigenvalue and Generalization

The Hessian perspective into the Nature of Convolutional Neural Networks

Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data

Latent Space Oddity: on the Curvature of Deep Generative Models

Noise Stability Optimization for Flat Minima with Optimal Convergence Rates

Enhancement of SAM

Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

TRAM: Bridging Trust Regions and Sharpness Aware Minimization

Efficiency

Random Sharpness-Aware Minimization (NeurIPS 2022)

Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning

Towards Efficient and Scalable Sharpness-Aware Minimization

Auto-Tuning

Provable Sharpness-Aware Minimization with Adaptive Learning Rate

AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks

ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks

Domain Specific

When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations

Sharpness-Aware Minimization in Large-Batch Training: Training Vision Transformer In Minutes

Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models

Improving Shape Awareness and Interpretability in Deep Networks Using Geometric Moments

Unclassified

Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning under Distribution Shift

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models