On Calibration of Modern Neural Networks
We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated.
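As context for how miscalibration is usually quantified in this line of work, below is a minimal sketch of the standard expected calibration error (ECE), which bins predictions by confidence and averages the gap between confidence and accuracy per bin; the bin count and toy arrays are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Bin predictions by confidence and average |accuracy - confidence| per bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            acc = correct[mask].mean()        # empirical accuracy in this bin
            conf = confidences[mask].mean()   # average confidence in this bin
            ece += mask.mean() * abs(acc - conf)
    return ece

# Toy example: overconfident predictions yield a nonzero ECE.
conf = np.array([0.9, 0.95, 0.8, 0.99, 0.7])
hit = np.array([1, 0, 1, 1, 0])
print(expected_calibration_error(conf, hit))
```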
Unified Uncertainty Calibration
In this paper, we investigate the tension between minimizing error disparity across different population groups and maintaining calibrated probability estimates.
Multicalibration provides a comprehensive methodology for addressing group fairness. Our work gives sample complexity bounds for uniform convergence guarantees on multicalibration error.
In this work, we draw a link between OOD performance and model calibration, arguing that calibration across multiple domains can be viewed as a special case of learning an invariant representation, which leads to better OOD generalization.
We propose a more principled fix that minimizes an explicit calibration error during training.
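The specific trainable calibration objective is not reproduced here; purely as a hedged illustration of the general idea, the sketch below adds a simple differentiable calibration surrogate (the squared gap between mean confidence and mean accuracy in a mini-batch) to the usual cross-entropy loss. The weight `lam` and the toy tensors are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn.functional as F

def loss_with_calibration_penalty(logits, targets, lam=0.1):
    """Cross-entropy plus a simple differentiable calibration surrogate:
    the squared gap between mean confidence and mean accuracy in the batch."""
    ce = F.cross_entropy(logits, targets)
    probs = logits.softmax(dim=1)
    conf, pred = probs.max(dim=1)
    acc = (pred == targets).float().mean()   # non-differentiable; acts as a constant target
    penalty = (conf.mean() - acc) ** 2       # gradient flows through the confidences
    return ce + lam * penalty

logits = torch.randn(8, 10, requires_grad=True)
targets = torch.randint(0, 10, (8,))
loss_with_calibration_penalty(logits, targets).backward()
```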
We present a large-scale benchmark of existing state-of-the-art methods on classification problems and investigate the effect of dataset shift on accuracy and calibration.
We propose a proactive approach that learns a relationship in the training domain which will generalize to the target domain, by incorporating prior knowledge, expressed in a causal selection diagram, of the aspects of the data-generating process that are expected to differ.
We provide a thorough analysis of the factors causing miscalibration, and use the insights gleaned from this analysis to justify the empirically excellent performance of focal loss.
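For reference, the focal loss referred to here down-weights well-classified examples by a factor of (1 - p_t)^gamma, where p_t is the predicted probability of the true class; a minimal multiclass sketch (the gamma value and toy inputs are illustrative choices):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Multiclass focal loss: cross-entropy down-weighted by (1 - p_t)^gamma,
    where p_t is the predicted probability of the true class."""
    log_pt = F.log_softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return ((1.0 - pt) ** gamma * -log_pt).mean()

logits = torch.randn(4, 5)
targets = torch.tensor([0, 2, 1, 4])
print(focal_loss(logits, targets))
```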
In this work, we aim to learn general post-hoc calibration functions that can preserve the top-k predictions of any deep network.
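The general calibration functions learned in that work are not reproduced here; as the simplest example of a post-hoc map that preserves the top-k ordering, below is a sketch of temperature scaling, where a single scalar T is fit on held-out logits (the optimizer settings and toy data are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_targets, steps=200, lr=0.01):
    """Fit a single temperature T > 0 on held-out data; dividing logits by a
    scalar never changes their ordering, so top-k predictions are preserved."""
    log_t = torch.zeros(1, requires_grad=True)   # parametrize T = exp(log_t) > 0
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_targets)
        loss.backward()
        opt.step()
    return log_t.exp().item()

val_logits = torch.randn(100, 10) * 3            # toy, overconfident logits
val_targets = torch.randint(0, 10, (100,))
T = fit_temperature(val_logits, val_targets)
calibrated_probs = (val_logits / T).softmax(dim=1)
```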
We propose a natively multiclass calibration method applicable to classifiers from any model class, derived from Dirichlet distributions and generalising the beta calibration method from binary classification.
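A hedged sketch of the core calibration map in this family: a linear transformation of the log class probabilities followed by a softmax, applied on top of any trained classifier. The training loop and the regularization used in the paper are omitted, and the class count and toy inputs below are assumptions.

```python
import torch
import torch.nn as nn

class DirichletCalibrator(nn.Module):
    """Calibration map of the form softmax(W * log(p) + b), applied to the
    class-probability outputs of an already-trained classifier."""
    def __init__(self, n_classes):
        super().__init__()
        self.linear = nn.Linear(n_classes, n_classes)

    def forward(self, probs, eps=1e-12):
        return torch.softmax(self.linear(torch.log(probs + eps)), dim=1)

# Usage: fit the linear layer on held-out probabilities with cross-entropy,
# as one would train a small logistic-regression model.
calibrator = DirichletCalibrator(n_classes=10)
probs = torch.softmax(torch.randn(4, 10), dim=1)
print(calibrator(probs).sum(dim=1))   # calibrated rows still sum to 1
```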
We propose two new measures for calibration, the Static Calibration Error (SCE) and Adaptive Calibration Error (ACE).
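As a rough illustration, the Static Calibration Error extends ECE by binning the predicted probabilities of every class rather than only the top prediction; the sketch below follows that idea, though the exact binning details in the paper may differ, and ACE (which instead uses adaptive, equally populated bins) is not shown.

```python
import numpy as np

def static_calibration_error(probs, labels, n_bins=15):
    """SCE sketch: for every class, bin that class's predicted probabilities and
    compare them with the observed frequency of the class in each bin; average
    the weighted gaps over all classes."""
    n, k = probs.shape
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    sce = 0.0
    for c in range(k):
        for lo, hi in zip(bins[:-1], bins[1:]):
            mask = (probs[:, c] > lo) & (probs[:, c] <= hi)
            if mask.any():
                conf = probs[mask, c].mean()
                freq = (labels[mask] == c).mean()
                sce += (mask.sum() / n) * abs(freq - conf)
    return sce / k

probs = np.random.dirichlet(np.ones(5), size=100)   # toy probability vectors
labels = np.random.randint(0, 5, size=100)
print(static_calibration_error(probs, labels))
```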
In this work, we present a scalable marginal-likelihood estimation method to select both hyperparameters and network architectures, based on the training data alone.