(2/) Joint work w/ Weitang Liu, Xiaoyun Wang, and John Owens. We show that energy is desirable for OOD detection since it is provably aligned with the probability density of the input: samples with higher energies correspond to data with a lower likelihood of occurrence.
(3/) In contrast, we show mathematically that the softmax confidence score is a biased scoring function: it is not aligned with the density of the inputs and hence is not suitable for OOD detection.
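For readers who want the math behind the two claims above, here is a sketch in standard energy-based notation (T is a temperature, f_i(x) the logits of a K-way classifier; taking T = 1 for the softmax identity):

```latex
% Energy score defined from the logits of a discriminative classifier:
E(x; f) = -T \log \sum_{i=1}^{K} e^{f_i(x)/T}

% It aligns with the data density up to a normalizing constant Z:
p(x) = \frac{e^{-E(x;f)/T}}{Z}
\quad\Rightarrow\quad
\log p(x) = -\frac{E(x;f)}{T} - \log Z

% The max-softmax confidence, by contrast, is the energy shifted by the
% input-dependent max logit f^{\max}(x), which is the source of the bias:
\log \max_y p(y \mid x)
  = f^{\max}(x) - \log \sum_{i=1}^{K} e^{f_i(x)}
  = E(x; f) + f^{\max}(x)
```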
(4/) Importantly, energy score can be derived from a purely discriminative classification model without relying on a density estimator explicitly, and therefore circumvents the difficult optimization process in training generative-based models such as JEM.
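As a concrete illustration of the point above, here is a minimal NumPy sketch of computing the energy score from a classifier's logits; the logits, threshold, and class count below are hypothetical, and a real pipeline would take logits from a pre-trained network:

```python
import numpy as np

def energy_score(logits, T=1.0):
    """Energy score E(x; f) = -T * logsumexp(f(x) / T), computed in a
    numerically stable way. Lower energy => more in-distribution."""
    z = np.asarray(logits, dtype=float) / T
    m = z.max()
    return -T * (m + np.log(np.exp(z - m).sum()))

# Hypothetical logits from a pre-trained 3-way classifier:
id_logits = [8.2, 0.3, -1.1]   # peaked logits, typical of in-distribution input
ood_logits = [0.4, 0.2, 0.1]   # flat logits, typical of OOD input

tau = -5.0  # hypothetical threshold, chosen on held-out in-distribution data
print(energy_score(id_logits))   # ~ -8.2, below tau -> in-distribution
print(energy_score(ood_logits))  # ~ -1.3, above tau -> flagged OOD
```

Note that this needs nothing beyond the logits the classifier already produces, which is why no explicit density estimator is required.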
(5/) Within our framework, we demonstrate that energy can be flexibly used as a scoring function for any pre-trained neural classifier as well as a trainable cost function to shape the energy surface explicitly for OOD detection.
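The "trainable cost function" idea can be sketched as a squared-hinge regularizer on energies, added to the usual cross-entropy loss during fine-tuning; the margin values below are hypothetical placeholders that would be tuned in practice:

```python
import numpy as np

def energy(logits, T=1.0):
    # Row-wise energy score E(x; f) = -T * logsumexp(f(x) / T).
    z = np.asarray(logits, dtype=float) / T
    m = z.max(axis=-1, keepdims=True)
    return (-T * (m + np.log(np.exp(z - m).sum(axis=-1, keepdims=True)))).squeeze(-1)

def energy_hinge_loss(logits_in, logits_out, m_in=-23.0, m_out=-5.0):
    """Squared-hinge regularizer on energies: pushes in-distribution
    energies below margin m_in and auxiliary-outlier energies above
    m_out. Both margins here are hypothetical, not the paper's values."""
    e_in, e_out = energy(logits_in), energy(logits_out)
    loss_in = (np.maximum(0.0, e_in - m_in) ** 2).mean()
    loss_out = (np.maximum(0.0, m_out - e_out) ** 2).mean()
    return loss_in + loss_out
```

When both margins are satisfied the regularizer vanishes, so it only shapes the energy surface where in-distribution and outlier energies overlap.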
(6/) Key results: on CIFAR-10 with a WideResNet, the energy score reduces the average FPR95 by 18.03% compared to the softmax confidence score. With energy-based training, our method outperforms the existing SoTA.
(7/) Previous approaches such as ODIN and Mahalanobis require hyperparameter tuning. In contrast, the energy score is a parameter-free measure that is easy to use and implement, and in many cases achieves comparable or even better performance.
(8/) More broadly, our work builds on the insights and principles from energy-based models by @ylecun et al. We were also inspired by the early work on energy-based GAN by Jake Zhao, and JEM by Grathwohl et al.
(9/) Happy to get feedback if you have more detailed comments or want to engage with us!