Abstract
In dermatology, deep learning can be applied to skin lesion classification. However, for a given input image, a neural network outputs only a label, derived from the class probabilities, which do not model uncertainty. Our group developed a novel method to quantify uncertainty in stochastic neural networks. In this study, we aimed to train such a network for skin lesion classification, evaluate its diagnostic performance and uncertainty, and compare the results to the assessments of a group of dermatologists. By passing multiple copies of an image through a stochastic neural network, we obtained a distribution per class rather than a single probability value. We interpreted the overlap between these distributions as the output uncertainty: a high overlap indicates high uncertainty, and vice versa. We had 29 dermatologists diagnose a series of skin lesions and rate their confidence, and compared their results to those of the network. The network achieved a sensitivity of 50% and a specificity of 88%, comparable to the average dermatologist (68% and 73%, respectively). Higher confidence (lower uncertainty) was associated with better diagnostic performance in both the neural network and the dermatologists. We found no correlation between the uncertainty of the neural network and the confidence of the dermatologists (R = −0.06, p = 0.77). Dermatologists should not blindly trust the output of a neural network, especially when its uncertainty is high. The addition of an uncertainty score may stimulate human-computer interaction.
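The idea of repeated stochastic forward passes and a distribution-overlap uncertainty score can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stochastic network is simulated with noisy logits, and the overlap metric (histogram intersection of the top two class distributions) is an assumed choice, since the abstract does not specify the exact measure used.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_forward(x, n_passes=100):
    """Hypothetical stand-in for a stochastic network (e.g. one using
    dropout at inference time): each pass on the same input yields a
    slightly different softmax output. Simulated here with noisy logits."""
    logits = np.array([2.0, 1.6, 0.2]) + rng.normal(0.0, 0.5, size=(n_passes, 3))
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)  # shape: (n_passes, n_classes)

def histogram_overlap(a, b, bins=20):
    """Overlap of two empirical distributions on [0, 1]: the sum over bins
    of the minimum of the two normalized histograms (1 = identical)."""
    ha, edges = np.histogram(a, bins=bins, range=(0.0, 1.0))
    hb, _ = np.histogram(b, bins=edges)
    return np.minimum(ha / ha.sum(), hb / hb.sum()).sum()

# One probability value per class per pass, instead of a single vector.
probs = stochastic_forward(x=None)
mean_probs = probs.mean(axis=0)
top, runner_up = np.argsort(mean_probs)[::-1][:2]

# High overlap between the top two class distributions -> high uncertainty.
uncertainty = histogram_overlap(probs[:, top], probs[:, runner_up])
print(f"predicted class: {top}, overlap (uncertainty): {uncertainty:.2f}")
```

A well-separated case (distinct distributions per class) yields an overlap near 0, while an ambiguous lesion, whose top two class distributions interleave across passes, yields an overlap near 1.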