UNCERTAINTY CALIBRATION IN COMPUTER VISION: COMPARISON OF BNN, DEEP ENSEMBLES, MC DROPOUT AND EVIDENTIAL DL

Authors

DOI:

https://doi.org/10.35546/kntu2078-4481.2025.3.2.11

Keywords:

calibration, uncertainty, Bayesian neural networks, MC dropout, computer vision, ECE, NLL

Abstract

The article presents a comparative analysis of modern uncertainty calibration methods for computer vision tasks. Miscalibration of deep neural networks becomes critical in systems with a high cost of error: medical diagnostics, autonomous driving, and industrial control. Modern architectures often exhibit overconfidence in incorrect predictions, which undermines reliable decision-making. Four approaches are investigated: Bayesian Neural Networks (BNN) with variational inference; Deep Ensembles, which aggregate the predictions of independently trained models; Monte Carlo Dropout, which approximates Bayesian inference; and Evidential Deep Learning, which models Dirichlet distributions over class probabilities. Experiments were conducted on CIFAR-10/100 for classification, CIFAR-10-C for robustness assessment, COCO val2017 for object detection, and Cityscapes val for segmentation. All methods were tested on identical architectures: ResNet-50, WideResNet-28-10, Faster R-CNN, and DeepLabV3+. The results show that Deep Ensembles provide the best balance: an accuracy improvement of 0.8–1.8 % at ECE < 2 %. Temperature scaling reduces Expected Calibration Error by 60–75 % at no additional inference cost, making it mandatory for production systems. BNNs demonstrate the best OOD detection (AUROC 0.813) but trail in accuracy by 0.5–1.2 %. MC Dropout incurs a 20-fold increase in inference time for only a moderate improvement in calibration. Evidential DL is unstable on OOD data. Practical recommendations are formulated: for critical applications, an ensemble of 5 models with temperature scaling; for real-time systems, a single calibrated model; for edge devices, knowledge distillation from the ensemble. The optimal configuration of 3–5 models with temperature scaling delivers a 90 % improvement in calibration at a 3–5x increase in inference time.
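The two post-hoc tools central to the abstract, Expected Calibration Error (ECE) and temperature scaling (Guo et al., 2017), can be summarised in a short sketch. The code below is illustrative rather than the authors' implementation; the 15-bin setting, the function names, and the use of PyTorch are assumptions.

import torch
import torch.nn.functional as F

def expected_calibration_error(logits, labels, n_bins=15):
    # Bin predictions by confidence and average the |accuracy - confidence|
    # gap, weighted by the fraction of samples falling in each bin.
    probs = F.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)
    correct = pred.eq(labels).float()
    edges = torch.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = (correct[in_bin].mean() - conf[in_bin].mean()).abs().item()
            ece += in_bin.float().mean().item() * gap
    return ece

def fit_temperature(val_logits, val_labels):
    # Learn a single scalar T > 0 on held-out logits by minimising NLL;
    # dividing logits by T leaves accuracy unchanged and only rescales
    # confidence, which is why it adds no inference cost.
    log_t = torch.zeros(1, requires_grad=True)
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)
    def nll():
        opt.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss
    opt.step(nll)
    return log_t.exp().item()

MC Dropout, in the same hedged spirit, amounts to keeping dropout stochastic at test time and averaging several forward passes; n_samples = 20 below mirrors the 20-fold inference-time overhead reported in the abstract.

def mc_dropout_predict(model, x, n_samples=20):
    # Enable only the Dropout modules (not BatchNorm) at test time and
    # average the softmax outputs of n_samples stochastic forward passes.
    model.eval()
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=1) for _ in range(n_samples)])
    return probs.mean(dim=0)  # predictive mean; probs.var(dim=0) gauges uncertainty

A Deep Ensemble replaces the dropout loop with 3–5 independently trained models averaged the same way, which is the source of the 3–5x inference cost cited in the abstract.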

References

Guo C., Pleiss G., Sun Y., Weinberger K. Q. On calibration of modern neural networks. 34th International Conference on Machine Learning. 2017. P. 1321–1330.

Blundell C., Cornebise J., Kavukcuoglu K., Wierstra D. Weight uncertainty in neural networks. 32nd International Conference on Machine Learning. 2015. P. 1613–1622.

Louizos C., Welling M. Multiplicative normalizing flows for variational Bayesian neural networks. 34th International Conference on Machine Learning. 2017. P. 2218–2227.

Wenzel F., Roth K., Veeling B. S., Świątkowski J., Tran L., Mandt S. et al. How good is the Bayes posterior in deep neural networks really? 37th International Conference on Machine Learning. 2020.

Lakshminarayanan B., Pritzel A., Blundell C. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems. 2017. Vol. 30. P. 6402–6413.

Fort S., Hu H., Lakshminarayanan B. Deep ensembles: A loss landscape perspective. arXiv preprint arXiv:1912.02757. 2019.

Ovadia Y., Fertig E., Ren J., Nado Z., Sculley D., Nowozin S. et al. Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. Advances in Neural Information Processing Systems. 2019. Vol. 32.

Gal Y., Ghahramani Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. 33rd International Conference on Machine Learning. 2016. P. 1050–1059.

Osband I. Risk versus uncertainty in deep learning: Bayes, bootstrap and the dangers of dropout. NIPS Workshop on Bayesian Deep Learning. 2016.

Mukhoti J., Gal Y. Evaluating uncertainty for object detection using Monte Carlo dropout. International Conference on Computer Vision: Workshop on Uncertainty and Robustness. 2018.

Sensoy M., Kaplan L., Kandemir M. Evidential deep learning to quantify classification uncertainty. Advances in Neural Information Processing Systems. 2018. Vol. 31.

Amini A., Schwarting W., Soleimany A., Rus D. Deep evidential regression. Advances in Neural Information Processing Systems. 2020. Vol. 33. P. 14927–14937.

Kopetzki A., Chen Y., Slesarev V. Rethinking evidential deep learning. arXiv preprint arXiv:2110.03687. 2021.

Kumar A., Liang P. S., Ma T. Verified uncertainty calibration. Advances in Neural Information Processing Systems. 2019. Vol. 32. P. 3787–3798.

Nixon J., Dusenberry M., Zhang L., Jerfel G., Tran D. Measuring calibration in deep learning. CVPR Workshops. 2019. P. 38–41.

Minderer M., Djolonga J., Romijnders R., Hubis F., Zhai X., Houlsby N. et al. Revisiting the calibration of modern neural networks. arXiv preprint arXiv:2106.07998. 2021.

Published

2025-11-28