MATHEMATICAL MODELS FOR DETECTING OUTLIERS IN TWO-DIMENSIONAL CBO AND RFC METRICS’ DATA OF OPEN-SOURCE KOTLIN-BASED APPLICATIONS
DOI:
https://doi.org/10.32782/mathematical-modelling/2025-8-1-8
Keywords:
software metrics, CBO (Coupling Between Objects), RFC (Response for Class), outliers, Kotlin, Box-Cox transformation, Mahalanobis distance, prediction ellipse, object-oriented design
Abstract
Modern software development requires robust methods for assessing the quality and complexity of object-oriented design. Such evaluation is typically performed using software metrics, and it usually requires preprocessing of the metric data, including outlier detection. In real-world data, metric distributions deviate from normality in the vast majority of cases, which calls for specialized mathematical models that account for this non-Gaussian behavior.
The study proposes mathematical models for more reliable detection of outliers in two-dimensional CBO and RFC metric data specifically for Kotlin projects. Analysis of 102 open-source projects revealed substantial differences compared to Java: the average CBO value for Kotlin was only 2.20 (versus 11.88 for Java), and the average RFC value was 7.46 (versus 18.36). Mardia's test confirmed that the distribution of these metrics for Kotlin deviates significantly from normal (β₁ = 70.867, β₂ = 14.066, against critical values of 9.3 and 9.48).
To solve this problem, we developed a model that includes: application of a Box-Cox normalizing transformation with parameters, and implementation of two complementary approaches to outlier detection, one based on a quantile of the χ² distribution and one based on a quantile of Fisher's F distribution.
The results demonstrate that our models overcome the key limitation of Java-oriented approaches, under which up to 25% of Kotlin projects were misclassified as outliers. The refined models show greater reliability in analyzing Kotlin CBO and RFC metrics while preserving all advantages of the prediction ellipsoid approach. Notably, the χ²-criterion model proved more stringent, enabling more accurate outlier detection. This research provides practical value for Kotlin developers by offering specialized tools for CBO and RFC metric analysis.
Our findings confirm the necessity of developing language-specific analytical models that account for unique metric distribution characteristics.
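The pipeline described in the abstract (normalize, compute squared Mahalanobis distances, compare against a quantile-based bound) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the Box-Cox transformation is applied to each metric separately, the parameters `lam_cbo` and `lam_rfc` are hypothetical placeholders (the abstract does not give their values), the significance level `alpha = 0.005` is an arbitrary default, and the F-based bound uses the textbook prediction-region formula for p = 2 variables. All function names are illustrative. For two dimensions both quantiles have closed forms, so only the standard library is needed.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x; lam = 0 gives the natural log."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    return math.log(x) if abs(lam) < 1e-12 else (x ** lam - 1.0) / lam

def squared_mahalanobis(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Unbiased sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the explicit inverse of the 2x2 covariance matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

def chi2_threshold(alpha):
    """Upper-alpha quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)

def f_threshold(n, alpha):
    """Assumed prediction-ellipse bound for p = 2: 2(n^2-1)/(n(n-2)) * F(2, n-2; alpha)."""
    m = n - 2
    # Closed-form upper-alpha quantile of F with (2, m) degrees of freedom.
    f_quantile = (m / 2.0) * (alpha ** (-2.0 / m) - 1.0)
    return 2.0 * (n * n - 1.0) / (n * (n - 2.0)) * f_quantile

def detect_outliers(data, lam_cbo, lam_rfc, alpha=0.005):
    """Return indices flagged by the chi-square rule and by the F rule."""
    # lam_cbo and lam_rfc are hypothetical parameters, not values from the paper.
    z = [(box_cox(cbo, lam_cbo), box_cox(rfc, lam_rfc)) for cbo, rfc in data]
    d2 = squared_mahalanobis(z)
    t_chi2 = chi2_threshold(alpha)
    t_f = f_threshold(len(data), alpha)
    by_chi2 = [i for i, d in enumerate(d2) if d > t_chi2]
    by_f = [i for i, d in enumerate(d2) if d > t_f]
    return by_chi2, by_f
```

In this sketch the F-based bound is always larger than the χ² bound for a finite sample, so every point flagged by the F rule is also flagged by the χ² rule. That agrees with the abstract's remark that the χ² criterion is the more stringent of the two.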






