REGRESSION MODELS FOR EARLY ESTIMATING THE NUMBER OF LINES OF CODE OF VARIOUS RELEASES OF OPEN-SOURCE APPLICATIONS

Authors

DOI:

https://doi.org/10.32782/mathematical-modelling/2025-8-2-9

Keywords:

regression model, estimation, lines of code, open-source application, class, normalizing transformation, Mardia criterion, outlier, Mahalanobis distance, test statistic

Abstract

Early estimation of the number of lines of code in various releases of open-source applications is important because it directly affects the forecasting of the effort required to develop and subsequently modify them. The object of the study is the process of early estimation of the number of lines of code of various releases of open-source applications. The subject of the study is regression models for this estimation. The goal of the work is to build single-factor regression models for this estimation. We built two nonlinear regression models that estimate the number of lines of code from the number of classes. The first model estimates the number of lines of code of the first release of an open-source application, and the second model estimates the number of lines of code of the last release. The models were built on two datasets of two metrics from 40 open-source applications: the number of lines of code and the number of classes for the first and last releases of the respective applications. According to the Mardia criterion, the distribution of these two-dimensional data deviated from the Gaussian. Therefore, outliers in these datasets were checked with a method based on the squared Mahalanobis distance for normalized data. The data were normalized with a transformation in the form of a decimal logarithm. According to the Mardia criterion, the distribution of the normalized data did not deviate from the Gaussian. The parameter estimates of the models were found by the method of least squares.
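The fitting procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that both metrics are transformed with log10, that a straight line is fitted to the transformed data by least squares, and that back-transforming gives a power-law model of the form LOC = 10^b0 * classes^b1. The outlier cutoff (a chi-square quantile with two degrees of freedom), the significance level, the function names, and the sample numbers are all hypothetical. The abstract does not state the exact test statistic or the data used.

```python
import numpy as np

def fit_loglog(classes, loc):
    """Least-squares fit of log10(loc) = b0 + b1 * log10(classes)."""
    x = np.log10(np.asarray(classes, dtype=float))
    y = np.log10(np.asarray(loc, dtype=float))
    b1, b0 = np.polyfit(x, y, 1)  # polyfit returns the slope first
    return b0, b1

def predict_loc(classes, b0, b1):
    """Back-transform to the original scale: loc = 10**b0 * classes**b1."""
    x = np.log10(np.asarray(classes, dtype=float))
    return 10.0 ** (b0 + b1 * x)

def squared_mahalanobis(classes, loc):
    """Squared Mahalanobis distance of each point in the log10-transformed plane."""
    z = np.column_stack([
        np.log10(np.asarray(classes, dtype=float)),
        np.log10(np.asarray(loc, dtype=float)),
    ])
    diff = z - z.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(z, rowvar=False))  # sample covariance
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

# Hypothetical sample data, NOT taken from the paper.
classes = [12, 35, 60, 150, 420, 900]
loc = [1500, 3900, 8200, 17000, 61000, 98000]

# Assumed cutoff: chi-square quantile with 2 degrees of freedom,
# which has the closed form -2 * ln(alpha).
alpha = 0.005
cutoff = -2.0 * np.log(alpha)

d2 = squared_mahalanobis(classes, loc)
outliers = np.flatnonzero(d2 > cutoff)

b0, b1 = fit_loglog(classes, loc)
estimate = predict_loc(250, b0, b1)
print(f"b0={b0:.3f}, b1={b1:.3f}, outliers={outliers.tolist()}, estimate={estimate:.0f}")
```

In a workflow of this kind, points flagged as outliers would be removed and the fit repeated on the remaining data. Whether the paper iterates in this way is not stated in the abstract.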
The quality of the constructed models was checked with three well-known indicators: the coefficient of determination R², the mean magnitude of relative error MMRE, and the percentage of predictions for which the magnitude of relative error is less than 0.25, PRED(0.25). The obtained values of these indicators indicate satisfactory quality of the two constructed nonlinear regression models.
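The three quality indicators can be computed as in the sketch below, which is illustrative only. It assumes the usual definitions: R² as one minus the ratio of the residual to the total sum of squares, MMRE as the mean of |actual - predicted| / actual, and PRED(0.25) as the share of predictions whose magnitude of relative error does not exceed 0.25. The inclusive threshold is the common convention and is an assumption here. The function names and sample values are hypothetical and do not come from the paper.

```python
import numpy as np

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def magnitude_of_relative_error(actual, predicted):
    """MRE of each prediction: |actual - predicted| / actual."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted) / actual

def mmre(actual, predicted):
    """Mean magnitude of relative error."""
    return float(np.mean(magnitude_of_relative_error(actual, predicted)))

def pred(actual, predicted, level=0.25):
    """Share of predictions with MRE <= level (inclusive bound is assumed)."""
    return float(np.mean(magnitude_of_relative_error(actual, predicted) <= level))

# Hypothetical actual and estimated lines of code, NOT taken from the paper.
actual_loc = [1500, 3900, 8200, 17000, 61000]
estimated_loc = [1700, 3500, 9000, 15000, 80000]

print("R^2       :", r_squared(actual_loc, estimated_loc))
print("MMRE      :", mmre(actual_loc, estimated_loc))
print("PRED(0.25):", pred(actual_loc, estimated_loc))
```

Lower MMRE and higher R² and PRED(0.25) indicate better estimates. The scale on which the paper evaluates R² (original or log-transformed) is not stated in the abstract, so the original scale is used here as an assumption.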

Published

2025-12-30