MATHEMATICAL MODELS FOR THE SIZE ESTIMATING OF JAVA APPLICATIONS
DOI: https://doi.org/10.35546/kntu2078-4481.2024.2.28
Keywords: software size, lines of code, Java-application, nonlinear regression model, normalizing transformation, non-Gaussian data.
Abstract
This paper applies mathematical models to the size estimation of Java applications. Java is one of the most widely used programming languages in the world and is used in a wide range of software projects. Estimating the size of a Java application is one of the key tasks at the early stages of software project planning. The aim of the study is to increase the accuracy of estimating the number of lines of code of Java applications at the early stages of software project development from class diagram metrics by building nonlinear regression models. The object of the study is the process of Java application size estimation. The subject of the study is mathematical models for Java application size estimation. To achieve this aim, we collected two samples of code metrics from open-source Java applications: a training sample of 286 data points and a test sample of 285 data points. Using the test sample, we analyzed and compared existing mathematical models and equations for Java application size estimation. We show that the existing regression equations and models either have an unsatisfactory level of accuracy for Java application size estimation or cannot be applied to the given data set because of the limitations of the regression models. Using the training sample, we built one-factor nonlinear regression models on the number of classes (CLASS) metric, based on the decimal logarithm, Box-Cox, and Johnson SB family normalizing transformations, and a two-factor nonlinear regression model on the number of classes (CLASS) and visible methods quantity (VMQ) metrics, based on the decimal logarithm normalizing transformation.
Compared with the existing models, the resulting two-factor nonlinear regression model based on the decimal logarithm normalizing transformation has a smaller mean magnitude of relative error, a higher percentage of predictions within the given relative error level, and a higher coefficient of determination. This makes it possible to increase the reliability and accuracy of estimating the source lines of code of Java applications.
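As a rough illustration of the kind of model the abstract describes, the following Python sketch fits a two-factor regression on decimal-logarithm-transformed data and computes the mean magnitude of relative error (MMRE) and the percentage of predictions within a relative error level (PRED). It is a minimal sketch under stated assumptions, not the authors' implementation: the model form log10(LOC) = b0 + b1·log10(CLASS) + b2·log10(VMQ), the ordinary least-squares fit, the 0.25 error level, all function names, and the noise-free synthetic data are assumptions introduced here. The paper's samples and fitted coefficients are not reproduced.

```python
import math


def solve_linear(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_log10_model(classes, vmq, loc):
    """Least-squares fit of log10(LOC) = b0 + b1*log10(CLASS) + b2*log10(VMQ)."""
    rows = [[1.0, math.log10(c), math.log10(v)] for c, v in zip(classes, vmq)]
    z = [math.log10(y) for y in loc]
    # Normal equations: (X^T X) b = X^T z
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * zi for r, zi in zip(rows, z)) for i in range(3)]
    return solve_linear(xtx, xtz)


def predict_loc(b, c, v):
    """Back-transform the linear prediction from log10 space to lines of code."""
    return 10 ** (b[0] + b[1] * math.log10(c) + b[2] * math.log10(v))


def mmre(actual, predicted):
    """Mean magnitude of relative error."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def pred(actual, predicted, level=0.25):
    """Share of predictions whose relative error does not exceed `level`."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level)
    return hits / len(actual)


# Hypothetical, noise-free synthetic data generated from known coefficients,
# used only to check that the fit recovers them. Not data from the paper.
TRUE_B = [1.2, 0.6, 0.5]
classes = [10, 25, 40, 80, 150, 300]
vmq = [60, 90, 400, 350, 1500, 1200]
loc = [predict_loc(TRUE_B, c, v) for c, v in zip(classes, vmq)]

coeffs = fit_log10_model(classes, vmq, loc)
estimates = [predict_loc(coeffs, c, v) for c, v in zip(classes, vmq)]
print("coefficients:", [round(b, 4) for b in coeffs])
print("MMRE:", mmre(loc, estimates))
print("PRED(0.25):", pred(loc, estimates))
```

On real, noisy data the metrics should be computed on a separate test sample, as the abstract describes, rather than on the data used for fitting. The Box-Cox and Johnson SB transformations mentioned in the abstract would replace `math.log10` and its inverse with the corresponding transformation pair, which this sketch does not attempt.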