MATHEMATICAL MODELING AND FEATURE IMPORTANCE ANALYSIS IN THE PROBLEM OF HOUSING PRICE FORECASTING IN THE SECONDARY MARKET USING MACHINE LEARNING
DOI:
https://doi.org/10.32782/mathematical-modelling/2026-9-1-15
Keywords:
regression modeling, ensemble algorithms, feature importance analysis, feature engineering, secondary housing market, price prediction
Abstract
The paper investigates price prediction in the secondary residential housing market of Kyiv by constructing and comparing alternative regression models. The study is motivated by the multifactor nature of housing price formation, high market volatility, and the need for more accurate analytical valuation tools. Classical parametric approaches do not always adequately capture nonlinear relationships and interactions between property characteristics, which justifies the use of modern algorithmic methods. A comprehensive data preparation procedure was implemented: data cleaning, imputation of missing values with the k-nearest neighbors algorithm, Winsorization of extreme observations, logarithmic transformation of the target variables, and normalization of numerical features. Particular attention was paid to feature engineering based on textual property descriptions. Recurring qualitative characteristics were identified and encoded as binary variables, which brought infrastructure- and comfort-related attributes into the analytical framework. In addition, the data were geospatially enriched by calculating each property's distance to the city center as a key location indicator. Multiple linear regression, ensemble algorithms (XGBoost, LightGBM, Random Forest), and an artificial neural network were developed and compared. Model performance was evaluated on a held-out test set (20% of observations) using the R², RMSE, MAE, and MAPE metrics. Boosting algorithms, particularly LightGBM, demonstrated the highest predictive accuracy. Feature importance analysis showed that total area is the dominant determinant of the total housing price, together with location-related characteristics and building age. In contrast, the price per square meter is driven more strongly by locational parameters and administrative affiliation.
The results confirm the effectiveness of ensemble algorithms combined with extended feature engineering for quantitative analysis of price formation mechanisms in the local residential real estate market.
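The preprocessing and evaluation steps summarized in the abstract can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation, and several details are assumptions not stated in the abstract: the city-center coordinates, the keyword stems used for the binary flags, haversine (great-circle) distance as the distance measure, percentile-based Winsorization limits, min-max scaling as the normalization method, and reporting the metrics on the original price scale after back-transforming the log target. k-nearest-neighbor imputation is omitted for brevity (scikit-learn's `KNNImputer` is one existing implementation), as are the models themselves.

```python
import math

# Approximate coordinates of central Kyiv (Maidan Nezalezhnosti). ASSUMPTION:
# the abstract does not say which point defines the "city center".
KYIV_CENTRE = (50.4501, 30.5234)

# HYPOTHETICAL keyword stems for binary amenity flags; the paper's actual
# list of recurring attributes is not given in the abstract.
KEYWORDS = {
    "has_parking": ("parking", "паркінг"),
    "has_renovation": ("renovation", "ремонт"),
}


def keyword_flags(description):
    """Encode a free-text listing description as 0/1 amenity indicators."""
    text = description.lower()
    return {name: int(any(stem in text for stem in stems))
            for name, stems in KEYWORDS.items()}


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def distance_to_centre_km(lat, lon):
    """Distance from a property to the assumed city-center point."""
    return haversine_km(lat, lon, *KYIV_CENTRE)


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 1]."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize(values, lower=0.01, upper=0.99):
    """Clip values to the [lower, upper] percentiles instead of dropping them."""
    s = sorted(values)
    lo, hi = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo), hi) for v in values]


def min_max(values):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def regression_metrics(y_true, y_pred):
    """R^2, RMSE, MAE and MAPE (in %) computed on the same scale as y_true."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


if __name__ == "__main__":
    # HYPOTHETICAL toy prices and coordinates; not data from the study.
    prices = [52_000, 61_000, 58_000, 1_500_000, 47_000]
    clipped = winsorize(prices, 0.0, 0.75)
    log_target = [math.log(p) for p in clipped]            # a model would be fit here
    predicted = [math.exp(z) * 1.02 for z in log_target]   # stand-in for model output
    print(regression_metrics(clipped, predicted))
    print(distance_to_centre_km(50.4016, 30.6224))
    print(keyword_flags("Квартира з ремонтом, є паркінг"))
```

Winsorization clips extreme prices to percentile bounds rather than deleting the listings, so the sample size is preserved; the log transform then compresses the remaining right skew before model fitting, and predictions are mapped back with `exp` before the error metrics are computed.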