ENSEMBLE LEARNING STRATEGIES FOR STUDENT PERFORMANCE PREDICTION: A COMPARATIVE AND INTERACTIVE FRAMEWORK

Authors

DOI:

https://doi.org/10.32782/mathematical-modelling/2026-9-1-30

Keywords:

Machine learning, Ensemble methods, Classification, Regression, Stacking, Voting, Bagging, Boosting, Random Forest, XGBoost, Data analysis

Abstract

Ensemble methods are among the most effective approaches for building intelligent data analysis systems because they improve the accuracy, robustness, and generalization ability of models in classification and regression tasks. The research is relevant because modern applied datasets are often characterized by high feature dimensionality, noise, missing values, outliers, and class imbalance, which complicate the use of individual machine learning algorithms and reduce the reliability of predictions. The article examines the theoretical and practical foundations of ensemble approaches, in particular bagging, boosting, voting, and stacking, with an emphasis on their role in improving modeling quality and reducing the risk of overfitting. Special attention is paid to stacking as a flexible strategy for integrating heterogeneous base models through a meta-model trained on their predictions. The practical part of the work describes an interactive web interface for studying ensemble methods, using student performance assessment as the example task. Two modeling strategies are implemented: direct multiclass classification on a discretized target variable, and regression followed by categorization of the predicted values. The models were built with Random Forest, XGBoost, and Ridge Regression, together with Voting and Stacking ensemble schemes. Classification models were evaluated using Accuracy, Precision, Recall, F1-score, and Balanced Accuracy, while regression models were assessed using MAE, MSE, RMSE, and R². The work also covers data preprocessing, cross-validation, and hyperparameter optimization, which improved the reproducibility and reliability of the results.
The developed web interface supports step-by-step data loading, configuration of preprocessing parameters, model training, metric analysis, and result visualization, which makes experiments more transparent and different ensemble strategies easier to compare. The proposed approach is suited to educational and research tasks and can serve as a decision-support tool in forecasting problems based on tabular data.
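
The second modeling strategy, regression followed by categorization, together with the listed metrics, can be illustrated with a minimal standard-library sketch. It is not the article's implementation: the grade bands (`BANDS`), the class labels, and the toy score values are assumptions made for illustration only, and the predicted scores stand in for the output of any already-trained regressor.

```python
from math import sqrt

# Hypothetical grade bands on a 0-100 scale (assumption: the article
# does not state the thresholds used to discretize the target).
BANDS = [(60.0, "low"), (80.0, "medium")]  # scores >= 80 map to "high"


def categorize(score):
    """Map a continuous score to a class label using the assumed bands."""
    for upper, label in BANDS:
        if score < upper:
            return label
    return "high"


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 for paired true/predicted scores."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse), "R2": 1.0 - ss_res / ss_tot}


def balanced_accuracy(labels_true, labels_pred):
    """Mean of per-class recall over the classes present in labels_true."""
    recalls = []
    for cls in sorted(set(labels_true)):
        idx = [i for i, t in enumerate(labels_true) if t == cls]
        hits = sum(1 for i in idx if labels_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Toy data (made up): true scores and the outputs of some trained regressor.
y_true = [45.0, 58.0, 62.0, 75.0, 81.0, 93.0]
y_pred = [50.0, 61.0, 60.0, 70.0, 85.0, 90.0]

# Step 1: judge the regressor on the continuous scale.
reg = regression_metrics(y_true, y_pred)

# Step 2: categorize both true and predicted scores, then judge as a classifier.
labels_true = [categorize(s) for s in y_true]
labels_pred = [categorize(s) for s in y_pred]
accuracy = sum(t == p for t, p in zip(labels_true, labels_pred)) / len(labels_true)
bal_acc = balanced_accuracy(labels_true, labels_pred)

print(reg, accuracy, bal_acc)
```

In this sketch a small regression error near a band boundary (a true score of 58 predicted as 61) changes the assigned class. This is one reason regression followed by categorization and direct classification on a discretized target can rank the same models differently.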

Published

2026-07-01