DESIGNING LOSS FUNCTIONS FOR TRANSFORMER-BASED TIME ESTIMATION IN AI PROJECTS

Authors

DOI:

https://doi.org/10.32782/mathematical-modelling/2026-9-1-13

Keywords:

AI project forecasting, project time estimation, loss function design, transformer model, machine learning, uncertainty modeling, domain-weighted loss

Abstract

Accurate time estimation is critical for planning and managing artificial intelligence (AI) projects. However, traditional approaches often fall short because the task is domain-specific and such projects are highly variable and complex. This study examines how transformer-based models for project duration prediction can be improved by designing a custom loss function. A dataset containing structured project metadata (such as domain, stage, and difficulty) was used to train a transformer model with several loss functions: the standard mean squared error (MSE), a domain-weighted loss with a custom metadata mapping, and an uncertainty-aware loss. The final loss function combines the domain-knowledge baseline with the structure of the uncertainty-based loss. Evaluation results show that this combined approach outperforms the standard loss functions on the mean absolute error (MAE), MSE, and root mean squared error (RMSE) metrics.

This study addresses the challenge of improving time estimation accuracy in AI projects by designing custom loss functions for transformer-based regression models. Traditional estimation approaches often fail to capture the complexity and variability inherent in AI development, where project duration is influenced by factors such as large differences between domains and subdomains, the targeted development stage, and project difficulty. To overcome these limitations, this research investigates how loss function design can improve model performance. A structured dataset of AI projects was used, with metadata-based features such as primary domain, additional domain, project stage, and difficulty score. Based on this dataset, a transformer-based model was trained and evaluated using multiple loss function strategies: the standard MSE, a domain-weighted MSE that relies on expert-driven metadata mappings, an uncertainty-aware loss function, and a final stabilised loss that combines domain weighting with probabilistic constraints. Experimental results demonstrate that the final custom loss function outperforms the traditional approaches across standard accuracy metrics, including MAE, MSE, and RMSE. The findings indicate that incorporating domain knowledge and uncertainty modeling directly into the loss function leads to more accurate, stable, and interpretable predictions. Overall, this study highlights the importance of loss function design in applied machine learning tasks and provides a framework for improving transformer-based time estimation models in AI project management.
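The abstract names the loss strategies but gives no formulas, so the following is only an illustrative sketch of how such losses might look, not the authors' implementation. It assumes a Gaussian negative log-likelihood as the uncertainty-aware term (in the spirit of Kendall and Gal, cited in the references), clipping of the predicted log-variance as the "stabilising" constraint, and a hypothetical `DOMAIN_WEIGHTS` table standing in for the expert-driven metadata mapping; the function names and all numeric values are assumptions.

```python
import numpy as np

# Hypothetical per-domain weights; the expert mapping used in the study is not given.
DOMAIN_WEIGHTS = {"nlp": 1.5, "computer_vision": 1.2, "tabular": 1.0}


def mse_loss(y_true, y_pred):
    """Standard mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def _weights_for(domains, weights):
    # Domains missing from the mapping fall back to a neutral weight of 1.0.
    return np.array([weights.get(d, 1.0) for d in domains], dtype=float)


def domain_weighted_mse(y_true, y_pred, domains, weights=DOMAIN_WEIGHTS):
    """Squared errors scaled by a per-sample domain weight, normalised by total weight."""
    w = _weights_for(domains, weights)
    return float(np.sum(w * (y_true - y_pred) ** 2) / np.sum(w))


def gaussian_nll(y_true, mean, log_var, clip=(-5.0, 5.0)):
    """Per-sample Gaussian negative log-likelihood (constant term dropped).

    The log-variance is clipped to keep exp(-log_var) bounded (assumed stabiliser).
    """
    log_var = np.clip(log_var, *clip)
    return 0.5 * (np.exp(-log_var) * (y_true - mean) ** 2 + log_var)


def combined_loss(y_true, mean, log_var, domains,
                  weights=DOMAIN_WEIGHTS, clip=(-5.0, 5.0)):
    """Domain-weighted average of the clipped Gaussian NLL."""
    w = _weights_for(domains, weights)
    nll = gaussian_nll(y_true, mean, log_var, clip)
    return float(np.sum(w * nll) / np.sum(w))


# Toy example: durations in arbitrary units, invented for illustration only.
y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 20.0, 27.0])
domains = ["nlp", "tabular", "tabular"]
log_var = np.zeros(3)  # unit predicted variance for every project

print(mse_loss(y_true, y_pred))
print(domain_weighted_mse(y_true, y_pred, domains))
print(combined_loss(y_true, y_pred, log_var, domains))
```

With unit variance and equal weights, the combined loss in this sketch reduces to half the MSE, which makes it easy to check that the weighting and the variance term are the only things changing the ranking of models.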

References

Bishop C. M. Pattern recognition and machine learning. Springer, 2006. URL: https://link.springer.com/book/10.1007/978-0-387-45528-0 (accessed: 08.03.2026).

Géron A. Hands-on machine learning with scikit-learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. 2nd ed. O’Reilly Media, 2019. URL: https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/ (accessed: 08.03.2026).

Gneiting T., Raftery A. E. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association. 2007. Vol. 102, № 477. P. 359–378. DOI: https://doi.org/10.1198/016214506000001437

Khoshgoftaar T. M., Seliya N. Comparative assessment of software quality classification techniques: An empirical case study. Empirical Software Engineering. 2004. Vol. 9, № 3. P. 229–257. DOI: https://doi.org/10.1023/B:EMSE.0000027781.18360.9B

Jørgensen M., Shepperd M. A systematic review of software development cost estimation studies. IEEE Transactions on Software Engineering. 2007. Vol. 33, № 1. P. 33–53. DOI: https://doi.org/10.1109/TSE.2007.3

Kitchenham B. A., Pfleeger S. L., Pickard L. M., Jones P. W., Hoaglin D. C., El Emam K., Rosenberg J. Preliminary guidelines for empirical research in software engineering. IEEE Transactions on Software Engineering. 2002. Vol. 28, № 8. P. 721–734. DOI: https://doi.org/10.1109/TSE.2002.1027796

Menzies T., Zimmermann T. Software analytics: So what? IEEE Software. 2013. Vol. 30, № 4. P. 31–37. DOI: https://doi.org/10.1109/MS.2013.58 (accessed: 08.03.2026).

Thai-Nghe N., Gantner Z., Schmidt-Thieme L. Cost-sensitive learning methods for imbalanced data. Proceedings of the International Joint Conference on Neural Networks. 2010. P. 1–8. DOI: https://doi.org/10.1109/IJCNN.2010.5596486

Elkan C. The foundations of cost-sensitive learning. Proceedings of the 17th International Joint Conference on Artificial Intelligence. 2001. P. 973–978. URL: https://cseweb.ucsd.edu/~elkan/rescale.pdf (accessed: 28.02.2026).

He H., Garcia E. A. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering. 2009. Vol. 21, № 9. P. 1263–1284. DOI: https://doi.org/10.1109/TKDE.2008.239

Kendall A., Gal Y. What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems. 2017. Vol. 30. P. 5574–5584. DOI: https://doi.org/10.48550/arXiv.1703.04977

Lakshminarayanan B., Pritzel A., Blundell C. Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in Neural Information Processing Systems. 2017. Vol. 30. P. 6402–6413. DOI: https://doi.org/10.48550/arXiv.1612.01474

Xia F., Liu Y., Wang Y., Liu Y. Multi-granularity uncertainty modeling for text classification. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. P. 1056–1065. DOI: https://doi.org/10.18653/v1/P19-1101

Kachmar P. Comparative analysis of model performance for time estimations in AI projects. SCIENTIA: Collection of Scientific Papers. 2025. P. 207–212. URL: https://previous.scientia.report/index.php/archive/article/view/2300 (accessed: 18.03.2026).

Kachmar P. Statistical analysis of time estimation patterns in AI project timelines. SCIENTIA: Collection of Scientific Papers. 2024. P. 147–150. URL: https://previous.scientia.report/index.php/archive/article/view/2230 (accessed: 18.03.2026).

Published

2026-07-01