SURVEY ON AI IN FUNCTIONAL SOFTWARE TESTING
DOI: https://doi.org/10.35546/kntu2078-4481.2025.2.2.37

Keywords: Survey, Artificial intelligence, Machine learning, Test Case Generation, Test Automation, Defect Prediction, Test Selection, Test Prioritization, Test Execution Optimization

Abstract
The survey conducts a systematic review of the application of artificial intelligence (AI) in software quality control and testing, covering both functional and non-functional quality aspects across all testing stages. The study analyzes various AI techniques, including machine learning, deep learning, evolutionary algorithms, and natural language processing, while also examining AI-based testing tools. It highlights advancements, challenges, and research gaps, drawing on scientific publications from the last decade, supplemented by key foundational works.

Functional software quality is defined by compliance with requirements and correct execution of functions, verified through unit, integration, system, and acceptance testing. AI facilitates the automation of test case generation, defect prediction, and oracle design. Non-functional quality encompasses attributes such as performance, reliability, security, and usability, which are more challenging to automate. The review emphasizes AI's role in automating testing, particularly regression testing, to ensure product stability after changes.

Key AI applications include test case generation and test automation using search-based algorithms (e.g., EvoSuite, Sapienz), deep learning, and large language models (LLMs) such as Codex. AI is also used for defect prediction, code quality assessment, test selection, prioritization, and optimization of test execution in continuous integration environments. Tools such as Diffblue Cover demonstrate industrial adoption of AI for generating compact test suites.

Major challenges include the oracle problem (determining correct test outcomes), data dependency, the need for model transparency, and limited applicability in safety-critical domains.
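To illustrate the search-based test generation idea summarized above (applied at scale by tools such as EvoSuite and Sapienz), the following minimal sketch uses hill climbing over a branch-distance fitness to find an input that reaches a hard-to-hit branch. The function under test, the target value, and the fitness are hypothetical toy examples for illustration, not taken from any surveyed tool.

```python
# Illustrative sketch of search-based test data generation via hill climbing:
# minimize a "branch distance" fitness until an input satisfies the target
# branch condition. All names and values here are hypothetical examples.

def under_test(x):
    if x == 4242:          # target branch: unlikely to be hit by random inputs
        return "bug"
    return "ok"

def branch_distance(x, target=4242):
    # Classic SBST-style fitness: 0 exactly when the branch condition holds,
    # and smoothly decreasing as the input approaches satisfying it.
    return abs(x - target)

def hill_climb(start=0, max_steps=100_000):
    current = start
    while max_steps > 0 and branch_distance(current) > 0:
        # Probe the two integer neighbours; move to the one with lower fitness.
        best = min([current - 1, current + 1], key=branch_distance)
        if branch_distance(best) >= branch_distance(current):
            break  # local optimum: no neighbour improves the fitness
        current = best
        max_steps -= 1
    return current

test_input = hill_climb()
assert under_test(test_input) == "bug"  # the generated input covers the branch
```

Real tools replace this toy neighbourhood search with genetic algorithms over whole test suites and instrument the program to compute branch distances automatically, but the fitness-guided search loop is the same core idea.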
Future research directions include automating oracles, integrating AI with DevOps, developing human-in-the-loop hybrid systems, and establishing standards for AI in testing. The review calls for unified datasets and benchmarks to evaluate AI testing techniques, particularly in safety-critical sectors such as healthcare and automotive.
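The test selection and prioritization work covered by the survey (e.g., Li, Harman, and Hierons on regression test case prioritization) frequently builds on greedy coverage heuristics. The sketch below shows the "additional greedy" strategy, which repeatedly picks the test adding the most not-yet-covered statements; the test names and coverage sets are made-up illustrative data, not drawn from any surveyed system.

```python
# Illustrative sketch of "additional greedy" regression test prioritization:
# order tests by how many previously uncovered statements each one adds.
# The coverage matrix below is hypothetical example data.

coverage = {
    "test_login":    {1, 2, 3},
    "test_checkout": {3, 4, 5, 6},
    "test_search":   {2, 3},
    "test_payment":  {5, 6, 7},
}

def additional_greedy(coverage):
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        # Pick the test whose coverage adds the most uncovered statements.
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        if not remaining[best] - covered:
            # No remaining test adds new coverage; append the rest in a
            # deterministic order.
            order.extend(sorted(remaining))
            break
        covered |= remaining.pop(best)
        order.append(best)
    return order

print(additional_greedy(coverage))
# → ['test_checkout', 'test_login', 'test_payment', 'test_search']
```

In continuous integration settings, the surveyed approaches replace static coverage with learned signals (historical failures, code churn, reinforcement-learning rewards), but the output is the same: a ranking that surfaces fault-revealing tests earlier.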
References
BrowserStack: Functional testing: Definition, types & examples (January 2025). URL: https://www.browserstack.com/guide/functional-testing
BrowserStack: What is non-functional testing? Definition, types, and tools (January 2025). URL: https://www.browserstack.com/guide/what-is-non-functional-testing
The Apache Software Foundation: Apache jmeter (2025). URL: https://jmeter.apache.org/
OWASP: Free for open source application security tools (2025). URL: https://owasp.org/www-community/Free_for_Open_Source_Application_Security_Tools
Jia, Y., Harman, M.: An analysis and survey of the development of mutation testing. IEEE Transactions on Software Engineering 37(5), 649–678 (2010). doi: 10.1109/TSE.2010.62
Alshayeb, M.: Empirical investigation of refactoring effect on software quality. International Journal of Electrical and Computer Engineering (IJECE) 13(2), 1823–1832 (2023). doi: 10.11591/ijece.v13i2.pp1823-1832
Harman, M., McMinn, P.: A theoretical and empirical study of search-based testing: Local, global, and hybrid search. IEEE Transactions on Software Engineering 36(2), 226–247 (2009). doi: 10.1109/TSE.2009.71
Ricca, F., Marchetto, A., Stocco, A.: AI-based test automation: A grey literature analysis. 2021 IEEE International Conference on Software Testing, Verification and Validation Workshops pp. 263–270 (2021). doi: 10.1109/ICSTW52544.2021.00051
Politowski, C., Guéhéneuc, Y., Petrillo, F.: Towards automated video game testing. Proceedings of the 1st International Workshop on Search-Based Software Engineering pp. 57–64 (2022). doi: 10.1145/3524494.3527627
Dobrovolskyi, H., Keberle, N.: Obtaining the minimal terminologically saturated document set with controlled snowball sampling. In: CEUR Workshop Proceedings. vol. 2740, pp. 87–101 (2020).
Ali, S., Briand, L., Hemmati, H., Panesar-Walawege, R. K.: A systematic review of the application and empirical investigation of search-based test case generation. IEEE Transactions on Software Engineering 36(6), 742–762 (2009). doi: 10.1109/TSE.2009.52
Su, T., Meng, G., Chen, Y., Wu, K., Yang, W., Yao, Y., Pu, G., Liu, Y., Su, Z.: Guided, stochastic model-based GUI testing of android apps. In: Proceedings of the 2017 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE). pp. 245–256 (2017). doi: 10.1145/3106237.3106298
Liu, Z., Chen, C., Wang, J., Chen, M., Wu, B., Che, X., Wang, D., Wang, Q.: Make LLM a testing expert: Bringing human-like interaction to mobile GUI testing via functionality-aware decisions. Proceedings of the 46th International Conference on Software Engineering pp. 1–13 (2024). doi: 10.1145/3597503.3639180
Fraser, G., Wyrich, M.: State of the art in search-based software testing: A report from the SBST'18 tool competition (2018).
Arcuschin, I. G., Rojas, J. M., Fraser, G., Campos, J.: EvoSuite at the SBST 2020 tool competition. In: 2020 IEEE/ACM 13th International Workshop on Search-Based Software Testing (SBST). pp. 33–36. IEEE (2020). doi: 10.1145/3387940.3392186
Fraser, G., Zeller, A.: Mutation-driven generation of unit tests and oracles. Proceedings of the 19th International Symposium on Software Testing and Analysis pp. 147–158 (2010). doi: 10.1145/1831708.1831728
Dong, Z., Böhme, M., Cojocaru, L., Roychoudhury, A.: Time-travel testing of android apps. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering pp. 687–698 (2020). doi: 10.1145/3377811.3380402
Alagarsamy, S., Tantithamthavorn, C., Takerngsaksiri, W., Arora, C., Aleti, A.: Enhancing large language models for text-to-testcase generation. arXiv preprint arXiv:2402.11910 (2024).
Li, Z., Harman, M., Hierons, R.M.: Search algorithms for regression test case prioritization. IEEE Transactions on Software Engineering 33(4), 225–237 (2007). doi: 10.1109/TSE.2007.38
Ellis, K., Wong, L., Nye, M., Sablé-Meyer, M., Morales, L., Hewitt, L., Solar-Lezama, A., Tenenbaum, J. B.: Dreamcoder: Growing generalizable, interpretable knowledge with wake-sleep bayesian program learning (2023). doi: 10.48550/arXiv.2307.00404
Spichkova, M., Muxiddinova, N.: Automated analysis of the scrum agile process using process mining. In: Proceedings of the 15th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE). pp. 266–274. SCITEPRESS – Science and Technology Publications (2020). doi: 10.5220/0009417802660274
Diffblue: What is Diffblue Cover? (March 2025). URL: https://www.diffblue.com/diffblue-cover/
Liu, P., Zhang, X., Pistoia, M., Zheng, Y., Marques, M., Zeng, L.: Automatic text input generation for mobile testing. 2017 IEEE/ACM 39th International Conference on Software Engineering pp. 643–653 (2017). doi: 10.1109/ICSE.2017.65
Gu, T., Cao, C., Liu, T., Sun, C.P., Deng, J., Ma, X., Lü, J.: Aimdroid: Activity-insulated multi-level automated testing for android applications. 2017 IEEE International Conference on Software Maintenance and Evolution pp. 103–114 (2017). doi: 10.1109/ICSME.2017.72
Liu, Z., Chen, C., Wang, J., Che, X., Huang, Y.K., Hu, J., Wang, Q.: Fill in the blank: Context-aware automated text input generation for mobile gui testing. 2023 IEEE/ACM 45th International Conference on Software Engineering pp. 1355–1367 (2023). doi: 10.1109/ICSE48619.2023.00119
Mao, K., Harman, M., Jia, Y.: Sapienz: Multi-objective automated testing for android applications. In: Proceedings of the 38th International Conference on Software Engineering (ICSE). pp. 352–362 (2016). doi: 10.1145/2931037.2931054
Barr, E.T., Harman, M., McMinn, P., Shahbaz, M., Yoo, S.: The oracle problem in software testing: A survey. IEEE Transactions on Software Engineering 41(5), 507–525 (2014). doi: 10.1109/TSE.2014.2372785
Fraser, G., Arcuri, A.: Evosuite. Proceedings of the 19th ACM SIG-SOFT Symposium and the 13th European Conference on Foundations of Software Engineering pp. 416–419 (2011). doi: 10.1145/2025113.2025179
Malhotra, R.: A systematic review of machine learning techniques for software fault prediction. Applied Soft Computing 27, 504–518 (2015). doi: 10.1016/j.asoc.2014.11.023
Alshammari, A., Morris, C., Hilton, M., Bell, J.: Flake- flagger: Predicting flakiness without rerunning tests. 2021 IEEE/ACM 43rd International Conference on Software Engineering pp. 1572–1584 (2021). doi: 10.1109/ICSE43902.2021.00140
Wang, Y., Jia, P., Liu, L., Huang, C., Liu, Z.: A systematic review of fuzzing based on machine learning techniques. PLOS ONE 15(8), e0237749 (2020). doi: 10.1371/journal.pone.0237749
Xie, Q., Memon, A.M.: Designing and comparing automated test oracles for GUI-based software applications. ACM Transactions on Software Engineering and Methodology 16(1), 4 (2007). doi: 10.1145/1189748.1189752
Basavegowda Ramu, V.: Performance testing using machine learning. SSRG International Journal of Computer Science and Engineering 10(6), 36–42 (2023). doi: 10.14445/23488387/IJCSE-V10I6P105
Su, Q., Li, X., Ren, Y., Qiu, R., Hu, C., Yin, Y.: Attention transfer reinforcement learning for test case prioritization in continuous integration. Applied Sciences 15(4), 2243 (2025). doi: 10.3390/app15042243
Pinto, G., Miranda, B., Dissanayake, S., d’Amorim, M., Treude, C., Bertolino, A.: What is the vocabulary of flaky tests? Proceedings of the 17th International Conference on Mining Software Repositories pp. 492–502 (2020). doi: 10.1145/3379597.3387482
Chekam, T. T., Papadakis, M., Traon, Y. L., Harman, M.: An empirical study on mutation, statement and branch coverage fault revelation that avoids the unreliable clean program assumption. 2017 IEEE/ACM 39th International Conference on Software Engineering pp. 597–608 (2017). doi: 10.1109/ICSE.2017.61
Abdessalem, R. B., Nejati, S., Briand, L., Stifter, T.: Testing vision-based control systems using learnable evolutionary algorithms. 2018 IEEE/ACM 40th International Conference on Software Engineering pp. 1016–1026 (2018). doi: 10.1145/3180155.3180160
Zhang, X., Li, Y., Wang, Z.: Rtl regression test selection using machine learning. In: Proceedings of the 27th Asia and South Pacific Design Automation Conference (ASP-DAC). pp. 1–6 (2022). doi: 10.1109/ASP-DAC52403.2022.9712550
Mohammed, A. S., Boddapati, N., Mallikarjunaradhya, V., Jiwani, N., Sreeramulu, M. D., Natarajan, Y.: Optimizing real-time task scheduling in cloud-based ai systems using genetic algorithms. In: Proceedings of the 7th International Conference on Contemporary Computing and Informatics (IC3I). pp. 1649–1654 (2025). doi: 10.1109/IC3I61595.2024.10829055
A multi-objective optimization design to generate surrogate machine learning models for predictive maintenance. Computers in Industry (2023). URL: https://www.sciencedirect.com/science/article/pii/S2193943823000134
Leotta, M., Ricca, F., Marchetto, A., Olianas, D.: An empirical study to compare three web test automation approaches: NLP and capture-replay. Journal of Software: Evolution and Process 35(9), e2606 (2023). doi: 10.1002/smr.2606
Harman, M., Hassoun, Y., Lakhotia, K., McMinn, P., Wegener, J.: The impact of input domain reduction on search-based test data generation. Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering pp. 155–164 (2007). doi: 10.1145/1287624.1287647
Su, T., Wang, J., Su, Z.: Benchmarking automated gui testing for android against real-world bugs. Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering pp. 119–130 (2021). doi: 10.1145/3468264.3468620
Chen, X., Hu, X., Yuan, H., Jiang, H., Ji, W., Jiang, Y., Jiang, Y., Liu, B., Liu, H., Li, X., Lian, X., Meng, G., Xin, P., Sun, H., Shi, L., Wang, B., Wang, C., Wang, J., Wang, T., Xuan, J., Xia, X., Yang, Y., Yang, Y., Li, Z., Zhou, Y., Zhang, L.: Deep learning-based software engineering: Progress, challenges, and opportunities. Science China Information Sciences 67(5), 150101 (2024). doi: 10.1007/s11432-023-4127-5
Choudhary, S.R., Gorla, A., Orso, A.: Automated test input generation for android: Are we there yet? arXiv preprint (2015). doi: 10.48550/arXiv.1503.07217
Adamsen, C.Q., Mezzetti, G., Møller, A.: Systematic execution of android test suites in adverse conditions. Proceedings of the 2015 International Symposium on Software Testing and Analysis pp. 83–93 (2015). doi: 10.1145/2771783.2771786
Harman, M., McMinn, P.: A theoretical and empirical analysis of evolutionary testing and hill climbing for structural test data generation. Proceedings of the 2007 International Symposium on Software Testing and Analysis pp. 73–83 (2007). doi: 10.1145/1273463.1273475
Ji, T., Hou, Y., Zhang, D.: A comprehensive survey on Kolmogorov-Arnold Networks (KAN). arXiv preprint arXiv:2407.11075 (2024). doi: 10.48550/arXiv.2407.11075