INTEGRATING NLP LIBRARIES AND PRE-TRAINED MODELS FOR AUTOMATED TEXT COMPLEXITY ASSESSMENT

DOI:

https://doi.org/10.35546/kntu2078-4481.2025.2.2.39

Keywords:

modular approach, natural language processing, text analysis, text complexity, text classification, pretrained model

Abstract

The rapid development of intelligent multi-agent systems has significantly influenced the field of Natural Language Processing (NLP), enabling more efficient and scalable text analysis methods. This study explores a modular approach to NLP, emphasising the integration of specialised libraries and models to enhance text processing capabilities. Modern NLP solutions streamline tasks such as text complexity assessment by leveraging machine learning (ML) and deep learning techniques, particularly pre-trained models and structured linguistic pipelines. Such integration addresses not only the syntactic and lexical aspects of a text but also its deeper semantic relationships.

The modular system proposed in this research allows various NLP components to be combined and adapted to different analytical tasks. The study focuses on widely used NLP tools, including spaCy for linguistic analysis and BERT (Bidirectional Encoder Representations from Transformers) for deep contextual understanding. The approach integrates traditional linguistic analysis with neural network-based models to assess text difficulty, and it presents a systematic way of categorising texts of varying complexity, ranging from simple children’s stories to advanced legal documents.

The research underscores the effectiveness of modular NLP systems in meeting the growing demand for automated text analysis. Combining structured linguistic features with deep-learning-based contextual embeddings enables accurate classification of text complexity, which supports language learning and computational text processing applications. A key advantage of the modular approach is its flexibility and scalability: researchers and developers can integrate customised NLP solutions for diverse applications. As modular approaches are adopted, NLP continues to evolve, providing scalable and adaptable solutions to the growing challenges of text analysis.
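
The abstract does not give implementation details, so the following is only a minimal sketch of what a modular complexity pipeline could look like. It assumes each module is a plain function that maps text to named numeric features. The two extractors here use simple standard-library heuristics as stand-ins for the spaCy and BERT components named above. All names (`ComplexityPipeline`, `sentence_features`, `lexical_features`) and all thresholds are hypothetical and are not taken from the study.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A module is any callable that maps raw text to named numeric features.
FeatureExtractor = Callable[[str], Dict[str, float]]

WORD_RE = re.compile(r"[A-Za-z']+")


def sentence_features(text: str) -> Dict[str, float]:
    """Stand-in for a linguistic module (assumption: spaCy would go here)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    if not sentences or not words:
        return {"avg_sentence_len": 0.0}
    return {"avg_sentence_len": len(words) / len(sentences)}


def lexical_features(text: str) -> Dict[str, float]:
    """Stand-in for a second module: share of words with 7 or more letters."""
    words = WORD_RE.findall(text)
    if not words:
        return {"long_word_ratio": 0.0}
    long_words = sum(1 for w in words if len(w) >= 7)
    return {"long_word_ratio": long_words / len(words)}


@dataclass
class ComplexityPipeline:
    """Combines independent feature modules and applies a toy classifier."""

    extractors: List[FeatureExtractor] = field(default_factory=list)

    def add(self, extractor: FeatureExtractor) -> "ComplexityPipeline":
        # Modules can be added or swapped without touching the others.
        self.extractors.append(extractor)
        return self

    def features(self, text: str) -> Dict[str, float]:
        merged: Dict[str, float] = {}
        for extractor in self.extractors:
            merged.update(extractor(text))
        return merged

    def classify(self, text: str) -> str:
        f = self.features(text)
        # Illustrative weights and thresholds only, not values from the study.
        score = f.get("avg_sentence_len", 0.0) / 10 + f.get("long_word_ratio", 0.0) * 5
        if score < 1.5:
            return "simple"
        if score < 3.0:
            return "intermediate"
        return "advanced"


pipeline = ComplexityPipeline().add(sentence_features).add(lexical_features)
print(pipeline.classify("The cat sat on the mat. The dog ran to the cat."))
```

In a fuller version, one extractor could wrap a spaCy pipeline and another could return BERT embeddings, with a trained classifier replacing the hand-set thresholds. The point of the sketch is only that each module can be replaced independently.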

Published

2025-06-05