COMPARATIVE ANALYSIS OF SEMANTIC REPRESENTATION METHODS FOR NATURAL LANGUAGE TEXTS IN THE TASK OF PARAPHRASE IDENTIFICATION
DOI: https://doi.org/10.32782/mathematical-modelling/2025-8-2-22

Keywords: cognitive modeling, paraphrases, comprehension, natural language, AMR, semantic graphs, transformers, artificial intelligence

Abstract
The article addresses the problem of identifying semantic similarity of natural language texts, which is crucial for modeling the cognitive process of comprehension and for developing intelligent language processing systems. Text comprehension involves not only lexical and syntactic analysis but also deep cognitive interpretation of semantic relations and contextual features. A particular challenge is the variability of linguistic constructions, where the same information is conveyed in different ways while the meaning is preserved. In this context, the task of paraphrase identification, i.e. recognizing expressions that convey similar meaning in different forms, is important for cognitive models. The human brain detects semantic similarity with ease; building computational models with comparable capabilities, however, is difficult because of the ambiguity, context dependence, and multilingualism of natural language. The aim of the article is to substantiate a method for detecting semantic similarity of natural language texts through an analysis of modern approaches to building semantic representations in the context of the paraphrase identification task. Paraphrase identification is chosen because it is representative of the study of semantic similarity and of the cognitive mechanisms of meaning recognition. The article analyzes three main approaches to the task: statistical methods based on corpus analysis, knowledge-based methods using ontologies and semantic networks, and deep learning methods based on distributional semantics and transformer architectures. Hybrid approaches are considered separately; these combine graph-based semantic representations (AMR, Abstract Meaning Representation) with neural network models, which allows structuring text content at the conceptual level and modeling the logical-semantic relations between its components.
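The limitation of purely lexical similarity measures that motivates the deeper approaches discussed above can be illustrated with a minimal sketch (illustrative code, not from the article): a bag-of-words cosine similarity treats word-order variants as identical, while true paraphrases that use different vocabulary receive a low score driven only by shared function words.

```python
import math
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

s1 = "the film was excellent"
s2 = "the movie was superb"    # a paraphrase built from different words
s3 = "was the film excellent"  # the same words, reordered

print(cosine(bow(s1), bow(s3)))  # 1.0 -- word order is ignored entirely
print(cosine(bow(s1), bow(s2)))  # 0.5 -- overlap comes only from "the" and "was"
```

Statistical methods such as LSA, and later distributional embeddings, were developed precisely to recover the semantic closeness of pairs like s1 and s2 that surface-level overlap misses.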
The results of a comparative analysis of recent studies confirm that AMR-based approaches achieve high accuracy and generalize semantic information better than the other methods considered. This indicates that AMR is an appropriate foundation for building intelligent systems capable of modeling the cognitive processes of human understanding of natural language text. The practical value of the research lies in providing a basis for the further development of cognitive technologies in machine translation, automatic summarization, information retrieval, and dialogue systems, and thus for significantly improving the quality of natural-language human-computer interaction.
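As an illustration of how AMR abstracts away from surface form, an example frequently used in the AMR literature maps the paraphrases "The girl adjusted the machine." and "The girl made adjustments to the machine." to the same graph, written here in PENMAN notation (the notation used by the AMR sembank):

```
(a / adjust-01
   :ARG0 (g / girl)
   :ARG1 (m / machine))
```

Because both sentences share this single conceptual graph, a graph-matching or graph-encoding model can recognize them as paraphrases despite their different surface realizations, which is exactly the property exploited by the hybrid AMR-plus-neural approaches discussed above.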