ADEQUACY OF THE PROCESS OF BUILDING THE SEMANTIC MODEL OF THE DOCUMENT BASED ON AN UNSTRUCTURED KNOWLEDGE BASE

Authors

  • Y.R. KOVULIN

DOI:

https://doi.org/10.32782/mathematical-modelling/2022-5-1-4

Keywords:

semantic network, automatic text processing, request-response system, text generation

Abstract

Developing applied software systems for automatic text processing requires choosing a mechanism for describing and implementing a natural language model that a computer can process. Because language is a weakly formalized system whose rules are unstable and heterogeneous, the main problem in implementing such models is the difficulty of describing the semantic characteristics of a text at the level of algorithmic representation. An approach to building a programmatic semantic model of a document, based on the structure of a hybrid semantic network, was implemented in the dissertation [1]. The need for this model follows primarily from an analysis of existing analogues and algorithmic approaches to constructing the semantic network of a document: all of them are either dictionary-based or were not developed for language groups with rich inflectional morphology. The developed approach is based on the algorithm of latent semantic analysis, which finds semantic correspondences from the weight characteristics of the text and works with coordinate projections of the basic text units onto a two-dimensional plane. This use of latent semantic analysis is innovative for two reasons. First, the algorithm combines many additional stages that are atypical for approaches to building a semantic model of a document: clustering algorithms with appropriate methods for determining their parameters, and algorithms for syntactic, morphological, and spatial data analysis [1]. Second, latent semantic analysis is applied primarily to document classification, whereas our model changes its use: instead of relating documents to terms, it relates the sentences of the documents to the terms of those documents.
The resulting approach makes it possible to build semantic models of scientific texts without any prior semantic markup or compilation of semantic dictionaries containing quantitative indicators of the text's semantic characteristics. This greatly simplifies the construction of automatic text processing systems. This article verifies that the model fulfills its functional purpose: adequacy criteria are constructed for the model, and their satisfaction is checked through corresponding experiments.
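
To make the sentence-to-term use of latent semantic analysis described above more concrete, the following is a minimal sketch and not the implementation from [1]. It assumes raw term counts as the weight characteristics, whitespace tokenization, and cosine similarity on the two-dimensional projection; the function names `build_term_sentence_matrix`, `lsa_2d`, and `nearest_sentence` are hypothetical, and the clustering, syntactic, and morphological stages are omitted.

```python
import numpy as np


def build_term_sentence_matrix(sentences):
    """Return (terms, matrix) where matrix[i, j] counts term i in sentence j.

    Assumption: lowercase whitespace tokenization and raw counts as weights.
    """
    tokenized = [s.lower().split() for s in sentences]
    terms = sorted({t for tokens in tokenized for t in tokens})
    index = {t: i for i, t in enumerate(terms)}
    matrix = np.zeros((len(terms), len(sentences)))
    for j, tokens in enumerate(tokenized):
        for t in tokens:
            matrix[index[t], j] += 1.0
    return terms, matrix


def lsa_2d(matrix):
    """Project terms and sentences onto the two leading latent dimensions."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = 2
    term_coords = u[:, :k] * s[:k]          # one 2D point per term
    sentence_coords = vt[:k, :].T * s[:k]   # one 2D point per sentence
    return term_coords, sentence_coords


def nearest_sentence(term, terms, term_coords, sentence_coords):
    """Index of the sentence whose 2D point has the highest cosine similarity to the term."""
    v = term_coords[terms.index(term)]
    norms = np.linalg.norm(sentence_coords, axis=1) * np.linalg.norm(v)
    sims = sentence_coords @ v / np.where(norms == 0, 1.0, norms)
    return int(np.argmax(sims))


if __name__ == "__main__":
    # Illustrative sentences only; they are not taken from the article's experiments.
    sentences = [
        "semantic network of the document",
        "semantic network of terms",
        "clustering algorithm parameters",
        "clustering algorithm for spatial data",
    ]
    terms, matrix = build_term_sentence_matrix(sentences)
    term_coords, sentence_coords = lsa_2d(matrix)
    print(nearest_sentence("semantic", terms, term_coords, sentence_coords))
```

In this sketch the columns of the matrix are sentences rather than whole documents, which is the change of use the abstract refers to; a real system would replace the raw counts with a weighting scheme and add the further analysis stages.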

References

1. Kovylin Y.R. Model of answer generation in search systems based on an unstructured knowledge base: Cand. Sci. (Eng.) dissertation: 01.05.02. National Metallurgical Academy of Ukraine. Dnipro, 2020. 233 p.

2. Ustalov D.A. Models, methods and algorithms for constructing a semantic network of words for natural language processing tasks: Cand. Sci. (Phys.-Math.) dissertation: 05.13.17. ФГБУМ. Chelyabinsk, 2017. 129 p.

3. Boldas M.V., Sokolova Ye.G. Generation of natural language texts: theories, methods, technologies. NTI. Ser. 2. Information Processes and Systems, 2006. P. 1-15.

4. Text generator. URL: https://online-generators.ru/text

5. Volkovsky O.S., Kovylin Y.R. Computer system of intellectual semantic search with the text generation using. Bulletin of the Kherson National University. 2018. No. 3 (66). P. 238-245.

Published

2023-05-30