ALGORITHM FOR CLASSIFYING SOCIAL NETWORK TEXT CONTENT TO DETERMINE EMOTIONAL TONE

N. I. BOYKO; V. YU. MYKHAILYSHYN

doi:10.35546/kntu2078-4481.2023.2.18

Authors

N. I. BOYKO https://orcid.org/0000-0002-6962-9363
V. YU. MYKHAILYSHYN https://orcid.org/0000-0003-1889-9053

DOI:

https://doi.org/10.35546/kntu2078-4481.2023.2.18

Keywords:

algorithm, emotional tone, content, classification, social network

Abstract

The article presents research results comparing a naive Bayes classifier applied to simple word features with one applied to vector word models. The research methods and environment were analyzed, and an input dataset was selected. A classifier was trained on this dataset, and its accuracy was evaluated with the classify.accuracy function of the nltk library. The classifier was also tested on custom input text to check whether it classified that text correctly. A histogram was constructed to show the number of correctly classified positive and negative examples, and a confusion matrix was derived to evaluate classification accuracy for each class. In the experimental part, the Word2Vec vector word model from the gensim library was used; the classifier was trained and its accuracy was evaluated, showing a significant increase in accuracy over the simplified approach. The paper discusses how vector word models improve text classification: they better capture the semantics and context of the text, which leads to more accurate results. The analysis shows that classification accuracy depends on the dataset, the features of the texts, and the data processing methods used. The research therefore supports an informed choice of classification methods and approach, one that takes into account the specific task and application context. The work also considers vector word models and more complex classification models, and lists factors that affect the emotional tone of a text. The model parameters were optimized to achieve better results. The experiments confirmed the effectiveness of the naive Bayes classifier and vector word models for classifying the emotional tone of text.
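To make the simple word-feature baseline described above concrete, here is a minimal, dependency-free sketch of a naive Bayes sentiment classifier over word-presence features, together with accuracy and a per-class confusion matrix. It is not the authors' code: the paper works with nltk (it names `classify.accuracy`), whereas the helper names `tokenize`, `NaiveBayes`, `accuracy` and `confusion_matrix`, the add-one smoothing, and the toy sentences below are all hypothetical assumptions made for illustration.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"


def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation."""
    words = (w.strip(PUNCT) for w in text.lower().split())
    return [w for w in words if w]


class NaiveBayes:
    """Hypothetical naive Bayes over word-presence features (each word is
    counted at most once per document), with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.labels}
        self.vocab = set()
        for text, label in zip(docs, labels):
            words = set(tokenize(text))  # presence, not frequency
            self.word_counts[label].update(words)
            self.vocab |= words
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.labels}
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence and are skipped.
        words = [w for w in set(tokenize(text)) if w in self.vocab]
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.labels:
            # Log prior plus the smoothed log likelihood of each present word.
            score = math.log(self.doc_counts[c] / n_docs)
            denom = self.totals[c] + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[c][w] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(model, docs, labels):
    """Share of documents whose predicted label matches the gold label."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


def confusion_matrix(model, docs, labels):
    """Counts keyed by (gold label, predicted label)."""
    return Counter((y, model.predict(d)) for d, y in zip(docs, labels))


# Hypothetical toy data, not the dataset used in the paper.
train_docs = [
    "I love this phone, it is great",
    "What a wonderful happy day",
    "This is terrible and I hate it",
    "Awful service, very sad and angry",
]
train_labels = ["pos", "pos", "neg", "neg"]
test_docs = ["great happy phone", "hate this awful day"]
test_labels = ["pos", "neg"]

model = NaiveBayes().fit(train_docs, train_labels)
print(accuracy(model, test_docs, test_labels))  # → 1.0
print(confusion_matrix(model, test_docs, test_labels))
```

In nltk, the broadly corresponding steps are `nltk.NaiveBayesClassifier.train`, `nltk.classify.accuracy` and `nltk.metrics.ConfusionMatrix`, although nltk's estimator differs in detail from the add-one smoothing used here. The Word2Vec variant mentioned in the abstract would replace the word-presence features with vectors learned by gensim and is not shown in this sketch.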

