COMPARATIVE REVIEW OF LANGUAGE MODELS OF ARTIFICIAL INTELLIGENCE

Authors

DOI:

https://doi.org/10.35546/kntu2078-4481.2025.2.2.1

Keywords:

artificial intelligence language models, language model, artificial intelligence, analysis

Abstract

Artificial intelligence is spreading across many areas of human activity, and a variety of AI language models have become accessible to a wide range of users. This makes it relevant to determine which AI language model is convenient to use and allows users to solve their tasks at the proper level. This work considers AI language models that are freely available and that the user can interact with through a chatbot: ChatGPT-4 Turbo (OpenAI), Copilot on GPT-4 (OpenAI), Gemini (Google), Claude (Anthropic), Jurassic-2 (AI21 Labs), and DeepSeek (Hangzhou DeepSeek Artificial Intelligence Co., Ltd.). Among the studied models, Claude, Jurassic-2, and DeepSeek do not have access to Internet search, so they cannot retrieve current information. Among the models that do have access to Internet search (ChatGPT-4 Turbo, Copilot on GPT-4, Gemini), only ChatGPT-4 Turbo gave no completely incorrect answers. However, like the other studied models, it was unable to give a completely correct answer to the prompt “Write a complete list of networks included in the international standard IEC 61158”. It correctly indicated only 10 of the 18 groups of networks in the standard and incorrectly indicated 3 networks that are not in the standard (on this task, it ranked fourth out of six in correctness of the answer). For prompts that did not require Internet access, the most relevant answers were obtained from Claude (Anthropic).
Claude (Anthropic), like the other studied AI language models, was also unable to give a completely correct answer to the prompt “Write a complete list of networks included in the international standard IEC 61158”. However, it correctly indicated 15 of the 18 groups of networks in the standard and incorrectly indicated only 1 network that is not in the standard (on this task, it ranked first out of six in correctness of the answer). The comparative analysis showed that no model is ideal in the sense of providing the best or always correct answers across different tasks. Still, the model that provided the most relevant answers overall is ChatGPT-4 Turbo (OpenAI). The results give reason to argue that any output obtained from existing AI language models requires additional verification.
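The counts reported for the IEC 61158 prompt can be turned into comparable scores. The sketch below is only an illustration: the abstract does not say how correctness was ranked, so the recall and precision measures, the `ListAnswer` class, and its field names are assumptions. The precision figure also treats the "groups" counted as correct and the "networks" counted as incorrect as the same unit, which the abstract does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListAnswer:
    """Counts reported in the abstract for one model's answer (hypothetical container)."""
    model: str
    correct: int        # groups of networks named that are in the standard
    incorrect: int      # networks named that are not in the standard
    expected: int = 18  # groups of networks in IEC 61158, per the abstract

    @property
    def recall(self) -> float:
        # Share of the standard's groups that the model listed.
        return self.correct / self.expected

    @property
    def precision(self) -> float:
        # Share of the listed items that are really in the standard.
        return self.correct / (self.correct + self.incorrect)


# Only the two models whose counts appear in the abstract.
answers = [
    ListAnswer("ChatGPT-4 Turbo", correct=10, incorrect=3),
    ListAnswer("Claude", correct=15, incorrect=1),
]

for a in answers:
    print(f"{a.model}: recall={a.recall:.2f}, precision={a.precision:.2f}")
```

Under these assumptions Claude scores higher on both measures, which is consistent with it ranking first on this task and ChatGPT-4 Turbo ranking fourth.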

Published

2025-06-05