MULTI-MODEL APPROACH TO SYNTHETIC DATA GENERATION FOR SEMANTIC SIMILARITY ASSESSMENT OF WEB INTERFACE TEXT LABELS

B. O. KUZIKOV; O. V. VLASENKO; O. V. SHUTYLIEVA; O. A. SHOVKOPLYAS; S. R. SHOVKOPLIAS; P. O. TYTOV

doi:10.35546/kntu2078-4481.2025.4.2.12

Authors

B. O. KUZIKOV Sumy State University https://orcid.org/0000-0002-9511-5665
O. V. VLASENKO Sumy State University https://orcid.org/0000-0003-4315-5654
O. V. SHUTYLIEVA Sumy State University https://orcid.org/0000-0002-7236-8555
O. A. SHOVKOPLYAS Sumy State University https://orcid.org/0000-0002-4596-2524
S. R. SHOVKOPLIAS Sumy State University https://orcid.org/0000-0003-1837-0213
P. O. TYTOV Sumy State University https://orcid.org/0009-0003-6911-5463

Keywords:

large language models, web accessibility, multi-model approach, semantic similarity, statistical consensus, synthetic training data, dataset validation.

Abstract

Automating web accessibility verification against WCAG standards remains a significant challenge, particularly for criteria that require semantic understanding of content. WCAG Success Criterion 2.5.3 requires consistency between the visible text of interface elements and their accessible names; however, traditional string-matching methods fail to account for the semantic nuances of textual modifications.

Objective. To develop a methodology for creating and validating a robust dataset for assessing the semantic correspondence of text labels in the absence of an objective ground truth. This includes developing a taxonomy of semantic transformations and applying a multi-model approach to the generation and annotation of synthetic data in Ukrainian and English.

Methods. The study employs a multi-model approach in which four leading LLMs generate over 14 thousand unique text pairs and 17 models annotate the data. A taxonomy of semantic transformations was developed that classifies modification types, ranging from permissible contextual extensions to critical contradictions. For validation, a statistical framework for consensus formation was applied, based on the median scores of a reference core of models and using ICC2k, the coefficient of determination, and Spearman correlation.

Results. A validated dataset of synthetic data was created, annotated on a semantic similarity scale from -1 to 1. The multi-model approach ensured dataset diversity and minimized the biases of individual models.

Conclusions. The developed methodology effectively addresses the problem of creating training data without an objective ground truth. The resulting dataset enables objective comparison of commercial LLMs and cost-effective knowledge distillation into small models for practical use in automated accessibility testing tools.

