HYBRID MODEL FOR EVALUATING THE EFFECTIVENESS OF PROXIMITY METRICS FOR GENE EXPRESSION DATA
DOI:
https://doi.org/10.32782/mathematical-modelling/2025-8-1-28
Keywords:
gene expression data, proximity metrics, hybrid model, clustering, classification, personalized medicine
Abstract
This paper presents the development and application of a hybrid model for evaluating the effectiveness of proximity metrics in high-dimensional gene expression data, integrating data mining and machine learning methods within a unified framework. The study compares correlation distance, mutual information-based metrics, and the Wasserstein metric to assess their efficiency in clustering and classification tasks. The results show that correlation distance and the Wasserstein metric provide high accuracy and stability, making them suitable for integration into diagnostic systems. To improve classification reliability, a stacking model is implemented to compensate for potential clustering errors and to ensure consistent performance regardless of the applied metric and cluster structure. The proposed data processing pipeline enables automated, standardized, and scalable analysis of large-scale gene expression datasets, in line with the principles of personalized medicine. By facilitating early disease diagnosis and supporting the development of individualized treatment strategies, the findings contribute to improving modern diagnostic systems within the framework of precision medicine. Hybrid approaches make it possible to combine the advantages of different similarity evaluation and classification methods, which improves prediction accuracy on gene expression data. The proposed methodological approach can be used in bioinformatics for analyzing complex biological systems, optimizing genetic data processing workflows, and developing new algorithmic solutions in medical diagnostics. It also supports the adaptation of modern information technologies for identifying disease biomarkers and the integration of the results into clinical practice.
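To make the three proximity metrics named in the abstract concrete, the following is a minimal Python sketch that computes each of them for a pair of expression profiles. It is an illustration under stated assumptions, not the authors' implementation: the function names, the equal-width binning with `bins=3`, the choice of `1 - I(X;Y) / H(X,Y)` as the mutual information-based distance, and the toy profiles are all hypothetical. The Wasserstein function assumes two samples of equal size, and the correlation distance assumes neither profile is constant.

```python
import math
from collections import Counter


def correlation_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles, 2 for anti-correlated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)  # assumes non-constant profiles (sx, sy > 0)


def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size empirical samples:
    the mean absolute difference of their sorted values."""
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


def _discretize(x, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant profile
    return [min(int((v - lo) / width), bins - 1) for v in x]


def mi_distance(x, y, bins=3):
    """Assumed MI-based distance: 1 - I(X;Y) / H(X,Y), using plug-in estimates
    on equal-width bins. Returns 0 when one profile determines the other."""
    dx, dy = _discretize(x, bins), _discretize(y, bins)
    n = len(x)
    pxy, px, py = Counter(zip(dx, dy)), Counter(dx), Counter(dy)
    mi = sum((c / n) * math.log(c * n / (px[i] * py[j])) for (i, j), c in pxy.items())
    joint_entropy = -sum((c / n) * math.log(c / n) for c in pxy.values())
    return 0.0 if joint_entropy == 0 else 1.0 - mi / joint_entropy


# Toy expression profiles (hypothetical values): b is a scaled copy of a.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

print("correlation distance:", correlation_distance(a, b))
print("Wasserstein distance:", wasserstein_1d(a, b))
print("MI-based distance:   ", mi_distance(a, b))
```

On these toy profiles the correlation and mutual information-based distances treat `b` as equivalent to `a`, because both ignore linear rescaling, while the Wasserstein distance on raw values does not. This is one reason a comparison across metrics, as described in the abstract, can lead to different cluster structures.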