ANALYSIS OF THE CURRENT STATE OF METHODS FOR SUBSET FORMATION OF SIGNIFICANT AND MUTUALLY EXPRESSED GENE EXPRESSION DATA

Authors

DOI:

https://doi.org/10.32782/mathematical-modelling/2024-7-2-2

Keywords:

gene expression, diagnostic system, gene ontology, Shannon entropy, statistical criteria

Abstract

The article analyzes modern approaches to forming subsets of significant and mutually expressed (co-expressed) genes from gene expression data obtained with DNA microarray and RNA sequencing technologies. The topic is relevant because the high-dimensional gene expression matrices generated in such studies require effective preprocessing to identify the genes that are critical for understanding the state of a biological system. Modern clustering and biclustering methods play a vital role in reducing the number of genes passed to further analysis, which improves the accuracy of diagnostics and of biological process analysis. The use of Gene Ontology (GO) provides a structured description of the functional roles of genes in terms of biological processes, molecular functions, and cellular components. This improves data processing quality and lets researchers focus on key genes that play significant roles in pathological processes. The article also addresses the stages of data preprocessing, including the removal of non-expressed genes, the identification of differentially expressed genes with tools such as DESeq2 and edgeR, and the application of meta-analysis to integrate results from multiple studies. GO analysis allows researchers to identify enriched GO terms associated with functionally significant genes and to interpret the results through visualizations such as graphs and diagrams. However, a key challenge remains the standardization of results and their consistency across research groups, which is essential for integrating data into a unified diagnostic system. The paper highlights the importance of further improving approaches to gene expression analysis and data integration, which is expected to improve the efficiency of bioinformatics research in disease diagnostics and personalized medicine.
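Three of the steps named in the abstract can be illustrated with a minimal sketch: removing non-expressed genes, scoring a profile by Shannon entropy, and testing a GO term for enrichment. This is not the authors' implementation. The function names, the `min_count` threshold, and the choice of a one-sided hypergeometric test for enrichment are assumptions made for illustration, and the sketch does not reproduce what DESeq2 or edgeR compute.

```python
from math import comb, log2


def drop_non_expressed(matrix, min_count=10):
    """Keep genes whose total count across samples reaches min_count.

    `matrix` maps a gene name to its list of per-sample counts.
    The threshold value is an illustrative assumption.
    """
    return {gene: counts for gene, counts in matrix.items()
            if sum(counts) >= min_count}


def shannon_entropy(counts):
    """Shannon entropy (in bits) of one gene's profile across samples.

    Counts are normalised to a probability distribution first;
    zero entries contribute nothing to the sum.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * log2(p) for p in probs)


def enrichment_p_value(population, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: number of genes in the background set
    annotated:  background genes annotated with the GO term
    selected:   size of the gene subset under study
    overlap:    selected genes annotated with the GO term
    """
    upper = min(annotated, selected)
    hits = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return hits / comb(population, selected)


if __name__ == "__main__":
    # Toy data, invented for the example.
    toy = {
        "geneA": [0, 0, 1, 0],       # nearly silent, filtered out
        "geneB": [50, 50, 50, 50],   # flat profile, maximal entropy
        "geneC": [0, 0, 0, 200],     # expressed in one sample only
    }
    kept = drop_non_expressed(toy)
    for gene, counts in kept.items():
        print(gene, round(shannon_entropy(counts), 3))
    print(round(enrichment_p_value(20, 5, 4, 3), 4))
```

A flat profile gives the maximum entropy for the number of samples, while a profile concentrated in a single sample gives zero. Whether high or low entropy marks a gene as informative depends on the selection criterion used, which the abstract does not specify.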

References

Green H. RNA Processing. Cold Spring Harbor Perspectives in Biology. 2017. Vol. 9 (5). Art. no. a032425.

Wang Z., Gerstein M., Snyder M. RNA-Seq: A revolutionary tool for transcriptomics. Nature Reviews Genetics. 2009. Vol. 10 (1). P. 57–63.

Bolstad B.M., Irizarry R.A., Astrand M., Speed T.P. A Comparison of Normalization Methods for High-Density Oligonucleotide Array Data Based on Variance and Bias. Bioinformatics. 2003. Vol. 19 (2). P. 185–193.

Qin S., Tang X., Chen Y., et al. mRNA-based therapeutics: powerful and versatile tools to combat diseases. Signal Transduction and Targeted Therapy. 2022. Vol. 7 (1). Art. no. 166.

Ashburner M., Ball C.A., Blake J.A., et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics. 2000. Vol. 25 (1). P. 25–29.

Gruber T.R. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition. 1993. Vol. 5 (2). P. 199–220.

The Gene Ontology Consortium. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Research. 2021. Vol. 49 (D1). P. D325–D334.

Huntley R.P., Sawford T., et al. Understanding how and why the Gene Ontology and its annotations evolve: the GO within UniProt. GigaScience. 2014. Vol. 3 (1). Art. no. 4.

Love M.I., Huber W., Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014. Vol. 15. Art. no. 550.

Chen Y., Chen L., Lun A.T.L., et al. edgeR 4.0: powerful differential analysis of sequencing data with expanded functionality and improved support for small counts and larger datasets. bioRxiv. 2024.

Pandey D., Perumal P.O. Improved meta-analysis pipeline ameliorates distinctive gene regulators of diabetic vasculopathy in human endothelial cell (hECs) RNA-Seq data. PLoS ONE. 2023. Vol. 18 (11). Art. no. e0293939.

Aleksander S.A., Balhoff J., Carbon S., et al. The Gene Ontology knowledgebase in 2023. Genetics. 2023. Vol. 224 (1). Art. no. iyad031.

Shin M.G., Pico A.R. Using published pathway figures in enrichment analysis and machine learning. BMC Genomics. 2023. Vol. 24. Art. no. 713.

Babichev S., Korobchynskyi M., Rudenko M., Batenko H. Applying biclustering technique and gene ontology analysis for gene expression data processing. CEUR Workshop Proceedings. 2024. Vol. 3675. P. 14–28.

Ouma W.Z., Pogacar K., Grotewold E. Topological and statistical analyses of gene regulatory networks reveal unifying yet quantitatively different emergent properties. PLoS Computational Biology. 2018. Vol. 14 (4). Art. no. e1006098.

Bioconductor: Open source software for Bioinformatics. 2024. July, 29. URL: https://www.bioconductor.org.

The Cancer Genome Atlas Program (TCGA). National Cancer Institute. Center for Cancer Genomics. 2024. July, 27. URL: https://www.cancer.gov/ccg/research/genome-sequencing/tcga.

Babichev S., Škvor J. Technique of Gene Expression Profiles Extraction Based on the Complex Use of Clustering and Classification Methods. Diagnostics. 2020. Vol. 10 (8). Art. no. 584.

Liakh I., Babichev S., Durnyak B., Gado I. Formation of Subsets of Co-expressed Gene Expression Profiles Based on Joint Use of Fuzzy Inference System, Statistical Criteria and Shannon Entropy. Lecture Notes on Data Engineering and Communications Technologies. 2023. Vol. 149. P. 25–41.

Hou J., Aerts J., den Hamer B., et al. Gene expression-based classification of non-small cell lung carcinomas and survival prediction. PLoS ONE. 2010. Vol. 5. Art. no. e10312.

Yasinska-Damri L., Babichev S., Spivakovsky A., Lemeshchuk O. Formation and Analysis of Gene Expression Data Based on the Joint Use of Data Mining and Machine Learning Techniques. CEUR Workshop Proceedings. 2023. Vol. 3373. P. 87–98.

Published

2024-12-30