Using gene expression profiles of cancer patients with image-based deep learning approach to develop methods for classification and prediction of cancer while revealing critical genes



Thesis Type: Postgraduate

Institution Of The Thesis: Yildiz Technical University, Faculty of Chemical and Metallurgical Engineering, Department of Bioengineering, Turkey

Approval Date: 2021

Thesis Language: English

Student: BÜŞRA NUR DARENDELİ KİRAZ

Supervisor: Alper Yılmaz

Abstract:

Cancer is a malignant disease that affects people worldwide. Difficulties in diagnosis and treatment allow the disease to progress and cause the deaths of millions of people. The intra-tumor and inter-tumor heterogeneity of tumor cells makes cancer a disease with individual characteristics. Because each individual has a unique tumor and tumor microenvironment, general screening methods make early detection of the disease difficult. Here, we aimed to provide a new perspective on cancer diagnosis by applying a deep learning approach to gene expression data. A deep learning model was trained on gene expression data, which directly reflect the outcome of changes in the genome. In addition, we aimed to identify the critical genes that drive the model's highly accurate discrimination between tumor and normal tissues. The study used RNA-Seq data from The Cancer Genome Atlas (TCGA), covering patients with approximately 30 different cancer types, together with GTEx RNA-Seq data from normal tissues. The input data were transformed to RGB format and used to train a Convolutional Neural Network (CNN). The trained model predicts cancer from gene expression data with 97.7% accuracy. Moreover, we applied a one-pixel attack to the trained model to determine which genes are influential in predicting the disease. This method identified 13 critical genes that affect the model's prediction. The result is a deep learning model that can distinguish tumor from normal tissues based on gene expression data. By examining the prediction mechanism of this model, genes that are candidate biomarkers for cancer were determined. A literature search on the identified genes showed that they have reported associations with cancer.
The genes identified in this study could serve as cancer biomarkers once they are supported by experimental data. The results indicate that individual cancers can be examined at the level of genes, and that individualized diagnosis and treatment can also be applied.
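
The two steps named in the abstract, encoding expression values as an RGB image and probing a classifier one pixel at a time, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the thesis's actual pipeline. The function names `expression_to_rgb` and `one_pixel_scan`, the log2 scaling, and the 24-bit split across the R, G and B channels are all hypothetical. `toy_predict` is a stand-in for a trained CNN. The scan is an exhaustive single-pixel perturbation, which is a simplification of the one-pixel attack (that attack normally searches for the perturbation with differential evolution).

```python
import numpy as np


def expression_to_rgb(expr, side):
    """Encode a 1-D expression vector as a (side, side, 3) uint8 image.

    Hypothetical encoding: log2(x + 1) scaling, normalisation to [0, 1],
    then a 24-bit integer code split into R, G and B bytes. One gene = one pixel.
    """
    expr = np.asarray(expr, dtype=np.float64)
    if expr.size > side * side:
        raise ValueError("image is too small for the number of genes")
    scaled = np.log2(expr + 1.0)
    top = scaled.max()
    norm = scaled / top if top > 0 else scaled
    codes = np.round(norm * (2**24 - 1)).astype(np.uint32)
    # Pad unused pixels with zeros so the vector fills the square image.
    padded = np.zeros(side * side, dtype=np.uint32)
    padded[: codes.size] = codes
    r = (padded >> 16) & 0xFF
    g = (padded >> 8) & 0xFF
    b = padded & 0xFF
    return np.stack([r, g, b], axis=-1).astype(np.uint8).reshape(side, side, 3)


def one_pixel_scan(image, predict, n_genes, values=(0, 255)):
    """Largest change in the predicted score when a single gene pixel is overwritten.

    `predict` is any callable mapping an image to a scalar score (assumed here
    to be a tumor probability). Returns one impact value per gene.
    """
    side = image.shape[1]
    base = float(predict(image))
    impact = np.zeros(n_genes)
    for idx in range(n_genes):
        row, col = divmod(idx, side)
        for v in values:
            perturbed = image.copy()
            perturbed[row, col, :] = v  # overwrite all three channels of one pixel
            change = abs(float(predict(perturbed)) - base)
            impact[idx] = max(impact[idx], change)
    return impact


def toy_predict(img):
    """Stand-in classifier (NOT a CNN): the score depends only on gene 2's red byte."""
    return img[0, 2, 0] / 255.0


expr = [0, 3, 15, 1, 7]          # made-up expression values for five genes
img = expression_to_rgb(expr, side=3)
impact = one_pixel_scan(img, toy_predict, n_genes=len(expr))
ranked = np.argsort(impact)[::-1]  # gene indices, most influential first
```

With a real model, `predict` would wrap the network's forward pass, and the highest-ranked pixels would map back to gene identifiers through the same index order that was used to build the image.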