Frequency Difference based DNA Encoding Methods in Human Splice Site Recognition


Pashaei E., AYDIN N.

2017 International Conference on Computer Science and Engineering (UBMK), Antalya, Turkey, 5 - 08 October 2017, pp.586-591

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/ubmk.2017.8093471
  • City: Antalya
  • Country: Turkey
  • Page Numbers: pp.586-591
  • Keywords: gene detection, splice site prediction, DNA encoding methods, machine learning, support vector machines, prediction
  • Yıldız Technical University Affiliated: Yes

Abstract

Identifying the structure of genes in the human genome depends heavily on accurate recognition of the boundaries between exons and introns, i.e. splice sites. Developing new methods for effective splice site detection is therefore essential. DNA encoding approaches extract features from gene sequences, and machine learning methods then classify splice sites using those features. This paper presents a new DNA encoding method based on triplet nucleotide encoding with the frequency difference between true and false splice site sequences (TN-FDTF). Support Vector Machine (SVM), Artificial Neural Network (NN), Random Forest (RF) and AdaBoost classifiers are then used to predict splice sites. The performance of the proposed method was assessed on the Homo Sapiens Splice Site Dataset (HS3D) using 10-fold cross-validation. The results showed that AdaBoost outperformed the other classifiers considered. In addition, the proposed method achieved higher prediction accuracy than most existing state-of-the-art methods. The proposed method is expected to help achieve better results in human splice site recognition and eukaryotic gene detection.