2017 International Conference on Computer Science and Engineering (UBMK), Antalya, Türkiye, 5-8 October 2017, pp. 586-591
Identifying the structure of genes in the human genome depends heavily on accurate recognition of the boundaries between exons and introns, i.e. splice sites. Hence, the development of new methods for effective splice site detection is essential. DNA encoding approaches are used to extract features from gene sequences, and machine learning methods then classify splice sites using those features. This paper presents a new DNA encoding method based on triplet nucleotide encoding with the frequency difference between true and false splice site sequences (TN-FDTF). Support Vector Machine (SVM), Artificial Neural Network (NN), Random Forest (RF) and AdaBoost classifiers are then used to predict splice sites. The performance of the proposed method was assessed on the Homo Sapiens Splice Site Dataset (HS3D) using 10-fold cross-validation. The results showed that AdaBoost outperformed the other classifiers considered. In addition, the proposed method achieved higher prediction accuracy than most existing state-of-the-art methods. The proposed method may therefore help to achieve better results in human splice site recognition and eukaryotic gene detection.
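The abstract does not give the exact TN-FDTF formula, so the following is only a minimal sketch of one plausible reading of the idea: for each position in a fixed-length window, the overlapping nucleotide triplet is replaced by the difference between its relative frequency at that position in true splice site sequences and in false ones. The function names (`triplets`, `positional_frequencies`, `fit_fdtf`, `encode`), the use of overlapping triplets, the per-position frequency tables, and the toy sequences are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def triplets(seq):
    """Return the overlapping nucleotide triplets of a sequence, in order."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def positional_frequencies(seqs):
    """For each triplet position, map triplet -> relative frequency over seqs.

    Assumes all sequences have the same length (a fixed window around the site).
    """
    n_positions = len(seqs[0]) - 2
    counts = [Counter() for _ in range(n_positions)]
    for seq in seqs:
        for pos, tri in enumerate(triplets(seq)):
            counts[pos][tri] += 1
    total = len(seqs)
    return [{tri: c / total for tri, c in counter.items()} for counter in counts]


def fit_fdtf(true_seqs, false_seqs):
    """Build per-position tables of (frequency in true) - (frequency in false)."""
    freq_true = positional_frequencies(true_seqs)
    freq_false = positional_frequencies(false_seqs)
    table = []
    for ft, ff in zip(freq_true, freq_false):
        keys = set(ft) | set(ff)
        table.append({k: ft.get(k, 0.0) - ff.get(k, 0.0) for k in keys})
    return table


def encode(seq, table):
    """Encode a sequence as one frequency-difference value per triplet position.

    Triplets never seen during fitting are encoded as 0.0 (an assumption).
    """
    return [table[pos].get(tri, 0.0) for pos, tri in enumerate(triplets(seq))]


# Toy example with made-up 6-nucleotide windows (not HS3D data).
true_sites = ["AAGGTA", "CAGGTG"]
false_sites = ["AAGCTA", "TTGGTC"]
fdtf_table = fit_fdtf(true_sites, false_sites)
features = encode("AAGGTA", fdtf_table)
```

Under this reading, the resulting numeric vectors would serve as input features to classifiers such as SVM, NN, RF or AdaBoost. In a cross-validation setting, the frequency tables would presumably be fitted on the training folds only, so that test sequences do not influence their own encoding.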