Feature based quality assessment of DNA sequencing chromatograms

ÖZ, Ersoy; KURT, Serkan; Asyali, Musa; Kaya, Huseyin; Yucel, Yeliz

doi:10.1016/j.asoc.2016.01.025

ÖZ E., KURT S., Asyali M. H., Kaya H., Yucel Y.

APPLIED SOFT COMPUTING, vol. 41, pp. 420-427, 2016 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 41
Publication Date: 2016
DOI: 10.1016/j.asoc.2016.01.025
Journal Name: APPLIED SOFT COMPUTING
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp. 420-427
Affiliated with Yıldız Technical University: Yes

Abstract

Although next-generation sequencing applications are becoming dominant in molecular genetics, many institutions still want to use their legacy sequencers as much as possible. An important concern in sequencing services is the quality of the trace files delivered to customers. The quality of trace files should therefore be screened, and low-quality files should be handled differently before they reach customers. The quality scores already present in the trace files provide some useful information; however, by incorporating auxiliary information we can improve the reliability of these scores. To this end, we used a feature-based supervised classification strategy, which requires a set of training and testing trace files whose quality is determined manually. We tested several machine learning algorithms, namely k-nearest neighbors, Naive Bayes, Support Vector Machines, and Random Forest (RF), on a public DNA trace repository. Our results indicate that the RF method with only 4 simple features provides a classification accuracy of 94.68% with a high level of agreement (Kappa = 0.8679). © 2016 Elsevier B.V. All rights reserved.
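The abstract reports two evaluation figures, classification accuracy and Kappa. As a minimal sketch of how such figures can be computed from manually assigned and predicted quality labels, the Python snippet below assumes that "Kappa" refers to Cohen's kappa. The function name `accuracy_and_kappa` and the toy labels are hypothetical illustrations and are not taken from the paper or its data.

```python
import math
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement: fraction of trace files where both labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: computed from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Both sequences use a single identical label; kappa is undefined.
        return p_o, math.nan

    kappa = (p_o - p_e) / (1.0 - p_e)
    return p_o, kappa


# Toy labels (made up for illustration): manual annotation vs. classifier output.
manual = ["high"] * 6 + ["low"] * 4
predicted = ["high"] * 5 + ["low"] + ["low"] * 3 + ["high"]

acc, kappa = accuracy_and_kappa(manual, predicted)
print(f"accuracy = {acc:.4f}, kappa = {kappa:.4f}")
```

Kappa discounts the agreement expected by chance, so it gives a stricter picture than accuracy alone when one quality class is much more common than the other.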