Combining multiple views: Case studies on protein and arrhythmia features


Sakar C. O., Kursun O., Seker H., Gurgen F., Aydin N., Favorov O.

Engineering Applications of Artificial Intelligence, vol. 28, pp. 174-180, 2014 (SCI-Expanded)

Abstract

Computational annotation of protein functions and structures from sequence features, and prediction of certain diseases from gene expression levels, are among the important applications of computational biology. Developing methods capable of such predictions is important not only for their biological and medical uses but also because it is a very challenging pattern recognition task, owing to high input dimensionality and small sample size. Ensemble and multi-view learning have gained popularity with the rapid rise of such datasets with large numbers of variables (such as the protein and arrhythmia datasets used in this paper). However, the classical ensemble approach does not take into account conditional interdependences among the views. In this paper, we present a two-stage supervised multi-view learning technique called parallel interacting multi-view learning (PIML). In the first stage of PIML, as in the ensemble method, a predictor is trained on each view individually, and class posterior probability estimates are obtained. In the second stage, each view is trained using its own features along with the class posterior probability estimates of the other views, which serve as summary information about those views. This is a hybrid way of combining the views, in which the views influence each other during training through the predictions of the others, so that their interdependences are taken into account. PIML is demonstrated and compared with the classical ensemble approach on three real datasets. (C) 2013 Elsevier Ltd. All rights reserved.
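The two-stage procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-view predictor is a small hand-written Gaussian naive Bayes classifier (the abstract does not name the base learner), the final decision averages the stage-two posteriors across views (the abstract does not say how the views' outputs are combined), and the stage-one posteriors fed to the second stage are computed in-sample for brevity, whereas held-out (for example, cross-validated) estimates would likely be safer against overfitting. The names `TinyGaussianNB` and `PIML` and the toy data are hypothetical.

```python
import math


class TinyGaussianNB:
    """Minimal Gaussian naive Bayes; an assumed stand-in for the per-view predictor."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.stats = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            # Variance floor (1e-3) avoids division by zero on near-constant features.
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-3
                         for col, m in zip(cols, means)]
            self.stats[c] = (len(rows) / len(y), means, variances)
        return self

    def predict_proba(self, x):
        """Class posteriors P(c | x), ordered like self.classes."""
        logs = []
        for c in self.classes:
            prior, means, variances = self.stats[c]
            s = math.log(prior)
            for v, m, var in zip(x, means, variances):
                s -= 0.5 * math.log(2 * math.pi * var) + (v - m) ** 2 / (2 * var)
            logs.append(s)
        top = max(logs)  # subtract the max before exp() for numerical stability
        exps = [math.exp(s - top) for s in logs]
        total = sum(exps)
        return [e / total for e in exps]


class PIML:
    """Sketch of the two-stage scheme; views[v][i] is sample i's feature list in view v."""

    @staticmethod
    def _augment(own, posteriors, v):
        # Own features of view v plus the stage-1 posteriors of every *other* view.
        return list(own) + [p for u, pu in enumerate(posteriors) if u != v for p in pu]

    def fit(self, views, y):
        # Stage 1: one predictor per view, as in a classical ensemble.
        self.stage1 = [TinyGaussianNB().fit(Xv, y) for Xv in views]
        self.classes = self.stage1[0].classes
        # Stage-1 posteriors per sample (in-sample here; held-out estimates are safer).
        post = [[m.predict_proba(x) for m, x in zip(self.stage1, sample)]
                for sample in zip(*views)]
        # Stage 2: retrain each view on its features + the other views' posteriors.
        self.stage2 = []
        for v, Xv in enumerate(views):
            aug = [self._augment(x, post[i], v) for i, x in enumerate(Xv)]
            self.stage2.append(TinyGaussianNB().fit(aug, y))
        return self

    def predict_proba(self, sample_views):
        p1 = [m.predict_proba(x) for m, x in zip(self.stage1, sample_views)]
        p2 = [m.predict_proba(self._augment(x, p1, v))
              for v, (m, x) in enumerate(zip(self.stage2, sample_views))]
        # Assumption: combine views by averaging their stage-2 posteriors.
        return [sum(col) / len(p2) for col in zip(*p2)]

    def predict(self, sample_views):
        probs = self.predict_proba(sample_views)
        return self.classes[probs.index(max(probs))]


# Invented toy data: two views of six samples, two classes.
view_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [2.0, 2.1], [2.2, 1.9], [1.9, 2.2]]
view_b = [[5.0], [5.2], [4.9], [8.0], [8.1], [7.9]]
labels = [0, 0, 0, 1, 1, 1]

model = PIML().fit([view_a, view_b], labels)
print(model.predict([[0.1, 0.1], [5.1]]))  # → 0
print(model.predict([[2.0, 2.0], [8.0]]))  # → 1
```

In this toy run both views separate the classes cleanly, so the second stage mostly confirms the first; the interaction between views would matter more when one view is ambiguous for a sample and the other views' posteriors can shift its decision.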