Gene selection using hybrid binary black hole algorithm and modified binary particle swarm optimization

Pashaei E., Pashaei E., Aydın N.

GENOMICS, vol.111, pp.669-686, 2019 (Peer-Reviewed Journal)

  • Publication Type: Article
  • Volume: 111
  • Publication Date: 2019
  • Doi Number: 10.1016/j.ygeno.2018.04.004
  • Journal Name: GENOMICS
  • Journal Indexes: Science Citation Index Expanded, Scopus
  • Page Numbers: pp.669-686
  • Keywords: Gene selection, Binary black hole algorithm, Binary particle swarm optimization, Sparse partial least squares discriminant analysis, Gene expression, PROSTATE-CANCER, COPY NUMBER, EXPRESSION, CLASSIFICATION, RISK, METHYLATION, CARCINOMA, PSO, IDENTIFICATION, PREDICTION


In cancer classification, gene selection is an important data preprocessing step, but it is difficult because the search space is very large. The objective of this study is therefore to develop a hybrid meta-heuristic model for gene selection that combines the Binary Black Hole Algorithm (BBHA) with a modified Binary Particle Swarm Optimization, BPSO (4-2). In this model, the BBHA is embedded in the BPSO (4-2) algorithm to make it more effective and to facilitate its exploration and exploitation, further improving performance. The model is combined with a Random Forest Recursive Feature Elimination (RF-RFE) prefiltering step. The classifiers evaluated in the proposed framework are Sparse Partial Least Squares Discriminant Analysis (SPLSDA), k-nearest neighbor (kNN) and Naive Bayes. The performance of the proposed method was evaluated on two benchmark and three clinical microarray datasets. The experimental results and statistical analysis confirm that BPSO (4-2)-BBHA outperforms BBHA, BPSO (4-2) and several state-of-the-art methods in terms of avoiding local minima, convergence rate, accuracy and number of selected genes. The results also show that the BPSO (4-2)-BBHA model can successfully identify known biologically and statistically significant genes in the clinical datasets.
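The general idea of embedding a black-hole step in a binary PSO loop that scores gene subsets can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' BPSO (4-2)-BBHA. It uses the standard sigmoid-based binary PSO update instead of the (4-2) variant, whose details are not given here. The binary black-hole move (reading x + r(x_BH - x) as the probability that a bit is 1), the Hamming-distance event horizon with radius R = f_BH / sum(f_i) borrowed from the continuous black hole algorithm, and all parameter values are likewise assumptions. The names `hybrid_bpso_bbha`, `toy_fitness` and `INFORMATIVE` are hypothetical. The toy fitness only stands in for what a real wrapper would compute: the cross-validated accuracy of a classifier such as SPLSDA, kNN or Naive Bayes, weighted against the number of selected genes.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = gene selected, 0 = gene dropped


def sigmoid(v: float) -> float:
    """S-shaped transfer function used by standard binary PSO."""
    return 1.0 / (1.0 + math.exp(-v))


def hamming_fraction(a: Mask, b: Mask) -> float:
    """Fraction of positions where two gene masks differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def hybrid_bpso_bbha(fitness: Callable[[Mask], float], n_genes: int,
                     n_particles: int = 20, n_iter: int = 50,
                     w: float = 0.7, c1: float = 2.0, c2: float = 2.0,
                     v_max: float = 4.0, seed: int = 0) -> Tuple[Mask, float]:
    """Hypothetical hybrid: a BPSO move followed by a black-hole step each iteration.
    Assumes fitness values are positive (needed for the event-horizon radius)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(n_particles)]
    vel = [[0.0] * n_genes for _ in range(n_particles)]
    fit = [fitness(p) for p in pos]
    pbest, pbest_fit = [p[:] for p in pos], fit[:]
    best_i = max(range(n_particles), key=lambda i: fit[i])
    gbest, gbest_fit = pos[best_i][:], fit[best_i]

    def update_bests(i: int) -> None:
        nonlocal gbest, gbest_fit
        if fit[i] > pbest_fit[i]:
            pbest[i], pbest_fit[i] = pos[i][:], fit[i]
        if fit[i] > gbest_fit:
            gbest, gbest_fit = pos[i][:], fit[i]

    for _ in range(n_iter):
        # 1) Standard BPSO move: velocity update, clamping, sigmoid sampling.
        for i in range(n_particles):
            for d in range(n_genes):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            fit[i] = fitness(pos[i])
            update_bests(i)

        # 2) Black-hole step: the global best acts as the black hole.
        #    Assumed binary move: x + r * (x_BH - x) is read as P(bit = 1).
        for i in range(n_particles):
            pos[i] = [1 if rng.random() < x + rng.random() * (bh - x) else 0
                      for x, bh in zip(pos[i], gbest)]
            fit[i] = fitness(pos[i])
            update_bests(i)

        # 3) Event horizon R = f_BH / sum(f_i): stars closer than R to the
        #    black hole (but not identical to it) are replaced by random masks.
        total = sum(fit)
        radius = gbest_fit / total if total > 0 else 0.0
        for i in range(n_particles):
            if 0 < hamming_fraction(pos[i], gbest) < radius:
                pos[i] = [rng.randint(0, 1) for _ in range(n_genes)]
                vel[i] = [0.0] * n_genes
                fit[i] = fitness(pos[i])
                update_bests(i)

    return gbest, gbest_fit


# Hypothetical demo: genes 0, 3, 7 and 12 stand in for "informative" genes.
INFORMATIVE = {0, 3, 7, 12}


def toy_fitness(mask: Mask, alpha: float = 0.9) -> float:
    """Weighted sum of a stand-in 'accuracy' (Jaccard overlap with the
    informative set) and the fraction of genes left out."""
    selected = {j for j, bit in enumerate(mask) if bit}
    overlap = len(selected & INFORMATIVE) / len(selected | INFORMATIVE)
    return alpha * overlap + (1 - alpha) * (1 - len(selected) / len(mask))


best_mask, best_fit = hybrid_bpso_bbha(toy_fitness, n_genes=30)
print([j for j, bit in enumerate(best_mask) if bit], round(best_fit, 3))
```

The weight `alpha` trades classification quality against subset size, which mirrors the abstract's two evaluation criteria (accuracy and number of selected genes). In a real pipeline the masks would range only over the genes surviving the RF-RFE prefilter, and `toy_fitness` would be replaced by a cross-validated classifier score.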