Performance analysis of set partitioning formulations on the rule extraction from random forests


Creative Commons License

EDALI M.

PAMUKKALE UNIVERSITY JOURNAL OF ENGINEERING SCIENCES-PAMUKKALE UNIVERSITESI MUHENDISLIK BILIMLERI DERGISI, vol.27, no.4, pp.513-519, 2021 (Journal Indexed in ESCI)

  • Publication Type: Article
  • Volume: 27 Issue: 4
  • Publication Date: 2021
  • Doi Number: 10.5505/pajes.2020.05926
  • Title of Journal: PAMUKKALE UNIVERSITY JOURNAL OF ENGINEERING SCIENCES-PAMUKKALE UNIVERSITESI MUHENDISLIK BILIMLERI DERGISI
  • Page Numbers: pp.513-519
  • Keywords: Random forests, Rule extraction, Set partitioning, Classification, Regression, Interpretability, ENSEMBLES

Abstract

Random Forests is a widely used machine learning algorithm for classification and regression problems across different domains. Although it is generally accurate, its interpretability is low compared to that of its building blocks, single decision trees. Using the fact that each member of a Random Forest is a decision tree, we propose different set partitioning formulations to extract interpretable if-then rules from Random Forests. Our experiments on well-known classification and regression datasets show that the original set partitioning formulation significantly reduces the number of rules while keeping the accuracy at acceptable levels. We also propose a modification to the problem's objective function that aims to reduce the number of extracted rules further. With this modification, we observe a further reduction in the number of extracted rules while the accuracy values stay nearly the same. Although the set partitioning problem is NP-hard, we obtain optimal results for most datasets within twenty minutes.
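
As a rough illustration of the set partitioning idea described in the abstract, the sketch below treats each candidate rule as the set of training samples it covers and selects a minimum-cost subset of rules that covers every sample exactly once. The abstract does not give the cost coefficients, the solver, or the exact formulations used in the paper, so the function name `solve_set_partitioning`, the unit costs, and the toy data are all hypothetical, and the exhaustive search stands in for an integer programming solver only because the instance is tiny.

```python
from itertools import combinations


def solve_set_partitioning(n_samples, rules, costs):
    """Return (best_cost, chosen_rule_indices) by exhaustive search.

    rules[j] is the set of sample indices covered by candidate rule j.
    A selection is feasible when every sample is covered exactly once.
    This brute-force search is only practical for very small instances.
    """
    universe = set(range(n_samples))
    best_cost, best_choice = float("inf"), None
    for k in range(1, len(rules) + 1):
        for choice in combinations(range(len(rules)), k):
            covered = [i for j in choice for i in rules[j]]
            # Exactly-once coverage: no sample repeated and none missing.
            if len(covered) == n_samples and set(covered) == universe:
                cost = sum(costs[j] for j in choice)
                if cost < best_cost:
                    best_cost, best_choice = cost, choice
    return best_cost, best_choice


# Hypothetical toy instance: 4 samples and 5 candidate rules.
# Rules 0-1 play the role of the leaves of one tree, rules 2-4 of another;
# each tree's leaves partition the samples on their own.
rules = [{0, 1}, {2, 3}, {0}, {1, 2}, {3}]
# Assumed unit costs, so the objective simply counts selected rules.
costs = [1.0, 1.0, 1.0, 1.0, 1.0]

best_cost, best_choice = solve_set_partitioning(4, rules, costs)
print(best_cost, best_choice)
```

With unit costs the objective reduces to minimizing the number of selected rules. Other cost choices, for example ones that also penalize rule error, would change which partition is preferred; which costs the paper actually uses is not stated in the abstract.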