Performance analysis of set partitioning formulations on the rule extraction from random forests



EDALI M.

PAMUKKALE UNIVERSITY JOURNAL OF ENGINEERING SCIENCES-PAMUKKALE UNIVERSITESI MUHENDISLIK BILIMLERI DERGISI, vol.27, no.4, pp.513-519, 2021 (ESCI)

Abstract

Random Forests are widely used machine learning models for classification and regression problems across many domains. Although they are generally accurate, they are far less interpretable than their building blocks, single decision trees. Exploiting the fact that each member of a Random Forest is a decision tree, we propose several set partitioning formulations for extracting interpretable if-then rules from Random Forests. Our experiments on well-known classification and regression datasets show that the original set partitioning formulation significantly reduces the number of rules while keeping accuracy at acceptable levels. We also propose a modified objective function that aims to reduce the number of extracted rules further; with it, the number of rules drops further while accuracy stays nearly the same. Although the set partitioning problem is NP-hard, we obtain optimal solutions for most datasets within twenty minutes.
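To make the idea concrete, the following is a minimal sketch of a set partitioning problem over tree leaves, not the formulation used in the paper. It assumes that each leaf of each tree is a candidate rule covering a set of training samples, that every sample must be covered by exactly one selected rule, and that each rule has an integer cost (here, a hypothetical misclassification count). The function name `solve_set_partitioning`, the toy data, and the `rule_penalty` term are illustrative assumptions; the abstract does not state the exact objective modification, and a real instance would be handed to an integer programming solver rather than enumerated.

```python
from itertools import combinations


def solve_set_partitioning(n_samples, rules, costs, rule_penalty=0):
    """Brute-force search for a minimum-cost exact partition (toy sizes only).

    rules[j] is the set of sample indices covered by candidate rule j.
    costs[j] is a hypothetical cost of rule j (e.g. samples it misclassifies).
    rule_penalty is an assumed per-rule charge that favors fewer rules.
    Returns (best_cost, tuple_of_selected_rule_indices) or (None, None).
    """
    universe = set(range(n_samples))
    best_cost, best_subset = None, None
    for k in range(1, len(rules) + 1):
        for subset in combinations(range(len(rules)), k):
            covered = [i for j in subset for i in rules[j]]
            # Exact partition: every sample covered, and none covered twice.
            if len(covered) != n_samples or set(covered) != universe:
                continue
            cost = sum(costs[j] + rule_penalty for j in subset)
            if best_cost is None or cost < best_cost:
                best_cost, best_subset = cost, subset
    return best_cost, best_subset


# Toy forest with two trees over six samples (all values are made up).
# Tree A leaves: rules 0 and 1. Tree B leaves: rules 2, 3 and 4.
rules = [{0, 1, 2}, {3, 4, 5}, {0, 1}, {2, 3}, {4, 5}]
costs = [1, 2, 0, 1, 0]

plain = solve_set_partitioning(6, rules, costs)
fewer_rules = solve_set_partitioning(6, rules, costs, rule_penalty=3)
print(plain)        # three low-error rules from tree B
print(fewer_rules)  # two rules from tree A once each rule is penalized
```

In this toy instance the unpenalized objective selects the three rules of tree B, while a per-rule penalty of 3 shifts the optimum to the two rules of tree A, trading a little accuracy for a shorter rule list. Enumeration grows exponentially with the number of candidate rules, which is consistent with the problem being NP-hard.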