Performance Evaluation of various ML techniques for Software Fault Prediction using NASA dataset


Alsangari B., Bircik G.

5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications, HORA 2023, İstanbul, Türkiye, 8-10 June 2023

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/hora58378.2023.10156708
  • City of Publication: İstanbul
  • Country of Publication: Türkiye
  • Keywords: KNN, ML, RF, SFP, Software Engineering
  • Yıldız Technical University Affiliation: Yes

Abstract

Software Fault Prediction (SFP) has become an important research topic in software engineering because it can improve software dependability. Defect prediction helps developers anticipate potential issues and optimize testing resources. By forecasting the number of software defects, testing effort can be directed toward the modules most likely to contain faults, so that defects are fixed as early as possible. This paper addresses SFP using JM1, a NASA dataset with 21 features. Several Machine Learning (ML) techniques are studied: Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) with three distance metrics, and Decision Tree (DT). Three data-sampling cases are investigated: without sampling, Random Over-Sampling, and SMOTE. Performance is evaluated using accuracy (ACC), Recall, Precision, and F1-Score. The results indicate that RF achieves the highest ACC, with values of 0.81, 0.92, and 0.88 in the three cases, respectively. These comprehensive findings can serve as a baseline for subsequent studies, so that any claim of improved prediction by a new approach, model, or framework can be compared against them and confirmed. Future work will evaluate how varying the number of features affects SFP performance.
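To make the evaluation setup described in the abstract more concrete, the following is a minimal, standard-library-only Python sketch. It is not the authors' code. It computes ACC, Precision, Recall, and F1 for the defective class, implements Random Over-Sampling and a simplified SMOTE, and classifies with KNN under three distance metrics. Assumptions: label 1 marks a defective module and is the minority class. The three distance metrics (Euclidean, Manhattan, Chebyshev) are illustrative, because the abstract does not name them. The toy data is hypothetical and unrelated to JM1.

```python
import math
import random

# Assumption: label 1 = defective module (minority class), 0 = clean module.


def evaluate(y_true, y_pred):
    """Return ACC, Precision, Recall, and F1 for the defective class (label 1).
    Undefined ratios (zero denominator) are reported as 0.0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"acc": acc, "precision": precision, "recall": recall, "f1": f1}


# Three illustrative KNN distance metrics (assumed; not named in the abstract).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, x, k=3, dist=euclidean):
    """Majority vote among the k nearest training rows (ties go to class 0)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    defective_votes = sum(label for _, label in nearest)
    return 1 if 2 * defective_votes > len(nearest) else 0


def random_oversample(X, y, rng):
    """Duplicate random minority rows until both classes have equal counts."""
    minority = [row for row, label in zip(X, y) if label == 1]
    n_needed = y.count(0) - len(minority)
    extra = [list(rng.choice(minority)) for _ in range(max(n_needed, 0))]
    return X + extra, y + [1] * len(extra)


def smote(X, y, rng, k=5):
    """Simplified SMOTE: interpolate between a random minority row and one of
    its k nearest minority neighbours (Euclidean). Needs >= 2 minority rows."""
    minority = [row for row, label in zip(X, y) if label == 1]
    n_needed = y.count(0) - len(minority)
    synthetic = []
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: euclidean(m, base))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> nb, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return X + synthetic, y + [1] * len(synthetic)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data with two features: 6 clean rows, 2 defective rows.
    X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3], [0.9, 0.7], [1.3, 1.0],
         [3.0, 3.2], [3.3, 2.9]]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    test_X, test_y = [[1.0, 0.9], [3.1, 3.0]], [0, 1]

    # The three sampling cases from the abstract, each paired with KNN
    # under the three assumed distance metrics.
    cases = {
        "without sampling": (X, y),
        "random over-sampling": random_oversample(X, y, rng),
        "SMOTE": smote(X, y, rng, k=1),
    }
    for case, (Xc, yc) in cases.items():
        for dist in (euclidean, manhattan, chebyshev):
            preds = [knn_predict(Xc, yc, x, k=3, dist=dist) for x in test_X]
            print(case, dist.__name__, evaluate(test_y, preds))
```

Resampling is applied only to the training rows here. A real replication would also split the data (for example, with cross-validation) before oversampling, so that synthetic or duplicated rows never leak into the test set.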