Metrics-Driven Software Quality Prediction Without Prior Fault Data

Çatal Ç., Sevim U., Diri B.

in: Electronic Engineering and Computing Technology Series: Lecture Notes in Electrical Engineering, Gelman L., Editor, Springer Science+Business Media, Dordrecht, pp. 189-199, 2010

  • Publication Type: Book Chapter / Chapter Research Book
  • Publication Date: 2010
  • Publisher: Springer Science+Business Media
  • City: Dordrecht
  • Page Numbers: pp. 189-199
  • Editors: Gelman L.
  • Yıldız Technical University Affiliated: Yes


Software quality assessment models are quantitative analytical models that are more reliable than qualitative models based on personal judgment. They fall into two groups: generalized and product-specific models. Measurement-driven predictive models, a subgroup of product-specific models, assume a predictive relationship between software measurements and quality. In recent years, measurement-driven predictive models have received growing attention within quality assessment research, and software fault prediction modeling has become an established field within the product-specific category. Most software fault prediction studies have focused on building fault predictors from previous fault data. However, there are cases in which previous fault data are not available. In this study, we propose a novel software fault prediction approach that can be used in the absence of fault data. The technique is fully automated: it does not require an expert during the prediction process, because software metrics thresholds take the expert's place, and it does not require the number of clusters to be specified before the clustering phase, as the K-means clustering method does. The technique first applies the X-means clustering method to cluster modules and identify the best number of clusters. The mean vector of each cluster is then checked against the metrics thresholds vector. A cluster is predicted as fault-prone if at least one metric of its mean vector is higher than the threshold value of that metric. Three datasets, collected from a Turkish white-goods manufacturer developing embedded controller software, were used in the experiments. The experiments showed that unsupervised software fault prediction can be fully automated and that effective results can be achieved by using the X-means clustering method and software metrics thresholds.
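
The threshold check described in the abstract can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the metric names, threshold values, and sample modules are hypothetical, and the cluster labels are assumed to come from a separate X-means run, which is not reproduced here.

```python
from collections import defaultdict

# Hypothetical metric order and threshold values, for illustration only.
METRICS = ("lines_of_code", "cyclomatic_complexity", "unique_operators")
THRESHOLDS = (65.0, 10.0, 25.0)


def cluster_means(modules, labels):
    """Return {cluster_id: mean metric vector} for a given cluster assignment."""
    groups = defaultdict(list)
    for vector, label in zip(modules, labels):
        groups[label].append(vector)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in groups.items()
    }


def fault_prone_clusters(modules, labels, thresholds=THRESHOLDS):
    """Flag a cluster when at least one mean metric is higher than its threshold."""
    means = cluster_means(modules, labels)
    return {
        label: any(m > t for m, t in zip(mean, thresholds))
        for label, mean in means.items()
    }


def predict_modules(modules, labels, thresholds=THRESHOLDS):
    """Give every module the prediction of the cluster it belongs to."""
    flags = fault_prone_clusters(modules, labels, thresholds)
    return [flags[label] for label in labels]


# Hypothetical modules (one metric vector each, in METRICS order) and
# cluster labels assumed to have been produced by X-means.
modules = [(20, 3, 10), (30, 4, 12), (120, 8, 20), (90, 14, 30)]
labels = [0, 0, 1, 1]

print(fault_prone_clusters(modules, labels))  # {0: False, 1: True}
print(predict_modules(modules, labels))       # [False, False, True, True]
```

In this sketch, cluster 1 is flagged because its mean vector (105.0, 11.0, 25.0) is higher than the lines-of-code and cyclomatic-complexity thresholds. The comparison is strict, so a mean equal to its threshold does not flag the cluster.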