Data mining technique’s parameters definition and its prediction effect’s based on iron deficiency dataset


Mohammed S. J., Abbas A. K., Ahmad A. A., Mohammed M. S., Sarı M., USLU TUNA H.

Sigma Journal of Engineering and Natural Sciences, vol.43, no.2, pp.505-515, 2025 (ESCI)

  • Publication Type: Article / Full Article
  • Volume: 43 Issue: 2
  • Publication Date: 2025
  • DOI Number: 10.14744/sigma.2025.00038
  • Journal Name: Sigma Journal of Engineering and Natural Sciences
  • Journal Indexes: Emerging Sources Citation Index (ESCI), Scopus, Academic Search Premier, Directory of Open Access Journals
  • Pages: pp.505-515
  • Keywords: Anemia, Ripper, Attribute Selection, Prediction System, Sequential Minimal Optimization
  • Yıldız Technical University Affiliated: Yes

Abstract

The training dataset is not the only element that affects the overall prediction system; the parameters of the data mining techniques also influence how those techniques perform in application. In this paper, several data mining techniques are studied together with their main parameters to assess their effect on an anemia prediction system. For the K-Nearest Neighbor (K-NN) method, the value of k has to be defined to specify the number of points used to measure the distance to the different classes. Likewise, Locally Weighted Learning (LWL) has a kernel value (T) that defines the width of the search operation used to calculate the LWL weight function. Sequential Minimal Optimization (SMO), in turn, has alpha values for n tuples, which depend on the training data and are chosen to satisfy the Karush-Kuhn-Tucker (KKT) conditions, speeding up the prediction process. These data mining methods provide high-performance predictions when the parameters of each technique are well selected. Meanwhile, dataset size and the number of attributes are seen to have an impact on the performance of these methods. In this study, mining methods combined with feature selection methods are compared in terms of proper parameter selection and their dependence on dataset information. The anemia system is predicted more accurately than with the classical version of each method. The features of the applied dataset are reduced from 11 to 8 attributes. With this feature reduction and good parameter selection, K-NN, for example, shows an increase of about 3.8% in prediction performance under the proposed model.
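
The role of the k parameter in K-NN described in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation: the function names (`knn_predict`, `loo_accuracy`, `best_k`), the use of Euclidean distance, the leave-one-out evaluation, and the six-point toy dataset with labels "anemic" and "healthy" are all assumptions made for illustration, and none of the values come from the paper's iron deficiency dataset.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_tuple, label) pairs; Euclidean distance
    is an assumption of this sketch.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy of K-NN on `data` for a given k."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out the i-th sample
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(data)


def best_k(data, candidates):
    """Return the candidate k with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))


# Synthetic toy data (hypothetical, NOT from the paper): two features, two classes.
toy = [
    ((1.0, 1.0), "anemic"), ((1.2, 0.8), "anemic"), ((0.9, 1.1), "anemic"),
    ((3.0, 3.0), "healthy"), ((3.2, 2.9), "healthy"), ((2.8, 3.1), "healthy"),
]

# Accuracy for each candidate k, evaluated on the same data.
scores = {k: loo_accuracy(toy, k) for k in (1, 3, 5)}
```

On this toy set, k = 1 and k = 3 classify every held-out point correctly, while k = 5 misclassifies all of them, because each held-out point is then outvoted by the three points of the other class. This is only a small-scale picture of why the choice of k, together with dataset size, can change prediction performance.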