Improving and Assessing the Prediction Capability of Machine Learning Algorithms for Breast Cancer Diagnosis

Tasdemir F. A.

4th International Conference on Intelligent and Fuzzy Systems (INFUS), Bornova, Turkey, 19 - 21 July 2022, vol.505, pp.182-189

  • Publication Type: Conference Paper / Full Text
  • Volume: 505
  • Doi Number: 10.1007/978-3-031-09176-6_22
  • City: Bornova
  • Country: Turkey
  • Page Numbers: pp.182-189
  • Keywords: Machine learning, Classification, Support vector machines, Logistic regression, Random forest, Breast cancer diagnosis, Hyper-parameter tuning, Grid search
  • Yıldız Technical University Affiliated: No


Breast cancer is currently one of the most common forms of cancer. In 2020, there were 2.3 million new cases of breast cancer and approximately 685,000 deaths worldwide. Since breast cancer is the second leading cause of death among women, it is very important to determine at an early stage whether a biopsied cell is benign or malignant, before the disease becomes fatal. However, the diagnosis process is complex, as it consists of several stages such as collecting and analyzing multivariate samples. These time-consuming procedures delay diagnosis and put patients at risk. Meanwhile, the rapid development of Machine Learning (ML) and its applications in healthcare is bringing a new perspective to processing and analyzing medical big data. ML techniques also help medical experts by analyzing data quickly, which reduces time pressure in decision making. With this in mind, this study employs different ML algorithms to predict whether a cell nucleus is benign or malignant using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The algorithms used are Support Vector Machines (SVM), Logistic Regression (LR), and Random Forest (RF). The dataset includes 32 attributes and 569 cases, of which 357 are benign and 212 are malignant. To improve accuracy, hyperparameters were tuned using Grid Search, and the results are compared. The algorithms are implemented in Python.
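
The workflow described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn and its bundled copy of the WDBC data (`load_breast_cancer`, which exposes 30 numeric features; the 32 attributes cited above presumably also count the sample ID and the diagnosis label). The abstract does not state the parameter grids, the train/test split, the scaling step, or the number of cross-validation folds, so every such choice below is an assumption made for illustration and not the paper's actual configuration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# scikit-learn's copy of WDBC: 569 cases, 30 features.
# In this encoding, target 0 = malignant and 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# Assumed hold-out split (80/20, stratified); not taken from the paper.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical search grids, one per algorithm named in the abstract.
candidates = {
    "SVM": (SVC(), {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}),
    "LR": (LogisticRegression(max_iter=5000), {"clf__C": [0.01, 0.1, 1, 10]}),
    "RF": (
        RandomForestClassifier(random_state=0),
        {"clf__n_estimators": [50, 100], "clf__max_depth": [None, 5]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling is an assumption; it mainly matters for SVM and LR.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])

    # Accuracy with default hyperparameters, for comparison.
    baseline = pipe.fit(X_train, y_train).score(X_test, y_test)

    # Exhaustive search over the grid with 5-fold cross-validation
    # on the training split; the best model is refit automatically.
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)

    results[name] = {
        "baseline": baseline,
        "tuned": search.score(X_test, y_test),
        "best_params": search.best_params_,
    }

for name, r in results.items():
    print(f"{name}: baseline={r['baseline']:.3f} tuned={r['tuned']:.3f} {r['best_params']}")
```

Wrapping each classifier in a `Pipeline` keeps the scaler fitted only on the training folds during the search, which avoids leaking test-fold statistics into the cross-validation estimate. The accuracies this sketch prints depend on the assumed split and grids and should not be read as the paper's reported results.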