Fed3C: federated clustering-based centralized classification


AKPINAR E., BOLAT B., TAŞKIRAN M.

Knowledge and Information Systems, vol.68, no.1, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 68, Issue: 1
  • Publication Date: 2026
  • DOI: 10.1007/s10115-025-02650-9
  • Journal Name: Knowledge and Information Systems
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, ABI/INFORM, Compendex, INSPEC
  • Keywords: Aggregation, Centralized learning, Centroid, Classification, Federated clustering, Optimization
  • Yıldız Technical University Affiliated: Yes

Abstract

The success of artificial intelligence (AI) and machine learning (ML) applications depends on the analysis of large and diverse datasets. However, concerns regarding personal data privacy and data sharing, especially in datasets containing sensitive information, restrict the widespread use and sharing of such data. Federated learning (FL) offers a solution to these issues by enabling multiple users to collaboratively train a global model without the need to share their data. In this study, Federated Clustering-Based Centralized Classification (Fed3C) is proposed, where data belonging to the same class is divided into subsets, centroids are generated for each subset, and these centroids are shared with the server. The optimal number of centroids is determined using Bayesian optimization, and the centroids are generated using the K-means method. These centroids are then sent to the server, where classification is performed using K-nearest neighbors (KNN). The success of the proposed method has been tested on five different datasets, and the effects of changing the maximum number of centroids, the number of clients, the number of iterations, and the optimization algorithm on the method’s performance have been examined. The results demonstrate that the proposed approach, which does not require direct original data sharing, is effective in improving model performance. In particular, in real-world scenarios where the amount of data belonging to a single client is insufficient, the proposed method achieves a notable increase in success by involving multiple clients in the training process.
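
The pipeline described in the abstract (per-class K-means centroids computed on each client, then KNN classification on the server over the pooled centroids) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmeans`, `client_centroids`, `knn_predict`), the toy data, and the fixed `k_per_class` are assumptions made for illustration. In particular, the Bayesian optimization step that the abstract uses to choose the number of centroids is omitted here and replaced by a hand-picked constant.

```python
import math
import random
from collections import Counter

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns up to k centroids as lists of floats."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids

def client_centroids(X, y, k_per_class):
    """Client side: cluster each class separately, return (centroid, label) pairs.

    ASSUMPTION: k_per_class is fixed by hand; the paper tunes the number of
    centroids with Bayesian optimization, which this sketch does not do.
    """
    out = []
    for label in sorted(set(y)):
        pts = [x for x, t in zip(X, y) if t == label]
        out.extend((c, label) for c in kmeans(pts, k_per_class))
    return out

def knn_predict(labelled_centroids, x, n_neighbors=1):
    """Server side: majority vote among the nearest pooled centroids."""
    ranked = sorted(labelled_centroids, key=lambda cl: math.dist(x, cl[0]))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]

# Two hypothetical clients, each holding a small local two-class dataset.
client_a = ([(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)], [0, 0, 1, 1])
client_b = ([(0.1, 0.3), (0.3, 0.2), (4.8, 5.0), (5.1, 5.3)], [0, 0, 1, 1])

# Only labelled centroids leave the clients; the server pools them.
pooled = []
for X, y in (client_a, client_b):
    pooled.extend(client_centroids(X, y, k_per_class=1))

pred_low = knn_predict(pooled, (0.2, 0.2))
pred_high = knn_predict(pooled, (5.0, 5.0))
```

In this sketch the server never sees raw samples, only one centroid per class per client, which mirrors the data-sharing constraint the abstract describes. A fuller version would search over `k_per_class` per client rather than fixing it.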