Fed3C: federated clustering-based centralized classification

AKPINAR, Emin; BOLAT, Bülent; TAŞKIRAN, Murat

doi:10.1007/s10115-025-02650-9

AKPINAR E., BOLAT B., TAŞKIRAN M.

Knowledge and Information Systems, vol. 68, no. 1, 2026 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 68 Issue: 1
Publication Date: 2026
DOI Number: 10.1007/s10115-025-02650-9
Journal Name: Knowledge and Information Systems
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, ABI/INFORM, Compendex, INSPEC
Keywords: Aggregation, Centralized learning, Centroid, Classification, Federated clustering, Optimization
Affiliated with Yıldız Technical University: Yes

Abstract

The success of artificial intelligence (AI) and machine learning (ML) applications depends on the analysis of large and diverse datasets. However, concerns about personal data privacy, especially for datasets containing sensitive information, restrict the widespread use and sharing of such data. Federated learning (FL) addresses these concerns by enabling multiple users to collaboratively train a global model without sharing their data. This study proposes Federated Clustering-Based Centralized Classification (Fed3C), in which each client divides the data belonging to the same class into subsets and generates a centroid for each subset. The optimal number of centroids is determined using Bayesian optimization, and the centroids are generated using the K-means method. The centroids are then sent to the server, where classification is performed using K-nearest neighbors (KNN). The proposed method was evaluated on five datasets, and the effects of the maximum number of centroids, the number of clients, the number of iterations, and the optimization algorithm on its performance were examined. The results show that the proposed approach, which does not require sharing the original data directly, is effective in improving model performance. In particular, in real-world scenarios where a single client does not hold enough data, the proposed method achieves a notable increase in performance by involving multiple clients in the training process.
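
The pipeline described in the abstract (per-class K-means on each client, centroids pooled on the server, KNN classification over the pooled centroids) can be illustrated with a minimal sketch. This is not the authors' implementation: the Bayesian optimization of the centroid count is replaced by a fixed, assumed `k_per_class`, the K-means routine is a plain Lloyd's iteration written with the standard library only, and all function names and the toy two-class data are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns up to k centroids as lists of floats."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to its group mean; keep it if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def client_centroids(X, y, k_per_class, seed=0):
    """Client side: cluster each class separately, return (centroid, label) pairs."""
    out = []
    for label in sorted(set(y)):
        pts = [x for x, lab in zip(X, y) if lab == label]
        out += [(c, label) for c in kmeans(pts, k_per_class, seed=seed)]
    return out


def knn_predict(labelled_centroids, x, n_neighbors=1):
    """Server side: majority vote among the nearest pooled centroids."""
    ranked = sorted(labelled_centroids, key=lambda cl: math.dist(x, cl[0]))
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


def make_client_data(rng, n):
    """Hypothetical toy data: two well-separated 2-D Gaussian classes."""
    X, y = [], []
    for label, center in enumerate([(0.0, 0.0), (5.0, 5.0)]):
        for _ in range(n):
            X.append([rng.gauss(center[0], 0.5), rng.gauss(center[1], 0.5)])
            y.append(label)
    return X, y


rng = random.Random(42)
clients = [make_client_data(rng, 30) for _ in range(3)]

# Only centroids leave the clients; the server pools them (assumed k_per_class=3).
server_pool = []
for i, (X, y) in enumerate(clients):
    server_pool += client_centroids(X, y, k_per_class=3, seed=i)

prediction_a = knn_predict(server_pool, [0.2, -0.1], n_neighbors=3)
prediction_b = knn_predict(server_pool, [4.8, 5.3], n_neighbors=3)
```

In this sketch the server never sees raw samples, only labelled centroids, which mirrors the data-sharing constraint stated in the abstract. A fuller version would presumably tune the number of centroids per class, for example with a Bayesian optimizer, rather than fixing it.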