
ASCFL: Accurate and Speedy Semi-Supervised Clustering Federated Learning

Jingyi He1, Biyao Gong1, Jiadi Yang1, Hai Wang1, Pengfei Xu1, Tianzhang Xing1,2( )
1. School of Information Science and Technology, Northwest University, Xi'an 710100, China
2. Internet of Things Research Center, Northwest University, Xi'an 710100, China

Abstract

The influence of non-Independent and Identically Distributed (non-IID) data on Federated Learning (FL) has been a serious concern. Clustered Federated Learning (CFL) is an emerging approach for reducing the impact of non-IID data; it groups clients according to similarity computed from relevant metrics. Unfortunately, existing CFL methods pursue accuracy improvements alone and ignore the convergence rate. Additionally, the client selection strategy affects the clustering results. Finally, traditional semi-supervised learning changes the distribution of data on clients, resulting in higher local costs and undesirable performance. In this paper, we propose a novel CFL method named ASCFL, which selects the clients that participate in each training round and can dynamically adjust the balance between accuracy and convergence speed on datasets consisting of both labeled and unlabeled data. To deal with unlabeled data, a label prediction strategy predicts labels with encoders. A client selection strategy improves accuracy and reduces overhead by selecting clients with higher losses to participate in the current round. Moreover, a similarity-based clustering strategy uses a new indicator to measure the similarity between clients. Experimental results show that ASCFL outperforms three state-of-the-art methods in model accuracy and convergence speed on two popular datasets.
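
For concreteness, the sketch below illustrates the two ingredients the abstract mentions: loss-based client selection and similarity-based grouping of client updates. It is a minimal toy example on simulated data; the losses, updates, the cosine-similarity stand-in for the paper's similarity indicator, and the threshold value are all assumptions made for illustration, not the authors' actual ASCFL algorithm or released code.

# Hypothetical sketch (not the authors' code): select high-loss clients,
# then group the selected clients by similarity of their model updates.
import numpy as np

rng = np.random.default_rng(0)

# Simulated per-client training losses and flattened model updates.
num_clients = 8
client_losses = rng.uniform(0.2, 2.0, size=num_clients)
client_updates = rng.normal(size=(num_clients, 16))

# Client selection: pick the clients with the highest local losses,
# assuming they contribute most to improving the global model this round.
num_selected = 4
selected = np.argsort(client_losses)[-num_selected:]

def cosine_similarity(a, b):
    # Cosine similarity used here as a placeholder similarity indicator.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

threshold = 0.0  # assumed clustering threshold
clusters = []
for c in selected:
    placed = False
    for cluster in clusters:
        # Join the first cluster whose representative update is similar enough.
        if cosine_similarity(client_updates[c], client_updates[cluster[0]]) > threshold:
            cluster.append(c)
            placed = True
            break
    if not placed:
        clusters.append([c])

print("selected clients:", sorted(int(i) for i in selected))
print("clusters:", [sorted(int(i) for i in cl) for cl in clusters])

The actual similarity indicator and selection rule are defined in the full text of the paper; the sketch only conveys the overall flow of selecting high-loss clients and then clustering similar ones.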

Keywords: semi-supervised learning, federated learning, clustered federated learning, non-Independent and Identically Distributed (non-IID) data, similarity indicator, client selection


Publication history

Received: 09 November 2022
Accepted: 27 November 2022
Published: 19 May 2023
Issue date: October 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (No. 2019YFC1520904) and the National Natural Science Foundation of China (No. 61973250).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
