Federated Meta Reinforcement Learning for Personalized Tasks

Wentao Liu¹, Xiaolong Xu²(✉), Jintao Wu², Jielin Jiang²
¹ School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
² School of Software, Nanjing University of Information Science and Technology, Nanjing 210044, China

Abstract

As an emerging privacy-preserving machine learning framework, Federated Learning (FL) enables different clients to collaboratively train a shared model by exchanging and aggregating model parameters while raw data are kept local and private. When this learning framework is applied to Deep Reinforcement Learning (DRL), the resultant Federated Reinforcement Learning (FRL) can circumvent the heavy data sampling required in conventional DRL and benefit from diversified training data, in addition to the privacy preservation offered by FL. Existing FRL implementations presuppose that clients have compatible tasks which a single global model can cover. In practice, however, clients usually have incompatible (different but still similar) personalized tasks, which we call task shift. This shift may severely hinder the application of FRL in practice. In this paper, we propose a Federated Meta Reinforcement Learning (FMRL) framework that integrates Model-Agnostic Meta-Learning (MAML) and FRL. Specifically, we utilize Proximal Policy Optimization (PPO) to fulfil multi-step local training with a single round of sampling. Moreover, considering the sensitivity of learning-rate selection in FRL, we reconstruct the aggregation optimizer with a federated version of Adam (Fed-Adam) on the server side. Experiments in different environments demonstrate that FMRL outperforms other FL methods, with high training efficiency brought by Fed-Adam.

Keywords: reinforcement learning, personalization, federated learning, meta-learning
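
The two mechanisms named in the abstract, multi-step PPO updates that reuse a single round of sampling on each client and Fed-Adam aggregation on the server, can be illustrated with a brief sketch. The snippet below is not the paper's implementation: it assumes the standard PPO clipped surrogate objective and the usual federated-Adam update that treats the mean client delta as a pseudo-gradient, and all names, shapes, and hyper-parameters (CLIP_EPS, INNER_STEPS, grad_fn, and so on) are hypothetical.

import numpy as np

# Illustrative constants; values are hypothetical, not taken from the paper.
CLIP_EPS = 0.2            # PPO clipping range
INNER_STEPS = 5           # local gradient steps reusing one sampled batch
BETA1, BETA2, TAU = 0.9, 0.99, 1e-3   # server-side Adam moments and stability term
SERVER_LR = 1e-2          # server learning rate

def ppo_clip_loss(new_logp, old_logp, advantages):
    # Clipped surrogate objective: the importance ratio is taken against the
    # policy that collected the batch, so several gradient steps can safely
    # reuse a single round of sampling.
    ratio = np.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - CLIP_EPS, 1.0 + CLIP_EPS) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

def local_meta_update(global_params, batch, grad_fn, inner_lr=3e-4):
    # Client side: start from the global (meta) parameters, sample once
    # (producing `batch`), then take several PPO steps on that fixed batch.
    params = global_params.copy()
    for _ in range(INNER_STEPS):
        params = params - inner_lr * grad_fn(params, batch)  # grad of ppo_clip_loss
    return params

def fed_adam_step(global_params, client_params, state):
    # Server side: treat the mean client update as a pseudo-gradient and
    # apply an Adam-style step (a federated version of Adam).
    delta = np.mean([p - global_params for p in client_params], axis=0)
    state["m"] = BETA1 * state["m"] + (1 - BETA1) * delta
    state["v"] = BETA2 * state["v"] + (1 - BETA2) * delta ** 2
    new_params = global_params + SERVER_LR * state["m"] / (np.sqrt(state["v"]) + TAU)
    return new_params, state

In an actual federated round under these assumptions, each selected client would call local_meta_update on its own environment samples, and the server would apply fed_adam_step to the returned parameters before broadcasting the new global model.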


Publication history

Received: 03 April 2023
Revised: 26 June 2023
Accepted: 27 June 2023
Published: 04 December 2023
Issue date: June 2024

Copyright

© The Author(s) 2024.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
