


Deep reinforcement learning based worker selection for distributed machine learning enhanced edge intelligence in internet of vehicles

Junyu Dong, Wenjun Wu*, Yang Gao, Xiaoxi Wang, Pengbo Si
Faculty of Information Technology, Beijing University of Technology, Beijing 100022, China

Abstract

Nowadays, the Edge Information System (EIS) has received a lot of attention. In EIS, Distributed Machine Learning (DML), which requires fewer computing resources, can implement many artificial intelligence applications efficiently. However, due to the dynamic network topology and the fluctuating transmission quality at the edge, worker node selection strongly affects the performance of DML. In this paper, we focus on the Internet of Vehicles (IoV), one of the typical scenarios of EIS, and consider the DML-based High Definition (HD) mapping and intelligent driving decision model as an example. The worker selection problem is modeled as a Markov Decision Process (MDP) that maximizes the aggregate performance of the DML model, which is related to the timeliness of the local model, the transmission quality of model parameter uploading, and the effective sensing area of the worker. A Deep Reinforcement Learning (DRL) based solution, called the Worker Selection based on Policy Gradient (PG-WS) algorithm, is proposed. The policy mapping from the system state to the worker selection action is represented by a deep neural network. Episodic simulations are built, and the REINFORCE algorithm with a baseline is used to train the policy network. Results show that the proposed PG-WS algorithm outperforms other comparison methods.
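For readers who want a concrete picture of the training procedure summarized above, the following is a minimal sketch (in PyTorch) of a REINFORCE-with-baseline loop that selects one worker vehicle per aggregation round via a softmax policy network. It is not the authors' implementation: the state features, reward weights, network sizes, and toy environment dynamics are all illustrative assumptions.

# Minimal sketch (assumptions): a softmax policy network selects one worker
# vehicle per aggregation round; it is trained with REINFORCE plus a learned
# state-value baseline. Feature layout, reward weights, and dimensions are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn

N_WORKERS, FEAT = 8, 4   # candidate vehicles; features per vehicle
                         # (e.g., local-model staleness, link quality, sensing area, speed)

policy = nn.Sequential(nn.Linear(N_WORKERS * FEAT, 64), nn.ReLU(),
                       nn.Linear(64, N_WORKERS))            # logits over candidate workers
baseline = nn.Sequential(nn.Linear(N_WORKERS * FEAT, 64), nn.ReLU(),
                         nn.Linear(64, 1))                   # state-value baseline
opt = torch.optim.Adam(list(policy.parameters()) + list(baseline.parameters()), lr=1e-3)

def reward(state, action):
    # Toy stand-in for the aggregate-performance reward: a weighted sum of the
    # chosen worker's timeliness, transmission-quality, and sensing-area features.
    w = torch.tensor([0.4, 0.3, 0.3, 0.0])
    return (state[action] * w).sum()

for episode in range(200):
    states, actions, rewards = [], [], []
    state = torch.rand(N_WORKERS, FEAT)          # random IoV snapshot (toy)
    for t in range(20):                           # aggregation rounds per episode
        logits = policy(state.flatten())
        dist = torch.distributions.Categorical(logits=logits)
        a = dist.sample()                         # index of the selected worker
        states.append(state); actions.append(a); rewards.append(reward(state, a))
        state = torch.rand(N_WORKERS, FEAT)       # next snapshot (toy dynamics)

    # Monte-Carlo returns, then the REINFORCE-with-baseline update.
    G, returns = 0.0, []
    for r in reversed(rewards):
        G = r + 0.99 * G
        returns.insert(0, G)
    loss = 0.0
    for s, a, G in zip(states, actions, returns):
        x = s.flatten()
        b = baseline(x).squeeze()
        logp = torch.distributions.Categorical(logits=policy(x)).log_prob(a)
        loss = loss + (-(G.detach() - b.detach()) * logp   # policy-gradient term
                       + (G.detach() - b) ** 2)            # baseline regression term
    opt.zero_grad(); loss.backward(); opt.step()

The learned baseline subtracts an estimate of the state value from the Monte-Carlo return, which reduces the variance of the policy-gradient estimate; this is the role the baseline plays in the PG-WS training described in the abstract.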

Keywords: deep reinforcement learning, edge information system, internet of vehicles, distributed machine learning, worker selection


Publication history

Received: 31 July 2020
Revised: 28 September 2020
Accepted: 11 November 2020
Published: 30 December 2020
Issue date: December 2020

Copyright

© All articles included in the journal are copyrighted to the ITU and TUP 2020

Acknowledgements

This work was supported by the Science and Technology Foundation of Beijing Municipal Commission of Education (No. KM201810005027), the National Natural Science Foundation of China (No. U1633115), and the Beijing Natural Science Foundation (No. L192002).

Rights and permissions

© All articles included in the journal are copyrighted to the ITU and TUP. This work is available under the CC BY-NC-ND 3.0 IGO license: https://creativecommons.org/licenses/by-nc-nd/3.0/igo/.
