Volume 26, Issue 5

Deep Reinforcement Learning Based Mobile Robot Navigation: A Review

Kai Zhu, Tao Zhang
Department of Automation, Tsinghua University, Beijing 100084, China
Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China

Abstract

Navigation is a fundamental problem for mobile robots, and Deep Reinforcement Learning (DRL) has attracted significant attention as an approach to it because of its strong representation and experience learning abilities. DRL is increasingly being applied to mobile robot navigation. In this paper, we review DRL methods and DRL-based navigation frameworks. We then systematically compare and analyze the relationships and differences among four typical application scenarios: local obstacle avoidance, indoor navigation, multi-robot navigation, and social navigation. Next, we describe the development of DRL-based navigation. Finally, we discuss the challenges of DRL-based navigation and some possible solutions.
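The "experience learning" ability credited to DRL can be illustrated with a deliberately small sketch. It is not a method from this review: it is plain tabular Q-learning on a hypothetical 5x5 grid in which a robot must reach a goal while avoiding obstacle cells. DRL methods such as DQN keep the same temporal-difference update but replace the lookup table with a neural network that maps raw sensor observations to action values. The grid layout, reward values (+10 at the goal, -1 per collision, -0.1 per step), and hyperparameters are illustrative assumptions.

```python
import random

# Hypothetical toy navigation task: 5x5 grid, start at (0, 0), goal at (4, 4).
SIZE = 5
START, GOAL = (0, 0), (4, 4)
OBSTACLES = {(1, 1), (2, 3), (3, 1)}
ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < SIZE and 0 <= c < SIZE) or (r, c) in OBSTACLES:
        return state, -1.0, False  # collision: stay in place and pay a penalty
    if (r, c) == GOAL:
        return (r, c), 10.0, True  # goal reached, episode ends
    return (r, c), -0.1, False     # small step cost favors short paths


def best_action(q, state):
    """Greedy action index under the current Q estimates (ties -> lowest index)."""
    return max(range(len(ACTIONS)), key=lambda i: q.get((state, i), 0.0))


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Learn Q(state, action) purely from interaction with the environment."""
    rng = random.Random(seed)
    q = {}  # (state, action_index) -> estimated discounted return
    for _ in range(episodes):
        state, done, steps = START, False, 0
        while not done and steps < 200:
            # Epsilon-greedy exploration: mostly exploit, occasionally try a random action.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = best_action(q, state)
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q.get((nxt, i), 0.0) for i in range(len(ACTIONS)))
            # Q-learning update: move the estimate toward the one-step TD target.
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (reward + gamma * best_next - old)
            state, steps = nxt, steps + 1
    return q


def greedy_path(q, max_steps=50):
    """Roll out the learned greedy policy from START."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, ACTIONS[best_action(q, state)])
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Zero-initialized values act as a mild form of optimism here: untried actions score 0, which beats the negative values of actions already tried, so the agent keeps exploring until it finds the goal reward.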

Keywords:

mobile robot navigation, obstacle avoidance, deep reinforcement learning
Received: 05 February 2021; Accepted: 22 February 2021; Published: 20 April 2021; Issue date: October 2021

Copyright

© The author(s) 2021. The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
