Open Access

Reset-Free Reinforcement Learning via Multi-State Recovery and Failure Prevention for Autonomous Robots

Xu Zhou1,2, Benlian Xu3 (corresponding author), Zhengqiang Jiang4, Jun Li2, and Brett Nener5

1. School of Mechanical Engineering, Changshu Institute of Technology, Changshu 215500, China
2. School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China
3. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
4. Faculty of Medicine and Health, The University of Sydney, Sydney 2006, Australia
5. Department of Electrical, Electronic and Computer Engineering, The University of Western Australia, Perth 6009, Australia
Abstract

Reinforcement learning holds promise for robotic tasks because it can learn optimal policies via trial and error. However, practical deployments of reinforcement learning usually require human intervention to provide episodic resets when a failure occurs. Since manual resets are generally unavailable for autonomous robots, we propose a reset-free reinforcement learning algorithm based on multi-state recovery and failure prevention to avoid failure-induced resets. Multi-state recovery gives robots the capability to recover from failures by self-correcting their behavior in the problematic state and, more importantly, by deciding which previous state is the best one to return to for efficient re-learning. Failure prevention reduces potential failures by predicting and excluding possibly unsafe actions in specific states. Both simulations and real-world experiments are used to validate our algorithm, with the results showing a significant reduction in the number of resets and failures during learning.

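The abstract describes the two mechanisms only at a conceptual level, so the following is a minimal, hypothetical Python sketch of how a tabular agent could combine them. The class name ResetFreeAgent, the failure-count threshold, and the value-based scoring used to pick the recovery state are illustrative assumptions for this sketch, not the authors' implementation.

```python
# Illustrative sketch only: a tabular Q-learning agent augmented with the two ideas
# described in the abstract. All names, the unsafe-action threshold, and the
# recovery-state scoring are assumptions, not the paper's actual method.
import random
from collections import defaultdict

class ResetFreeAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, unsafe_threshold=3):
        self.actions = actions
        self.q = defaultdict(float)             # Q[(state, action)] value estimates
        self.failure_counts = defaultdict(int)  # how often (state, action) led to failure
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.unsafe_threshold = unsafe_threshold

    def safe_actions(self, state):
        # Failure prevention: exclude actions that have repeatedly failed in this state.
        allowed = [a for a in self.actions
                   if self.failure_counts[(state, a)] < self.unsafe_threshold]
        return allowed or self.actions          # never leave the agent with no action

    def act(self, state):
        candidates = self.safe_actions(state)
        if random.random() < self.epsilon:
            return random.choice(candidates)
        return max(candidates, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

    def recovery_state(self, trajectory):
        # Multi-state recovery: pick the previously visited state judged best to return to,
        # here scored by its current value estimate (one plausible criterion).
        return max(trajectory, key=lambda s: max(self.q[(s, a)] for a in self.actions))

# Hypothetical usage on a failure event (environment interface not shown):
#   agent.failure_counts[(s, a)] += 1           # mark the action that caused the failure
#   target = agent.recovery_state(trajectory)   # choose the best previous state to return to
#   ...drive the robot back toward `target` and keep learning, with no manual episodic reset.
```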

Tsinghua Science and Technology
Pages 1481-1494
Cite this article:
Zhou X, Xu B, Jiang Z, et al. Reset-Free Reinforcement Learning via Multi-State Recovery and Failure Prevention for Autonomous Robots. Tsinghua Science and Technology, 2024, 29(5): 1481-1494. https://doi.org/10.26599/TST.2023.9010117

Received: 13 August 2023
Revised: 02 October 2023
Accepted: 10 October 2023
Published: 02 May 2024
© The Author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
