
Curricular Robust Reinforcement Learning via GAN-Based Perturbation Through Continuously Scheduled Task Sequence

Yike Li1, Yunzhe Tian1, Endong Tong1 (✉), Wenjia Niu1 (✉), Yingxiao Xiang1, Tong Chen1, Yalun Wu1, Jiqiang Liu1
1 Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing Jiaotong University, Beijing 100044, China

Abstract

Reinforcement learning (RL), one of the three main branches of machine learning, aims at autonomous learning and is now driving much of the progress in artificial intelligence, especially in autonomous distributed systems such as cooperative Boston Dynamics robots. However, robust RL remains a challenging reliability problem because of the gap between laboratory simulation and the real world. Existing efforts approach this problem by, for example, applying random environmental perturbations during learning. However, there is no guarantee that a given perturbation benefits training, as harmful ones may cause RL to fail. In this work, we treat robust RL as a multi-task RL problem and propose a curricular robust RL approach. We first present a generative adversarial network (GAN) based task generation model that iteratively outputs new tasks at an appropriate level of difficulty for the current policy. With these progressive tasks, we realize curricular learning and finally obtain a robust policy. Extensive experiments in multiple environments demonstrate that our method improves training stability and is robust to differences between training and test conditions.

Keywords: robust reinforcement learning, generative adversarial network (GAN) based model, curricular learning
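To make the task-generation idea concrete, the following is a minimal sketch, not the authors' implementation: a generator proposes environment perturbations (tasks), the current policy's return on each proposed task decides whether that task is at an appropriate difficulty, and a least-squares GAN objective pushes the generator toward such tasks. All names (evaluate_policy, R_MIN, R_MAX), dimensions, and hyper-parameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

NOISE_DIM, TASK_DIM = 8, 2          # e.g., two perturbed physics parameters
R_MIN, R_MAX = 200.0, 600.0         # assumed return band for "appropriate difficulty"

generator = nn.Sequential(nn.Linear(NOISE_DIM, 64), nn.ReLU(),
                          nn.Linear(64, TASK_DIM), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(TASK_DIM, 64), nn.ReLU(),
                              nn.Linear(64, 1))
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def label_tasks(returns):
    # A task counts as appropriately difficult if the current policy's
    # return falls in an intermediate band: not trivial, not impossible.
    return ((returns > R_MIN) & (returns < R_MAX)).float().unsqueeze(1)

def curriculum_step(evaluate_policy, batch=64):
    # 1) Sample candidate tasks (environment perturbations) from the generator.
    z = torch.randn(batch, NOISE_DIM)
    tasks = generator(z)

    # 2) Roll out the current policy on each task and label it by difficulty.
    returns = evaluate_policy(tasks.detach())       # tensor of shape (batch,)
    labels = label_tasks(returns)

    # 3) Least-squares GAN updates: the discriminator regresses the labels,
    #    and the generator is pushed toward tasks rated as "good".
    d_loss = ((discriminator(tasks.detach()) - labels) ** 2).mean()
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    g_loss = ((discriminator(tasks) - 1.0) ** 2).mean()
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

    return tasks.detach()    # next, train the RL policy on this task batch
```

In such a scheme, each returned task batch would be used to update the RL policy before the next curriculum iteration, so the generated tasks track the policy's current capability rather than being fixed in advance.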


Publication history

Received: 30 September 2021
Accepted: 13 October 2021
Published: 21 July 2022
Issue date: February 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61972025, 61802389, 61672092, U1811264, and 61966009) and the National Key R&D Program of China (Nos. 2020YFB1005604 and 2020YFB2103802).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
