References
[1] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, Domain randomization for transferring deep neural networks from simulation to the real world, in Proc. 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, Canada, 2017, pp. 23–30.
[2] C. G. Atkeson and J. Morimoto, Nonparametric representation of policies and value functions: A trajectory-based approach, in Proc. Advances in Neural Information Processing Systems 15 (NIPS), Vancouver, Canada, 2002, pp. 1611–1618.
[3] J. Morimoto and K. Doya, Robust reinforcement learning, Neural Computation, vol. 17, no. 2, pp. 335–359, 2005.
[4] A. Rajeswaran, S. Ghotra, B. Ravindran, and S. Levine, EPOpt: Learning robust neural network policies using model ensembles, presented at 5th International Conference on Learning Representations (ICLR 2017), Toulon, France, 2017.
[5] L. Pinto, J. Davidson, R. Sukthankar, and A. Gupta, Robust adversarial reinforcement learning, in Proc. 34th International Conference on Machine Learning (ICML), Sydney, Australia, 2017, pp. 2817–2826.
[6] C. Tessler, Y. Efroni, and S. Mannor, Action robust reinforcement learning and applications in continuous control, in Proc. 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA, 2019, pp. 6215–6224.
[7] E. Vinitsky, Y. Du, K. Parvate, K. Jang, P. Abbeel, and A. Bayen, Robust reinforcement learning using adversarial populations, arXiv preprint arXiv: 2008.01825, 2020.
[8] M. A. Abdullah, H. Ren, H. B. Ammar, V. Milenkovic, R. Luo, M. Zhang, and J. Wang, Wasserstein robust reinforcement learning, arXiv preprint arXiv: 1907.13196, 2019.
[9] P. Kamalaruban, Y. T. Huang, Y. P. Hsieh, P. Rolland, C. Shi, and V. Cevher, Robust reinforcement learning via adversarial training with Langevin dynamics, presented at 34th Conference on Neural Information Processing Systems (NeurIPS 2020), virtual, 2020.
[10] C. Li, C. Chen, D. E. Carlson, and L. Carin, Preconditioned stochastic gradient Langevin dynamics for deep neural networks, in Proc. 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 2016, pp. 1788–1794.
[11] J. Schulman, S. Levine, P. Moritz, M. Jordan, and P. Abbeel, Trust region policy optimization, in Proc. 32nd International Conference on Machine Learning, Lille, France, 2015, pp. 1889–1897.
[12] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, OpenAI Gym, arXiv preprint arXiv: 1606.01540, 2016.
[13] C. Florensa, D. Held, X. Geng, and P. Abbeel, Automatic goal generation for reinforcement learning agents, in Proc. 35th International Conference on Machine Learning, Stockholm, Sweden, 2018, pp. 1514–1528.
[14] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley, On the effectiveness of least squares generative adversarial networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 12, pp. 2947–2960, 2019.
[15] W. Wiesemann, D. Kuhn, and B. Rustem, Robust Markov decision processes, Mathematics of Operations Research, vol. 38, no. 1, pp. 153–183, 2013.
[16] E. Todorov, T. Erez, and Y. Tassa, MuJoCo: A physics engine for model-based control, in Proc. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 2012, pp. 5026–5033.
[17] W. Zhong, N. Yu, and C. Ai, Applying big data based deep learning system to intrusion detection, Big Data Mining and Analytics, vol. 3, no. 3, pp. 181–195, 2020.
[18] A. Guezzaz, Y. Asimi, M. Azrour, and A. Asimi, Mathematical validation of proposed machine learning classifier for heterogeneous traffic and anomaly detection, Big Data Mining and Analytics, vol. 4, no. 1, pp. 18–24, 2021.
[19] X. Pan, D. Seita, Y. Gao, and J. Canny, Risk averse robust adversarial reinforcement learning, in Proc. 2019 International Conference on Robotics and Automation (ICRA), Montreal, Canada, 2019, pp. 8522–8528.
[20] A. Pattanaik, Z. Tang, S. Liu, G. Bommannan, and G. Chowdhary, Robust deep reinforcement learning with adversarial attacks, arXiv preprint arXiv: 1712.03632, 2017.
[21] R. Cheng, A. Verma, G. Orosz, S. Chaudhuri, Y. Yue, and J. Burdick, Control regularization for reduced variance reinforcement learning, in Proc. 36th International Conference on Machine Learning, Long Beach, CA, USA, 2019, pp. 1141–1150.
[22] Z. Q. Zhou, Q. Bai, Z. Y. Zhou, L. Qiu, J. Blanchet, and P. Glynn, Finite-sample regret bound for distributionally robust offline tabular reinforcement learning, in Proc. 24th International Conference on Artificial Intelligence and Statistics, San Diego, CA, USA, 2021, pp. 3331–3339.
[23] N. Jakobi, P. Husbands, and I. Harvey, Noise and the reality gap: The use of simulation in evolutionary robotics, in Proc. 3rd European Conference on Artificial Life, Granada, Spain, 1995, pp. 704–720.
[24] I. Harvey, Artificial evolution and real robots, Artificial Life and Robotics, vol. 1, no. 1, pp. 35–38, 1997.
[25] A. A. A. Sallab, M. Abdou, E. Perot, and S. Yogamani, Deep reinforcement learning framework for autonomous driving, Electronic Imaging, vol. 2017, no. 19, pp. 70–76, 2017.
[26] H. Zhang, H. Chen, L. Xiao, B. Li, D. S. Boning, and C. J. Hsieh, Robust deep reinforcement learning against adversarial perturbations on observations, arXiv preprint arXiv: 2003.08938, 2020.
[27] C. Tessler, Y. Efroni, and S. Mannor, Action robust reinforcement learning and applications in continuous control, in Proc. 36th International Conference on Machine Learning, Long Beach, CA, USA, 2019, pp. 6215–6224.
[28] M. Laumanns and J. Ocenasek, Bayesian optimization algorithms for multi-objective optimization, in Proc. 7th International Conference on Parallel Problem Solving from Nature, Granada, Spain, 2002, pp. 298–307.
[29] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, Generative adversarial nets, in Proc. 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014, pp. 2672–2680.
[30] H. Ye, G. Deng, and J. C. Devlin, Least squares approach for lossless image coding, in Proc. 5th International Symposium on Signal Processing and its Applications, Brisbane, Australia, 1999, pp. 63–66.
[31] P. T. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, A tutorial on the cross-entropy method, Annals of Operations Research, vol. 134, no. 1, pp. 19–67, 2005.
[32] T. Matiisen, A. Oliver, T. Cohen, and J. Schulman, Teacher–student curriculum learning, IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 9, pp. 3732–3740, 2019.
[33] W. Czarnecki, S. Jayakumar, M. Jaderberg, L. Hasenclever, Y. W. Teh, N. Heess, S. Osindero, and R. Pascanu, Mix & match agent curricula for reinforcement learning, in Proc. 35th International Conference on Machine Learning, Stockholm, Sweden, 2018, pp. 1087–1095.
[34] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, Curriculum learning, in Proc. 26th Annual International Conference on Machine Learning, Montreal, Canada, 2009, pp. 41–48.
[35] S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer, Scheduled sampling for sequence prediction with recurrent neural networks, in Proc. 28th International Conference on Neural Information Processing Systems, Montreal, Canada, 2015, pp. 1171–1179.
[36] A. Karpathy and M. van de Panne, Curriculum learning for motor skills, in Proc. 25th Canadian Conference on Advances in Artificial Intelligence, Toronto, Canada, 2012, pp. 325–330.