[1]
Sutton R S, Barto A G. Reinforcement Learning: An Introduction. MIT Press, 1998.
[4]
Mnih V, Badia A P, Mirza M, Graves A, Lillicrap T, Harley T, Silver D, Kavukcuoglu K. Asynchronous methods for deep reinforcement learning. In Proc. the 33rd International Conference on Machine Learning, June 2016, pp.1928-1937.
[5]
Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. In Proc. the 4th International Conference on Learning Representations, May 2016.
[6]
van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In Proc. the 30th AAAI Conference on Artificial Intelligence, February 2016, pp.2094-2100.
[7]
Wang Z, Schaul T, Hessel M, van Hasselt H, Lanctot M, de Freitas N. Dueling network architectures for deep reinforcement learning. In Proc. the 33rd International Conference on Machine Learning, June 2016, pp.1995-2003.
[8]
Bloembergen D, Kaisers M, Tuyls K. Empirical and theoretical support for lenient learning. In Proc. the 10th International Conference on Autonomous Agents and Multiagent Systems, May 2011, pp.1105-1106.
[9]
Matignon L, Laurent G J, le Fort-Piat N. Hysteretic Q-learning: An algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In Proc. the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 2007, pp.64-69.
[11]
Panait L, Sullivan K, Luke S. Lenient learners in cooperative multiagent systems. In Proc. the 5th International Conference on Autonomous Agents and Multiagent Systems, May 2006, pp.801-803.
[13]
Yang T, Hao J, Meng Z, Zheng Y, Zhang C, Zheng Z. Bayes-ToMoP: A fast detection and best response algorithm towards sophisticated opponents. In Proc. the 18th International Conference on Autonomous Agents and Multiagent Systems, May 2019, pp.2282-2284.
[14]
Yang T, Hao J, Meng Z, Zhang C, Zheng Y, Zheng Z. Towards efficient detection and optimal response against sophisticated opponents. In Proc. the 28th International Joint Conference on Artificial Intelligence, August 2019, pp.623-629.
[15]
Zheng Y, Meng Z P, Hao J Y, Zhang Z Z, Yang T P, Fan C J. A deep Bayesian policy reuse approach against non-stationary agents. In Proc. the 2018 Annual Conference on Neural Information Processing Systems, December 2018, pp.962-972.
[16]
Gupta J K, Egorov M, Kochenderfer M J. Cooperative multi-agent control using deep reinforcement learning. In Proc. the 2017 International Conference on Autonomous Agents and Multiagent Systems Workshops, May 2017, pp.66-83.
[17]
Lanctot M, Zambaldi V, Gruslys A et al. A unified game-theoretic approach to multiagent reinforcement learning. In Proc. the 2017 Annual Conference on Neural Information Processing Systems, December 2017, pp.4190-4203.
[18]
Claus C, Boutilier C. The dynamics of reinforcement learning in cooperative multiagent systems. In Proc. the 15th AAAI Conference on Artificial Intelligence, July 1998, pp.746-752.
[19]
Zhang Z, Pan Z, Kochenderfer M J. Weighted double Q-learning. In Proc. the 26th International Joint Conference on Artificial Intelligence, August 2017, pp.3455-3461.
[20]
Zheng Y, Meng Z, Hao J, Zhang Z. Weighted double deep multiagent reinforcement learning in stochastic cooperative environments. In Proc. the 15th Pacific Rim International Conference on Artificial Intelligence, August 2018, pp.421-429.
[21]
Watkins C. Learning from delayed rewards [Ph.D. Thesis]. King’s College, University of Cambridge, 1989.
[24]
van Hasselt H. Double Q-learning. In Proc. the 24th Annual Conference on Neural Information Processing Systems, December 2010, pp.2613-2621.
[25]
Potter M A, de Jong K A. A cooperative coevolutionary approach to function optimization. In Proc. the 3rd International Conference on Parallel Problem Solving from Nature, October 1994, pp.249-257.
[26]
Tang H, Houthooft R, Foote D, Stooke A, Chen O X, Duan Y, Schulman J, de Turck F, Abbeel P. #Exploration: A study of count-based exploration for deep reinforcement learning. In Proc. the 2017 Annual Conference on Neural Information Processing Systems, December 2017, pp.2753-2762.
[27]
Benda M, Jagannathan V, Dodhiawala R. On optimal cooperation of knowledge sources — An empirical investigation. Technical Report, Boeing Advanced Technology Center, Boeing Computing Services, 1986.
[28]
Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proc. the 2017 Annual Conference on Neural Information Processing Systems, December 2017, pp.6379-6390.
[29]
Palmer G, Tuyls K, Bloembergen D, Savani R. Lenient multi-agent deep reinforcement learning. In Proc. the 17th International Conference on Autonomous Agents and Multiagent Systems, July 2018, pp.443-451.
[30]
Buşoniu L, Babuška R, de Schutter B. Multi-agent reinforcement learning: An overview. In Innovations in Multi-Agent Systems and Applications-1, Srinivasan D, Jain L C (eds.), Springer, 2010, pp.183-221.
[31]
Chou P, Maturana D, Scherer S. Improving stochastic policy gradients in continuous control with deep reinforcement learning using the beta distribution. In Proc. the 34th International Conference on Machine Learning, August 2017, pp.834-843.