[1]
E. Biglieri, A. J. Goldsmith, L. J. Greenstein, N. B. Mandayam, and H. V. Poor, Principles of Cognitive Radio. New York, NY, USA: Cambridge University Press, 2013.
[4]
N. Yang, H. Zhang, and R. Berry, Partially observable multi-agent deep reinforcement learning for cognitive resource management, in Proc. 2020 IEEE Global Communications Conference (GLOBECOM), Taipei, Taiwan, 2020, pp. 1–6.
[5]
J. Mitola, Cognitive radio: An integrated agent architecture for software defined radio, PhD dissertation, Teleinformatics, Royal Institute of Technology (KTH), Stockholm, Sweden, 2000.
[7]
S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms. New York, NY, USA: Cambridge University Press, 2014.
[8]
Y. Lu and K. Yan, Algorithms in multi-agent systems: A holistic perspective from reinforcement learning and game theory, arXiv preprint arXiv: 2001.06487, 2020.
[9]
L. Buşoniu, R. Babuška, and B. De Schutter, Multi-agent reinforcement learning: An overview, in Innovations in Multi-Agent Systems and Applications – 1, D. Srinivasan and L. C. Jain, eds. Berlin, Germany: Springer, 2010, pp. 183–221.
[10]
N. R. Ravishankar and M. V. Vijayakumar, Reinforcement learning algorithms: Survey and classification, Indian Journal of Science and Technology, vol. 10, no. 1, 2017, doi: 10.17485/ijst/2017/v10i1/109385.
[11]
E. Tampubolon, H. Ceribasic, and H. Boche, On information asymmetry in competitive multi-agent reinforcement learning: Convergence and optimality, arXiv preprint arXiv: 2010.10901, 2020.
[14]
M. Gerczuk, Multi-agent reinforcement learning: From game theory to organic computing, https://vixra.org/pdf/1903.0006v1.pdf, 2019.
[15]
H. Zhang and T. Yu, Taxonomy of reinforcement learning algorithms, in Deep Reinforcement Learning, H. Dong, Z. Ding, and S. Zhang, eds. Singapore: Springer, 2020, pp. 125–133.
[16]
K. Zhang, Z. Yang, and T. Başar, Multi-agent reinforcement learning: A selective overview of theories and algorithms, arXiv preprint arXiv: 1911.10635, 2019.
[18]
Y. Yang, Many-agent reinforcement learning, PhD dissertation, Department of Computer Science, University College London (UCL), London, UK, 2021.
[19]
A. Alwarafy, M. Abdallah, B. S. Ciftler, A. Al-Fuqaha, and M. Hamdi, Deep reinforcement learning for radio resource allocation and management in next generation heterogeneous wireless networks: A survey, arXiv preprint arXiv: 2106.00574, 2021.
[21]
A. Harvey, K. B. Laskey, and K.-C. Chang, Machine learning applications for sensor tasking with non-linear filtering, Sensors, vol. 22, no. 6, p. 2229, 2022.
[22]
G. M. Skaltsis, H.-S. Shin, and A. Tsourdos, A survey of task allocation techniques in multi agent systems, in Proc. 2021 International Conference on Unmanned Aircraft Systems (ICUAS), Athens, Greece, 2021, pp. 488–497.
[23]
M. L. Littman, Markov games as a framework for multi-agent reinforcement learning, in Proc. Eleventh International Conference on Machine Learning, New Brunswick, NJ, USA, 1994, pp. 157–163.
[24]
A. Nowé, P. Vrancx, and Y.-M. De Hauwere, Game theory and multi-agent reinforcement learning, in Reinforcement Learning: State-of-the-Art, M. Wiering and M. van Otterlo, eds. Berlin, Germany: Springer, 2012, pp. 441–470.
[25]
Y. Yang and J. Wang, An overview of multi-agent reinforcement learning from game theoretical perspective, arXiv preprint arXiv: 2011.00583, 2020.
[30]
M. Bowling and M. Veloso, An analysis of stochastic game theory for multiagent reinforcement learning, https://www.cs.cmu.edu/~mmv/papers/00TR-mike.pdf, 2000.
[31]
M. Kearns, M. L. Littman, and S. Singh, Graphical models for game theory, arXiv preprint arXiv: 1301.2281, 2013.
[32]
S. Kapoor, Multi-agent reinforcement learning: A report on challenges and approaches, arXiv preprint arXiv: 1807.09427, 2018.
[37]
A. Greenwald, J. Li, and E. Sodomka, Solving for best responses and equilibria in extensive-form games with reinforcement learning methods, in Rohit Parikh on Logic, Language and Society, C. Başkent, L. S. Moss, and R. Ramanujam, eds. Cham, Switzerland: Springer, 2017, pp. 185–226.
[38]
A. Akramizadeh, M.-B. Menhaj, and A. Afshar, Multiagent reinforcement learning in extensive form games with complete information, in Proc. 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Nashville, TN, USA, 2009, pp. 205–211.
[39]
M. Lanctot, E. Lockhart, J.-B. Lespiau, V. Zambaldi, S. Upadhyay, J. Pérolat, S. Srinivasan, F. Timbers, K. Tuyls, S. Omidshafiei, et al., OpenSpiel: A framework for reinforcement learning in games, arXiv preprint arXiv: 1908.09453, 2019.
[40]
C. K. Ling, F. Fang, and J. Z. Kolter, What game are we playing? End-to-end learning in normal and extensive form games, arXiv preprint arXiv: 1805.02777, 2018.
[42]
J. Heinrich, M. Lanctot, and D. Silver, Fictitious self-play in extensive-form games, in Proc. 32nd International Conference on Machine Learning, Lille, France, 2015, pp. 805–813.
[43]
M. Lanctot, V. Zambaldi, A. Gruslys, A. Lazaridou, K. Tuyls, J. Perolat, D. Silver, and T. Graepel, A unified game-theoretic approach to multiagent reinforcement learning, arXiv preprint arXiv: 1711.00832, 2017.
[44]
Y. Wen, H. Chen, Y. Yang, Z. Tian, M. Li, X. Chen, and J. Wang, A game-theoretic approach to multi-agent trust region optimization, arXiv preprint arXiv: 2106.06828, 2021.
[45]
V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, Playing Atari with deep reinforcement learning, arXiv preprint arXiv: 1312.5602, 2013.
[50]
P. J. Werbos, The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. New York, NY, USA: John Wiley & Sons, 1994.
[52]
F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley, and J. Clune, Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning, arXiv preprint arXiv: 1712.06567, 2017.
[53]
S. Iqbal and F. Sha, Actor-attention-critic for multi-agent reinforcement learning, in Proc. 36th International Conference on Machine Learning, Long Beach, CA, USA, 2019, pp. 2961–2970.
[54]
X. Ma, Y. Yang, C. Li, Y. Lu, Q. Zhao, and J. Yang, Modeling the interaction between agents in cooperative multi-agent reinforcement learning, arXiv preprint arXiv: 2102.06042, 2021.
[55]
W. Li, B. Jin, X. Wang, J. Yan, and H. Zha, F2A2: Flexible fully-decentralized approximate actor-critic for cooperative multi-agent reinforcement learning, arXiv preprint arXiv: 2004.11145, 2020.
[56]
M. Andrychowicz, F. Wolski, A. Ray, J. Schneider, R. Fong, P. Welinder, B. McGrew, J. Tobin, P. Abbeel, and W. Zaremba, Hindsight experience replay, in Proc. 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 2017.
[57]
S. Fujimoto, H. van Hoof, and D. Meger, Addressing function approximation error in actor-critic methods, in Proc. 35th International Conference on Machine Learning, Stockholm, Sweden, 2018, pp. 1587–1596.
[58]
J. Heinrich and D. Silver, Deep reinforcement learning from self-play in imperfect-information games, arXiv preprint arXiv: 1603.01121, 2016.
[59]
T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, Continuous control with deep reinforcement learning, arXiv preprint arXiv: 1509.02971, 2015.
[61]
A. Nagabandi, G. Kahn, R. S. Fearing, and S. Levine, Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning, in Proc. 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 2018, pp. 7559–7566.
[62]
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, Proximal policy optimization algorithms, arXiv preprint arXiv: 1707.06347, 2017.
[63]
T. Weber, S. Racanière, D. P. Reichert, L. Buesing, A. Guez, D. J. Rezende, A. P. Badia, O. Vinyals, N. Heess, Y. Li, et al., Imagination-augmented agents for deep reinforcement learning, arXiv preprint arXiv: 1707.06203, 2017.
[64]
R. H. Puspita, S. D. A. Shah, G.-M. Lee, B.-H. Roh, J. Oh, and S. Kang, Reinforcement learning based 5G enabled cognitive radio networks, in Proc. 2019 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 2019, pp. 555–558.
[65]
J. Yang, X. Ye, R. Trivedi, H. Xu, and H. Zha, Deep mean field games for learning optimal behavior policy of large populations, arXiv preprint arXiv: 1711.03156, 2018.
[66]
Y. Yang, R. Luo, M. Li, M. Zhou, W. Zhang, and J. Wang, Mean field multi-agent reinforcement learning, in Proc. 35th International Conference on Machine Learning, Stockholm, Sweden, 2018, pp. 5571–5580.
[67]
J. Subramanian and A. Mahajan, Reinforcement learning in stationary mean-field games, in Proc. 18th International Conference on Autonomous Agents and MultiAgent Systems, Montreal, Canada, 2019, pp. 251–259.
[68]
M. Agarwal, V. Aggarwal, A. Ghosh, and N. Tiwari, Reinforcement learning for mean field game, arXiv preprint arXiv: 1905.13357, 2019.
[69]
A. Angiuli, J.-P. Fouque, and M. Laurière, Unified reinforcement Q-learning for mean field game and control problems, arXiv preprint arXiv: 2006.13912, 2020.
[71]
M. Li, Z. Qin, Y. Jiao, Y. Yang, J. Wang, C. Wang, G. Wu, and J. Ye, Efficient ridesharing order dispatching with mean field multi-agent reinforcement learning, in Proc. WWW '19: The World Wide Web Conference, San Francisco, CA, USA, 2019, pp. 983–994.
[72]
S. G. Subramanian, P. Poupart, M. E. Taylor, and N. Hegde, Multi type mean field reinforcement learning, arXiv preprint arXiv: 2002.02513, 2020.
[73]
S. Sudhakara, A. Mahajan, A. Nayyar, and Y. Ouyang, Scalable regret for learning to control network-coupled subsystems with unknown dynamics, arXiv preprint arXiv: 2108.07970, 2021.
[74]
M. Mühlhäuser, Ubiquitous computing and its influence on MSE [multimedia software engineering], in Proc. International Symposium on Multimedia Software Engineering, Taiwan, 2002, pp. 48–55.
[75]
J. Branke, M. Mnif, C. Müller-Schloer, H. Prothmann, U. Richter, F. Rochner, and H. Schmeck, Organic computing—Addressing complexity by controlled self-organization, in Proc. Second International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2006), Paphos, Cyprus, 2006, pp. 185–191.
[76]
J. Branke, M. Mnif, C. Müller-Schloer, H. Prothmann, U. Richter, F. Rochner, and H. Schmeck, Organic computing—Addressing complexity by controlled self-organization, in Proc. Second International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2006), Paphos, Cyprus, 2006, pp. 185–191.
[77]
H. A. Simon, Bounded rationality, in Utility and Probability, J. Eatwell, M. Milgate, and P. Newman, eds. London, UK: Palgrave Macmillan, 1990, pp. 15–18.
[78]
S. Tomforde and B. Sick, eds., Organic Computing: Doctoral Dissertation Colloquium 2018. Kassel, Germany: Kassel University Press, 2019.
[79]
S. Rudolph, S. Tomforde, B. Sick, H. Heck, A. Wacker, and J. Hähner, An online influence detection algorithm for organic computing systems, in Proc. 28th International Conference on Architecture of Computing Systems (ARCS 2015), Porto, Portugal, 2015, pp. 1–8.
[80]
S. Reichhuber and S. Tomforde, Opportunistic meta-learning: A case study for quality assurance in industry 4.0 environments, in Proc. 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), Washington, DC, USA, 2020, pp. 76–81.
[81]
C. Wu, K. Chowdhury, M. Di Felice, and W. Meleis, Spectrum management of cognitive radio using multi-agent reinforcement learning, in Proc. 9th International Conference on Autonomous Agents and Multiagent Systems: Industry Track, Toronto, Canada, 2010, pp. 1705–1712.
[83]
F. E. Dorner, Measuring progress in deep reinforcement learning sample efficiency, arXiv preprint arXiv: 2102.04881, 2021.
[84]
A. Kuhnle, M. Aroca-Ouellette, A. Basu, M. Sensoy, J. Reid, and D. Zhang, Reinforcement learning for information retrieval, in Proc. 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, 2021, pp. 2669–2672.
[85]
K. Kurzer, P. Schörner, A. Albers, H. Thomsen, K. Daaboul, and J. M. Zöllner, Generalizing decision making for automated driving with an invariant environment representation using deep reinforcement learning, arXiv preprint arXiv: 2102.06765, 2021.
[86]
M. Zhou, Z. Wan, H. Wang, M. Wen, R. Wu, Y. Wen, Y. Yang, W. Zhang, and J. Wang, MALib: A parallel framework for population-based multi-agent reinforcement learning, arXiv preprint arXiv: 2106.07551, 2021.
[87]
S. Huang and S. Ontañón, Action guidance: Getting the best of sparse rewards and shaped rewards for real-time strategy games, arXiv preprint arXiv: 2010.03956, 2020.
[88]
M. Riedmiller, R. Hafner, T. Lampe, M. Neunert, J. Degrave, T. Van de Wiele, V. Mnih, N. Heess, and J. T. Springenberg, Learning by playing: Solving sparse reward tasks from scratch, in Proc. 35th International Conference on Machine Learning, Stockholm, Sweden, 2018, pp. 4344–4353.
[89]
G. Dulac-Arnold, D. Mankowitz, and T. Hester, Challenges of real-world reinforcement learning, arXiv preprint arXiv: 1904.12901, 2019.
[91]
M. Beeks, R. Refaei Afshar, Y. Zhang, R. Dijkman, C. van Dorst, and S. de Looijer, Deep reinforcement learning for a multi-objective online order batching problem, Proceedings of the International Conference on Automated Planning and Scheduling, vol. 32, no. 1, pp. 435–443, 2022.