
Towards a multi-agent reinforcement learning approach for joint sensing and sharing in cognitive radio networks

Kagiso Rapetswa and Ling Cheng
School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg 0001, South Africa

Abstract

The adoption of Fifth Generation (5G) and beyond-5G networks is driving demand for learning approaches that enable users to co-exist harmoniously in a multi-user distributed environment. Although resource constrained, the Cognitive Radio (CR) has been identified as a key enabler of distributed 5G and beyond-5G networks because of its cognitive abilities and its ability to access idle spectrum opportunistically. Reinforcement learning is well suited to meet this demand for learning because it does not require the learning agent to have prior information about the environment in which it operates. Intuitively, CRs should therefore implement reinforcement learning to gain opportunistic access to spectrum efficiently and to co-exist with one another. However, while the application of reinforcement learning is straightforward in a single-agent environment, it becomes complex and resource intensive in a multi-agent, multi-objective learning environment. In this paper, (1) we present a brief history and overview of reinforcement learning and its limitations; (2) we review recently proposed multi-agent learning methods and multi-agent learning algorithms applied in CR networks; and (3) we present a novel framework for multi-CR reinforcement learning and conclude with a synopsis of future research directions and recommendations.
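To make the model-free claim concrete, the following is a minimal illustrative sketch (hypothetical, not drawn from the paper): a single CR learns opportunistic channel selection with tabular Q-learning, never observing the primary-user occupancy probabilities directly. The channel count, occupancy probabilities, and reward scheme are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy: a single cognitive radio learns which channel to
# access opportunistically using tabular Q-learning. Model-free: the
# agent never sees BUSY_PROB; it learns purely from observed outcomes.

N_CHANNELS = 4
BUSY_PROB = [0.9, 0.6, 0.3, 0.1]  # hypothetical primary-user activity per channel

ALPHA = 0.1    # learning rate
EPSILON = 0.1  # exploration rate for epsilon-greedy selection

q = [0.0] * N_CHANNELS  # one Q-value per channel (single-state problem)

for step in range(10_000):
    # Epsilon-greedy: explore a random channel occasionally, else exploit.
    if random.random() < EPSILON:
        a = random.randrange(N_CHANNELS)
    else:
        a = max(range(N_CHANNELS), key=lambda c: q[c])

    # Environment feedback: +1 for transmitting on an idle channel,
    # -1 for colliding with a primary user.
    r = 1.0 if random.random() > BUSY_PROB[a] else -1.0

    # Q-learning update; with a single state and no bootstrapping this
    # reduces to an exponential moving average of the observed reward.
    q[a] += ALPHA * (r - q[a])

print("Learned channel preferences:", [round(v, 2) for v in q])
```

Note that once several CRs run this update concurrently, each radio's choices change the statistics every other radio observes, making the environment non-stationary from each agent's perspective; this is precisely the multi-agent complexity the abstract identifies.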

Keywords: cognitive radio, deep reinforcement learning, multi-agent reinforcement learning, mean field reinforcement learning, organic computing

Publication history

Received: 09 February 2023
Accepted: 31 March 2023
Published: 20 March 2023
Issue date: March 2023

Copyright

© All articles included in the journal are copyrighted by the ITU and TUP.

Rights and permissions

This work is available under the CC BY-NC-ND 3.0 IGO license: https://creativecommons.org/licenses/by-nc-nd/3.0/igo/
