Open Access

Adaptive cache policy optimization through deep reinforcement learning in dynamic cellular networks

Ashvin Srinivasan¹, Mohsen Amidzadeh¹, Junshan Zhang², Olav Tirkkonen¹

¹ Department of Information and Communications Engineering, Aalto University, Espoo 02150, Finland

² Department of Electrical and Computer Engineering, University of California, Davis, CA 95616, USA

Abstract

We explore the use of caching both at the network edge and within User Equipment (UE) to alleviate the traffic load of wireless networks. We develop a joint cache placement and delivery policy that maximizes the Quality of Service (QoS) while simultaneously minimizing backhaul load and UE power consumption, in the presence of an unknown, time-variant file popularity. Because file requests in a time slot are affected by download success in the previous slot, the caching system becomes a non-stationary Partially Observable Markov Decision Process (POMDP). We solve the problem in a deep reinforcement learning framework based on the Advantage Actor-Critic (A2C) algorithm, comparing Feed Forward Neural Networks (FFNN) with a Long Short-Term Memory (LSTM) approach specifically designed to exploit the correlation of the file popularity distribution across time slots. Simulation results show that LSTM-based A2C outperforms FFNN-based A2C in terms of sample efficiency and optimality, demonstrating superior performance for the non-stationary POMDP problem. For caching at the UEs, we provide a distributed algorithm that reaches the objectives dictated by the agent controlling the network, with minimum energy consumption at the UEs and minimum communication overhead.

Keywords

wireless caching; deep reinforcement learning; advantage actor-critic; long short-term memory; non-stationary Partially Observable Markov Decision Process (POMDP)

Intelligent and Converged Networks

Volume 5, Issue 2, June 2024

Pages 81-99

DOI: 10.23919/ICN.2024.0007

Cite this article:

Srinivasan A, Amidzadeh M, Zhang J, et al. Adaptive cache policy optimization through deep reinforcement learning in dynamic cellular networks. Intelligent and Converged Networks, 2024, 5(2): 81-99. https://doi.org/10.23919/ICN.2024.0007