
Energy Procurement and Retail Pricing for Electricity Retailers via Deep Reinforcement Learning with Long Short-term Memory

Authors: Hongsheng Xu, Jinyu Wen, Qinran Hu, Jiao Shu, Jixiang Lu, Zhihong Yang
State Key Laboratory of Advanced Electromagnetic Engineering and Technology, School of Electrical and Electronic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
NARI Group Corporation, Nanjing 211106, China
School of Electrical Engineering, Southeast University, Nanjing 210096, China
State Key Laboratory of Smart Grid Protection and Control, Nanjing 211106, China

Abstract

The joint optimization problem of energy procurement and retail pricing for an electricity retailer is converted into separately determining the optimal procurement strategy and the optimal pricing strategy, under the "price-taker" assumption. The aggregate energy consumption of end-use customers (EUCs) is predicted via a long short-term memory (LSTM)-based supervised learning method to solve for the optimal procurement strategy. The optimal retail pricing problem is formulated as a Markov decision process (MDP), which can be solved using deep reinforcement learning (DRL) algorithms. However, the performance of existing DRL approaches may deteriorate due to their insufficient ability to extract discriminative features from the time-series vectors in the environmental states. We propose a novel deep deterministic policy gradient (DDPG) network structure with a shared LSTM-based representation network that fully exploits the Actor's and Critic's losses. The designed shared representation network and the joint loss function enhance the environment perception capability of the proposed approach and further improve the optimization performance, resulting in a more profitable pricing strategy. Numerical simulations demonstrate the effectiveness of the proposed approach.

Keywords: Deep reinforcement learning, long short-term memory, electricity market, energy procurement, retail pricing
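
The abstract describes a DDPG structure in which the Actor and the Critic share a single LSTM-based representation network that is trained through a joint loss. The following PyTorch sketch illustrates one way such weight sharing could be wired up; the layer sizes, the one-dimensional sigmoid-scaled price action, and the 24-step, 3-feature state series are illustrative assumptions, not the configuration used in the paper.

import torch
import torch.nn as nn

class SharedLSTMEncoder(nn.Module):
    # Encodes the time-series part of the environmental state into a feature vector.
    def __init__(self, input_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)

    def forward(self, x_seq):                  # x_seq: (batch, T, input_dim)
        _, (h_n, _) = self.lstm(x_seq)         # final hidden state summarizes the sequence
        return h_n[-1]                         # (batch, hidden_dim)

class Actor(nn.Module):
    # Maps the shared representation to a normalized retail price in (0, 1).
    def __init__(self, encoder, hidden_dim=64, action_dim=1):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(nn.Linear(hidden_dim, 64), nn.ReLU(),
                                  nn.Linear(64, action_dim), nn.Sigmoid())

    def forward(self, x_seq):
        return self.head(self.encoder(x_seq))

class Critic(nn.Module):
    # Scores a (state, price) pair with an estimated Q-value.
    def __init__(self, encoder, hidden_dim=64, action_dim=1):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(nn.Linear(hidden_dim + action_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, x_seq, action):
        return self.head(torch.cat([self.encoder(x_seq), action], dim=-1))

# Both networks hold the same encoder instance, so the Critic's TD loss and the
# Actor's policy loss both backpropagate into the shared LSTM representation.
encoder = SharedLSTMEncoder(input_dim=3)
actor, critic = Actor(encoder), Critic(encoder)

state = torch.randn(8, 24, 3)   # 8 states, each a 24-step series of 3 features (assumed shape)
price = actor(state)            # proposed retail prices, shape (8, 1)
q_value = critic(state, price)  # corresponding Q-value estimates, shape (8, 1)

In an actual training loop, the Critic's temporal-difference loss and the Actor's loss would be combined, for example as a weighted sum, before the backward pass so that both objectives shape the shared encoder; the weighting is a design choice the sketch does not fix.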


Publication history

Received: 07 June 2021
Revised: 05 November 2021
Accepted: 28 December 2021
Published: 14 February 2022
Issue date: September 2022

Copyright

© 2021 CSEE

Acknowledgements

This work was supported in part by the Natural Science Foundation of Jiangsu Province (BK20210002) and the National Key R&D Program of China (2018AAA0101504).
