Study on the application of reinforcement learning in the operation optimization of HVAC system

Xiaolei Yuan; Yiqun Pan; Jianrong Yang; Weitong Wang; Zhizhong Huang

doi:10.1007/s12273-020-0602-9

Building Simulation 2021, 14(1): 75-87 https://doi.org/10.1007/s12273-020-0602-9

Research Article | Published: 23 March 2020

Xiaolei Yuan¹, Yiqun Pan¹, Jianrong Yang², Weitong Wang³, Zhizhong Huang⁴

¹ School of Mechanical Engineering, Tongji University, 4800 Cao’an Road, Shanghai 201804, China

² Shanghai Research Institute of Building Sciences, Shanghai, China

³ Kuaishou Co. Ltd., Beijing, China

⁴ Sino-German College of Applied Sciences, Tongji University, Shanghai 201804, China

Keywords:

reinforcement learning, energy saving, HVAC system, control strategy, operation optimization, VAV system

Cite this article:

Yuan X, Pan Y, Yang J, et al. Study on the application of reinforcement learning in the operation optimization of HVAC system. Building Simulation, 2021, 14(1): 75-87. https://doi.org/10.1007/s12273-020-0602-9

Abstract

Supervisory control can be used to optimize HVAC system operation and achieve building energy conservation, and reinforcement learning (RL) is considered a promising model-free supervisory control method. In this paper, we apply an RL algorithm to the operation optimization of an air-conditioning (AC) system and propose an innovative RL-based, model-free control strategy that combines rule-based and RL-based control algorithms, together with a complete application process. We use a variable air volume (VAV) air-conditioning system for a single-storey office building as a case study to validate the optimization performance of the RL-based controller. Control strategies using a rule-based controller (RBC) and a proportional-integral-derivative (PID) controller serve as the reference cases. The results show that, for single-zone air supply, the RL controller performs best in terms of both non-comfortable time and AC system energy cost after one year of exploration learning. The total energy consumption of the AC system is reduced by 7.7% and 4.7% compared with the RBC and PID strategies, respectively. For multi-zone air supply, the RL controller begins to outperform the reference strategies after a two-year exploration learning stage and a two-year buffer stage. From the seventh year on, the RL controller performs much better in terms of both non-comfortable time and AC system operating cost, with the operating cost reduced by 2.7% to 4.6% compared with the reference strategies. In addition, the RL controller is more suitable for small-scale operation optimization problems.
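To make the idea of combining rule-based and RL-based control concrete, the following is a minimal illustrative sketch, not the authors' implementation. The abstract does not specify the RL algorithm, state space, action set, reward, or building model, so everything here is an assumption: tabular Q-learning, a toy first-order single-zone thermal model, three hypothetical airflow levels, an assumed comfort band of 22 to 26 °C, and a reward that penalises airflow (as a stand-in for energy cost) and comfort violations. A rule-based override takes over whenever the zone temperature leaves the comfort band; inside the band the agent explores with an epsilon-greedy policy.

```python
import random

# All constants below are hypothetical and chosen only for illustration.
ACTIONS = [0.0, 0.5, 1.0]   # assumed supply-airflow levels (fraction of maximum)
COMFORT = (22.0, 26.0)      # assumed comfort band in deg C
BINS = list(range(18, 33))  # 1-degree temperature bins used as discrete states

def to_state(temp):
    """Map a zone temperature to a discrete state index."""
    clipped = min(max(temp, BINS[0]), BINS[-1])
    return int(round(clipped)) - BINS[0]

def step(temp, airflow, outdoor=32.0):
    """Toy first-order zone model: heat gain from outdoors, cooling from airflow."""
    return temp + 0.1 * (outdoor - temp) - 1.5 * airflow

def reward(temp, airflow):
    """Penalise airflow (proxy for energy cost) and comfort violations."""
    low, high = COMFORT
    discomfort = max(low - temp, 0.0) + max(temp - high, 0.0)
    return -(airflow + 2.0 * discomfort)

def rule_based(temp):
    """Fallback rule: full airflow when too warm, off when too cold, else no override."""
    low, high = COMFORT
    if temp > high:
        return len(ACTIONS) - 1
    if temp < low:
        return 0
    return None

def choose(q, temp, epsilon, rng):
    """Rule-based override outside the comfort band, epsilon-greedy RL inside it."""
    override = rule_based(temp)
    if override is not None:
        return override
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    row = q[to_state(temp)]
    return row.index(max(row))

def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run tabular Q-learning on the toy zone model and return the Q-table."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in BINS]
    temp = 24.0
    for _ in range(steps):
        s = to_state(temp)
        a = choose(q, temp, epsilon, rng)
        nxt = step(temp, ACTIONS[a])
        r = reward(nxt, ACTIONS[a])
        # Standard one-step Q-learning update
        q[s][a] += alpha * (r + gamma * max(q[to_state(nxt)]) - q[s][a])
        temp = nxt
    return q

q_table = train()
```

Calling `choose(q_table, temp, 0.0, random.Random())` then returns the greedy action index for a given zone temperature, with the rule-based override still applied outside the assumed comfort band. The sketch only illustrates the control structure; it makes no attempt to reproduce the energy or comfort results reported in the abstract.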

About this article

Publication history

Received: 30 August 2019

Accepted: 17 December 2019

Published: 23 March 2020

Issue date: February 2021

Acknowledgements

This study is supported by the Thirteenth Five-Year National Key Research and Development Program "Study on the Technical Standard System for Post-evaluation of Green Building Performance", Ministry of Science and Technology of China (No. 2016YFC0700105).