State of the Art of Adaptive Dynamic Programming and Reinforcement Learning

Derong Liu1,2, Mingming Ha3, and Shan Xue4
1 Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen 518055, China
2 Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL 60607, USA
3 School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
4 School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China

Abstract

This article introduces the state of the art in adaptive dynamic programming and reinforcement learning (ADPRL). First, reinforcement learning (RL) algorithms are introduced and their roots in dynamic programming are illustrated. Adaptive dynamic programming (ADP) is then introduced, following a brief discussion of dynamic programming. Research in ADP and RL has advanced rapidly over the past decade, from algorithms to convergence and optimality analyses and on to stability results. Several key steps in these recent theoretical developments are highlighted, along with some future perspectives. In particular, convergence and optimality results for value iteration and policy iteration are reviewed, followed by an introduction to the most recent results on the stability analysis of value iteration algorithms.
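
To make the two iterative schemes mentioned above concrete, the following is a minimal sketch of their standard discrete-time formulation; the system map F, utility U, value function V, and policy \pi are generic symbols chosen here for illustration and are not notation taken from this article. For a system x_{k+1} = F(x_k, u_k) with cost \sum_{k=0}^{\infty} U(x_k, u_k), the two schemes proceed as follows.

Value iteration (typically initialized with a simple value function such as V_0(x) \equiv 0):
    V_{i+1}(x) = \min_{u} \{ U(x, u) + V_i(F(x, u)) \}, \qquad u_i(x) = \arg\min_{u} \{ U(x, u) + V_i(F(x, u)) \}.

Policy iteration (requires an initial admissible, i.e., stabilizing, policy \pi_0):
    Evaluation:   V^{\pi_j}(x) = U(x, \pi_j(x)) + V^{\pi_j}(F(x, \pi_j(x))),
    Improvement:  \pi_{j+1}(x) = \arg\min_{u} \{ U(x, u) + V^{\pi_j}(F(x, u)) \}.

A point relevant to the stability results discussed above is that, unlike policy iteration, the intermediate policies produced by value iteration are not guaranteed to be stabilizing before convergence, which is why dedicated stability analyses of value iteration algorithms have been developed.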

Keywords: reinforcement learning, intelligent control, optimal control, adaptive dynamic programming, approximate dynamic programming, adaptive critic designs, neuro-dynamic programming, neural dynamic programming, learning control


Publication history

Received: 26 April 2022
Revised: 19 August 2022
Accepted: 14 September 2022
Published: 10 March 2023
Issue date: December 2022

Copyright

© The author(s) 2022

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
