[18]
D. Liu, Q. Wei, D. Wang, X. Yang, and H. Li, Adaptive Dynamic Programming With Applications in Optimal Control. Cham, Switzerland: Springer, 2017.
[29]
J. Schaeffer, J. Culberson, N. Treloar, B. Knight, P. Lu, and D. Szafron, A world championship caliber checkers program, Artif. Intell., vol. 53, nos. 2–3, pp. 273–289, 1992.
[30]
M. Buro, From simple features to sophisticated evaluation functions, in Proc. 1st Int. Conf. Computers and Games, Tsukuba, Japan, 1998, pp. 126–145.
[38]
X. Cai and D. C. Wunsch, A parallel computer-Go player, using HDP method, in Proc. Int. Joint Conf. Neural Networks, Washington, DC, USA, 2001, pp. 2373–2375.
[39]
N. N. Schraudolph, P. Dayan, and T. J. Sejnowski, Temporal difference learning of position evaluation in the game of Go, in Proc. 6th Int. Conf. Neural Information Processing Systems, Denver, CO, USA, 1993, pp. 817–824.
[41]
R. Zaman, D. Prokhorov, and D. C. Wunsch, Adaptive critic design in learning to play game of Go, in Proc. Int. Conf. Neural Networks, Houston, TX, USA, 1997, pp. 1–4.
[42]
R. Zaman and D. C. Wunsch, TD methods applied to mixture of experts for learning 9×9 Go evaluation function, in Proc. Int. Joint Conf. Neural Networks, Washington, DC, USA, 1999, pp. 3734–3739.
[44]
M. Enzenberger, Evaluation in Go by a neural network using soft segmentation, in Proc. 10th Int. Conf. Advances in Computer Games, Graz, Austria, 2003, pp. 97–108.
[45]
C. Clark and A. Storkey, Training deep convolutional neural networks to play Go, in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 1766–1774.
[46]
C. J. Maddison, A. Huang, I. Sutskever, and D. Silver, Move evaluation in Go using deep convolutional neural networks, in Proc. 3rd Int. Conf. Learning Representations, San Diego, CA, USA, 2015.
[47]
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.
[48]
A. G. Barto, Reinforcement learning and adaptive critic methods, in Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White and D. A. Sofge, Eds. New York, NY, USA: Van Nostrand Reinhold, 1992, pp. 469–491.
[57]
C. J. C. H. Watkins, Learning from delayed rewards, PhD dissertation, Cambridge Univ., Cambridge, UK, 1989.
[62]
R. S. Sutton, Generalization in reinforcement learning: Successful examples using sparse coarse coding, in Proc. 8th Int. Conf. Neural Information Processing Systems, Denver, CO, USA, 1995, pp. 1038–1044.
[63]
R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton University Press, 1957.
[64]
D. P. Bertsekas, Dynamic Programming and Optimal Control, 3rd ed. Belmont, MA, USA: Athena Scientific, 2005.
[65]
S. E. Dreyfus and A. M. Law, The Art and Theory of Dynamic Programming. New York, NY, USA: Academic Press, 1977.
[66]
F. L. Lewis and V. L. Syrmos, Optimal Control. New York, NY, USA: Wiley, 1995.
[68]
S. G. Papachristos, Adaptive dynamic programming in inventory control, PhD dissertation, The University of Manchester, Manchester, UK, 1977.
[72]
J. J. Murray, C. J. Cox, and R. E. Saeks, The adaptive dynamic programming theorem, in Stability and Control of Dynamical Systems with Applications, D. Liu and P. J. Antsaklis, Eds. Boston, MA, USA: Birkhäuser, 2003, pp. 379–394.
[76]
P. J. Werbos, A menu of designs for reinforcement learning over time, in Neural Networks for Control, W. T. Miller, R. S. Sutton, and P. J. Werbos, Eds. Cambridge, MA, USA: MIT Press, 1990, pp. 67–95.
[78]
P. J. Werbos, Approximate dynamic programming for real-time control and neural modeling, in Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White and D. A. Sofge, Eds. New York, NY, USA: Van Nostrand Reinhold, 1992, pp. 493–525.
[85]
J. Dalton and S. N. Balakrishnan, A neighboring optimal adaptive critic for missile guidance, Math. Comput. Modell., vol. 23, nos. 1–2, pp. 175–188, 1996.
[89]
J. Si, L. Yang, and D. Liu, Direct neural dynamic programming, in Handbook of Learning and Approximate Dynamic Programming, J. Si, A. G. Barto, W. B. Powell, and D. Wunsch, Eds. New York, NY, USA: Wiley, 2004, pp. 125–151.
[92]
D. P. Bertsekas and J. N. Tsitsiklis, Neuro-dynamic programming: An overview, in Proc. 34th IEEE Conf. Decision and Control, New Orleans, LA, USA, 1995, pp. 560–564.
[93]
D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA, USA: Athena Scientific, 1996.
[101]
F. Y. Wang and G. N. Saridis, Suboptimal control for nonlinear stochastic systems, in Proc. 31st IEEE Conf. Decision and Control, Tucson, AZ, USA, 1992, pp. 1856–1861.
[103]
P. Werbos, ADP: Goals, opportunities and principles, in Handbook of Learning and Approximate Dynamic Programming, J. Si, A. G. Barto, W. B. Powell, and D. Wunsch, Eds. New York, NY, USA: Wiley, 2004, pp. 3–44.
[104]
W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. New York, NY, USA: Wiley, 2007.
[105]
P. J. Werbos, Using ADP to understand and replicate brain intelligence: The next level design, in Proc. IEEE Int. Symp. Approximate Dynamic Programming and Reinforcement Learning, Honolulu, HI, USA, 2007, pp. 209–216.
[107]
X. Bai, D. Zhao, and J. Yi, Coordinated multiple ramps metering based on neuro-fuzzy adaptive dynamic programming, in Proc. Int. Joint Conf. Neural Networks, Atlanta, GA, USA, 2009, pp. 241–248.
[108]
Y. Zhu, D. Zhao, and H. He, Integration of fuzzy controller with adaptive dynamic programming, in Proc. 10th World Congress on Intelligent Control and Automation, Beijing, China, 2012, pp. 310–315.
[110]
R. E. Saeks, C. J. Cox, K. Mathia, and A. J. Maren, Asymptotic dynamic programming: Preliminary concepts and results, in Proc. IEEE Int. Conf. Neural Networks, Houston, TX, USA, 1997, pp. 2273–2278.
[111]
S. Haykin, Neural Networks and Learning Machines, 3rd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2009.
[112]
J. M. Zurada, Introduction to Artificial Neural Systems. St. Paul, MN, USA: West, 1992.
[113]
D. Liu, X. Xiong, and Y. Zhang, Action-dependent adaptive critic designs, in Proc. Int. Joint Conf. Neural Networks, Washington, DC, USA, 2001, pp. 990–995.
[114]
G. G. Lendaris and C. Paintz, Training strategies for critic and action neural networks in dual heuristic programming method, in Proc. IEEE Int. Conf. Neural Networks, Houston, TX, USA, 1997, pp. 712–717.
[189]
R. A. Santiago and P. Werbos, New progress towards truly brain-like intelligent control, in Proc. World Congress on Neural Networks, San Diego, CA, USA, 1994, pp. 27–33.