[1]
R. R. Hill and J. O. Miller, A history of United States military simulation, in Proc. 2017 Winter Simulation Conf. (WSC), Las Vegas, NV, USA, 2017, pp. 346–364.
[2]
J. Appleget, An introduction to wargaming and modeling and simulation, in Simulation and Wargaming, C. Turnitsa, C. Blais, and A. Tolk, Eds. Hoboken, NJ, USA: John Wiley & Sons, 2021, pp. 1–22.
[3]
S. Wang and Y. Liu, Modeling and simulation of CGF aerial targets for simulation training, in Proc. Int. Conf. Computer Intelligent Systems and Network Remote Control (CISNRC 2020). doi: 10.12783/dtcse/cisnr2020/35167.
[15]
B. Yuksek, U. M. Demirezen, and G. Inalhan, Development of UCAV fleet autonomy by reinforcement learning in a wargame simulation environment, in Proc. AIAA Scitech 2021 Forum. doi: 10.2514/6.2021-0175.
[17]
S. Fujimoto, D. Meger, and D. Precup, Off-policy deep reinforcement learning without exploration, in Proc. 36th Int. Conf. Machine Learning, Long Beach, CA, USA, 2019, pp. 2052–2062.
[18]
W. Zhou, S. Bajracharya, and D. Held, PLAS: Latent action space for offline reinforcement learning, in Proc. 2020 4th Conf. Robot Learning, Cambridge, MA, USA, 2021, pp. 1719–1735.
[19]
S. Fujimoto and S. Gu, A minimalist approach to offline reinforcement learning, in Proc. 35th Int. Conf. Neural Information Processing Systems, Virtual Event, 2021, pp. 20132–20145.
[20]
A. Kumar, A. Zhou, G. Tucker, and S. Levine, Conservative Q-learning for offline reinforcement learning, in Proc. 34th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, p. 100.
[21]
C. Zhao, K. Huang, and C. Yuan, DCE: Offline reinforcement learning with double conservative estimates, in Proc. 11th Int. Conf. Learning Representations, Kigali, Rwanda, 2023.
[22]
Y. Wu, G. Tucker, and O. Nachum, Behavior regularized offline reinforcement learning, arXiv preprint arXiv: 1911.11361, 2019.
[23]
T. Yu, G. Thomas, L. Yu, S. Ermon, J. Zou, S. Levine, C. Finn, and T. Ma, MOPO: Model-based offline policy optimization, in Proc. 34th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, p. 1185.
[24]
R. Kidambi, A. Rajeswaran, P. Netrapalli, and T. Joachims, MOReL: Model-based offline reinforcement learning, in Proc. 34th Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2020, p. 1830.
[25]
B. Li, H. Tang, Y. Zheng, J. Hao, P. Li, Z. Wang, Z. Meng, and L. Wang, HyAR: Addressing discrete-continuous action reinforcement learning via hybrid action representation, in Proc. 10th Int. Conf. Learning Representations, Virtual Event, 2022.
[26]
J. Xiong, Q. Wang, Z. Yang, P. Sun, L. Han, Y. Zheng, H. Fu, T. Zhang, J. Liu, and H. Liu, Parametrized deep Q-networks learning: Reinforcement learning with discrete-continuous hybrid action space, arXiv preprint arXiv: 1810.06394, 2018.
[27]
W. Masson, P. Ranchod, and G. Konidaris, Reinforcement learning with parameterized actions, in Proc. 30th AAAI Conf. Artificial Intelligence, Phoenix, AZ, USA, 2016, pp. 1934–1940.
[28]
C. J. Bester, S. D. James, and G. D. Konidaris, Multi-pass Q-networks for deep reinforcement learning with parameterised action spaces, arXiv preprint arXiv: 1905.04388, 2019.
[29]
M. Hausknecht and P. Stone, Deep reinforcement learning in parameterized action space, in Proc. 4th Int. Conf. Learning Representations, San Juan, PR, USA, 2016.
[30]
Z. Fan, R. Su, W. Zhang, and Y. Yu, Hybrid actor-critic reinforcement learning in parameterized action space, in Proc. 28th Int. Joint Conf. Artificial Intelligence, Macao, China, 2019, pp. 2279–2285.
[32]
S. Levine, A. Kumar, G. Tucker, and J. Fu, Offline reinforcement learning: Tutorial, review, and perspectives on open problems, arXiv preprint arXiv: 2005.01643, 2020.
[33]
R. Agarwal, D. Schuurmans, and M. Norouzi, An optimistic perspective on offline reinforcement learning, in Proc. 37th Int. Conf. Machine Learning, Virtual Event, 2020, pp. 104–114.
[34]
Y. Guo, S. Feng, N. Le Roux, E. Chi, H. Lee, and M. Chen, Batch reinforcement learning through continuation method, in Proc. 9th Int. Conf. Learning Representations, Virtual Event, 2021, https://openreview.net/forum?id=po-DLlBuAuz.
[36]
D. P. Kingma and M. Welling, Auto-encoding variational Bayes, arXiv preprint arXiv: 1312.6114, 2022.
[37]
W. Whitney, R. Agarwal, K. Cho, and A. Gupta, Dynamics-aware embeddings, in Proc. 8th Int. Conf. Learning Representations, Addis Ababa, Ethiopia, 2020.
[39]
M. Masek, C. P. Lam, L. Benke, L. Kelly, and M. Papasimeon, Discovering emergent agent behaviour with evolutionary finite state machines, in Proc. 21st Int. Conf. PRIMA 2018: Principles and Practice of Multi-Agent Systems, Tokyo, Japan, 2018, pp. 19–34.
[40]
R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: The MIT Press, 2018.
[41]
T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, Continuous control with deep reinforcement learning, arXiv preprint arXiv: 1509.02971, 2019.
[42]
D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, Deterministic policy gradient algorithms, in Proc. 31st Int. Conf. Machine Learning, Beijing, China, 2014, pp. 387–395.
[43]
S. Fujimoto, H. van Hoof, and D. Meger, Addressing function approximation error in actor-critic methods, in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 1587–1596.
[44]
C. Guo and F. Berkhahn, Entity embeddings of categorical variables, arXiv preprint arXiv: 1604.06737, 2016.
[45]
A. Grosnit, R. Tutunov, A. M. Maraval, R. R. Griffiths, A. I. Cowen-Rivers, L. Yang, L. Zhu, W. Lyu, Z. Chen, J. Wang, et al., High-dimensional Bayesian optimisation with variational autoencoders and deep metric learning, arXiv preprint arXiv: 2106.03609, 2021.
[46]
M. Schwarzer, N. Rajkumar, M. Noukhovitch, A. Anand, L. Charlin, R. D. Hjelm, P. Bachman, and A. C. Courville, Pretraining representations for data-efficient reinforcement learning, in Proc. 35th Int. Conf. Neural Information Processing Systems, Virtual Event, 2021, pp. 12686–12699.
[47]
D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv: 1412.6980, 2017.