Reinforcement Learning (RL) has emerged as a promising data-driven solution for wargaming decision-making. However, two domain challenges remain: (1) handling discrete-continuous hybrid wargaming control and (2) accelerating RL deployment with rich offline data. Existing RL methods fail to address these two issues simultaneously, so we propose a novel offline RL method targeting hybrid action spaces. A new constrained action representation technique is developed to build a bidirectional mapping between the original hybrid action space and a latent space in a semantically consistent way. This allows a continuous latent policy to be learned with offline RL, with better exploration feasibility and scalability, and then reconstructed into the required hybrid policy. Critically, a novel offline RL optimization objective with adaptively adjusted constraints is designed to balance the alleviation and generalization of out-of-distribution actions. Our method demonstrates superior performance and generality across different tasks, particularly in typical realistic wargaming scenarios.
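To make the action-representation idea concrete, the sketch below shows one plausible way to map a hybrid action (a discrete choice plus continuous parameters) into a bounded continuous latent vector and back, so that a standard continuous offline RL algorithm can act in the latent space. This is a minimal illustration under our own assumptions; the class names, network sizes, and the reconstruction loss are hypothetical and are not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridActionCodec(nn.Module):
    """Bidirectional mapping between a hybrid action and a continuous latent (illustrative sketch)."""

    def __init__(self, state_dim, num_discrete, param_dim, latent_dim=8, hidden=128):
        super().__init__()
        # Encoder: (state, one-hot discrete action, continuous parameters) -> latent z
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + num_discrete + param_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )
        # Decoder: (state, latent z) -> discrete-action logits and continuous parameters
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, hidden), nn.ReLU(),
        )
        self.discrete_head = nn.Linear(hidden, num_discrete)
        self.param_head = nn.Linear(hidden, param_dim)
        self.num_discrete = num_discrete

    def encode(self, state, discrete, params):
        one_hot = F.one_hot(discrete, self.num_discrete).float()
        z = self.encoder(torch.cat([state, one_hot, params], dim=-1))
        # Bound the latent so the downstream continuous latent policy acts in a compact space.
        return torch.tanh(z)

    def decode(self, state, z):
        h = self.decoder(torch.cat([state, z], dim=-1))
        return self.discrete_head(h), torch.tanh(self.param_head(h))


def reconstruction_loss(codec, state, discrete, params):
    # Encode the logged hybrid action, decode it back, and penalize mismatch,
    # encouraging nearby latents to decode to semantically similar hybrid actions.
    z = codec.encode(state, discrete, params)
    logits, params_hat = codec.decode(state, z)
    return F.cross_entropy(logits, discrete) + F.mse_loss(params_hat, params)


if __name__ == "__main__":
    # Toy usage on random offline transitions (dimensions are arbitrary).
    codec = HybridActionCodec(state_dim=16, num_discrete=5, param_dim=3)
    s = torch.randn(32, 16)
    k = torch.randint(0, 5, (32,))
    x = torch.randn(32, 3)
    print(reconstruction_loss(codec, s, k, x).item())
```

In such a setup, a latent policy trained with an offline RL objective would output z, and the decoder would reconstruct the hybrid action executed in the environment; the paper's adaptively adjusted constraint on out-of-distribution actions would act on this latent policy, which is not reproduced here.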
© The author(s) 2024.
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).