A. Mandlekar, F. Ramos, B. Boots, S. Savarese, F. F. Li, A. Garg, and D. Fox, IRIS: Implicit reinforcement without interaction at scale for learning control from offline robot manipulation data, in Proc. 2020 IEEE Int. Conf. Robotics and Automation (ICRA), Paris, France, 2020, pp. 4414–4420.
X. Hao, Z. Peng, Y. Ma, G. Wang, J. Jin, J. Hao, S. Chen, R. Bai, M. Xie, M. Xu, et al., Dynamic knapsack optimization towards efficient multi-channel sequential advertising, in Proc. 37th Int. Conf. Machine Learning (ICML), virtual, 2020, pp. 4060–4070.
M. Zhou, J. Luo, J. Villella, Y. Yang, D. Rusu, J. Miao, W. Zhang, M. Alban, I. Fadakar, Z. Chen, et al., SMARTS: An open-source scalable multi-agent RL training school for autonomous driving, in Proc. 4th Conf. Robot Learning (CoRL), Cambridge, MA, USA, 2020, pp. 264–285.
L. Wang, W. Zhang, X. He, and H. Zha, Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation, in Proc. 24th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, London, UK, 2018, pp. 2447–2456.
S. Levine, A. Kumar, G. Tucker, and J. Fu, Offline reinforcement learning: Tutorial, review, and perspectives on open problems, arXiv preprint arXiv: 2005.01643, 2020.
S. Fujimoto, D. Meger, and D. Precup, Off-policy deep reinforcement learning without exploration, in Proc. 36th Int. Conf. Machine Learning (ICML), Long Beach, CA, USA, 2019, pp. 2052–2062.
A. Kumar, J. Fu, M. Soh, G. Tucker, and S. Levine, Stabilizing off-policy Q-learning via bootstrapping error reduction, in Proc. 33rd Conf. Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 2019, pp. 11761–11771.
Y. Wu, G. Tucker, and O. Nachum, Behavior regularized offline reinforcement learning, arXiv preprint arXiv: 1911.11361, 2019.
N. Y. Siegel, J. T. Springenberg, F. Berkenkamp, A. Abdolmaleki, M. Neunert, T. Lampe, R. Hafner, and M. Riedmiller, Keep doing what worked: Behavioral modelling priors for offline reinforcement learning, arXiv preprint arXiv: 2002.08396, 2020.
A. Kumar, A. Zhou, G. Tucker, and S. Levine, Conservative Q-learning for offline reinforcement learning, in Proc. 34th Conf. Neural Information Processing Systems (NeurIPS 2020), virtual, 2020.
T. Yu, A. Kumar, R. Rafailov, A. Rajeswaran, S. Levine, and C. Finn, COMBO: Conservative offline model-based policy optimization, arXiv preprint arXiv: 2102.08363, 2021.
J. Li, C. Tang, M. Tomizuka, and W. Zhan, Dealing with the unknown: Pessimistic offline reinforcement learning, in Proc. 5th Conf. Robot Learning (CoRL), London, UK, 2021, pp. 1455–1464.
J. Fu, A. Kumar, O. Nachum, G. Tucker, and S. Levine, D4RL: Datasets for deep data-driven reinforcement learning, arXiv preprint arXiv: 2004.07219, 2020.
Y. Wu, S. Zhai, N. Srivastava, J. M. Susskind, J. Zhang, R. Salakhutdinov, and H. Goh, Uncertainty weighted actor-critic for offline reinforcement learning, in Proc. 38th Int. Conf. Machine Learning (ICML), virtual, 2021, pp. 11319–11328.
G. An, S. Moon, J. H. Kim, and H. O. Song, Uncertainty-based offline reinforcement learning with diversified Q-ensemble, in Proc. 35th Conf. Neural Information Processing Systems (NeurIPS 2021), virtual, 2021.
S. Vernekar, A. Gaurav, V. Abdelzad, T. Denouden, R. Salay, and K. Czarnecki, Out-of-distribution detection in classifiers via generation, arXiv preprint arXiv: 1910.04241, 2019.
S. Fujimoto and S. S. Gu, A minimalist approach to offline reinforcement learning, in Proc. 35th Conf. Neural Information Processing Systems (NeurIPS 2021), virtual, 2021, pp. 20132–20145.
S. Verma, J. Fu, S. Yang, and S. Levine, CHAI: A CHatbot AI for task-oriented dialogue with offline reinforcement learning, in Proc. 2022 Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 2022, pp. 4471–4491.
N. Jaques, A. Ghandeharioun, J. Shen, C. Ferguson, À. Lapedriza, N. J. Jones, S. Gu, and R. W. Picard, Way off-policy batch deep reinforcement learning of implicit human preferences in dialog, arXiv preprint arXiv: 1907.00456, 2019.
W. Zhou, S. Bajracharya, and D. Held, PLAS: Latent action space for offline reinforcement learning, arXiv preprint arXiv: 2011.07213, 2020.
X. B. Peng, A. Kumar, G. Zhang, and S. Levine, Advantage-weighted regression: Simple and scalable off-policy reinforcement learning, arXiv preprint arXiv: 1910.00177, 2019.
R. S. Sutton, A. R. Mahmood, and M. White, An emphatic approach to the problem of off-policy temporal-difference learning, arXiv preprint arXiv: 1503.04269, 2015.
O. Nachum, B. Dai, I. Kostrikov, Y. Chow, L. Li, and D. Schuurmans, AlgaeDICE: Policy gradient from arbitrary experience, arXiv preprint arXiv: 1912.02074, 2019.
R. Agarwal, D. Schuurmans, and M. Norouzi, An optimistic perspective on offline reinforcement learning, in Proc. 37th Int. Conf. Machine Learning (ICML), virtual, 2020, pp. 104–114.
R. Kidambi, A. Rajeswaran, P. Netrapalli, and T. Joachims, MOReL: Model-based offline reinforcement learning, in Proc. 34th Conf. Neural Information Processing Systems (NeurIPS 2020), virtual, 2020.
T. Yu, G. Thomas, L. Yu, S. Ermon, J. Y. Zou, S. Levine, C. Finn, and T. Ma, MOPO: Model-based offline policy optimization, in Proc. 34th Conf. Neural Information Processing Systems (NeurIPS 2020), virtual, 2020.
T. Matsushima, H. Furuta, Y. Matsuo, O. Nachum, and S. Gu, Deployment-efficient reinforcement learning via model-based offline optimization, in Proc. 9th Int. Conf. Learning Representations (ICLR), virtual, 2021.
A. Argenson and G. Dulac-Arnold, Model based offline planning, in Proc. 9th Int. Conf. Learning Representations (ICLR), virtual, 2021.
J. Buckman, C. Gelada, and M. G. Bellemare, The importance of pessimism in fixed-dataset policy optimization, in Proc. 9th Int. Conf. Learning Representations (ICLR), virtual, 2021.
T. Seno and M. Imai, d3rlpy: An offline deep reinforcement learning library, in Proc. Offline Reinforcement Learning Workshop at Neural Information Processing Systems, virtual, 2021.
Z. Wang, A. Novikov, K. Zolna, J. Merel, J. T. Springenberg, S. E. Reed, B. Shahriari, N. Y. Siegel, Ç. Gülçehre, N. Heess, et al., Critic regularized regression, in Proc. 34th Conf. Neural Information Processing Systems (NeurIPS 2020), virtual, 2020.
I. Kostrikov, A. Nair, and S. Levine, Offline reinforcement learning with implicit Q-learning, in Proc. Offline Reinforcement Learning Workshop at Neural Information Processing Systems, virtual, 2021.
L. Chen, K. Lu, A. Rajeswaran, K. Lee, A. Grover, M. Laskin, P. Abbeel, A. Srinivas, and I. Mordatch, Decision transformer: Reinforcement learning via sequence modeling, in Proc. 35th Conf. Neural Information Processing Systems (NeurIPS 2021), virtual, 2021, pp. 15084–15097.
M. Janner, Q. Li, and S. Levine, Offline reinforcement learning as one big sequence modeling problem, in Proc. 35th Conf. Neural Information Processing Systems (NeurIPS 2021), virtual, 2021, pp. 1273–1286.
C. Bai, L. Wang, Z. Yang, Z. Deng, A. Garg, P. Liu, and Z. Wang, Pessimistic bootstrapping for uncertainty-driven offline reinforcement learning, in Proc. 10th Int. Conf. Learning Representations (ICLR), virtual, 2022.