[1]
A. Mandlekar, F. Ramos, B. Boots, S. Savarese, F. F. Li, A. Garg, and D. Fox, IRIS: Implicit reinforcement without interaction at scale for learning control from offline robot manipulation data, in Proc. 2020 IEEE Int. Conf. Robotics and Automation (ICRA), Paris, France, 2020, pp. 4414–4420.
[2]
X. Hao, Z. Peng, Y. Ma, G. Wang, J. Jin, J. Hao, S. Chen, R. Bai, M. Xie, M. Xu, et al., Dynamic knapsack optimization towards efficient multi-channel sequential advertising, in Proc. 37th Int. Conf. Machine Learning (ICML), virtual, 2020, pp. 4060–4070.
[4]
M. Zhou, J. Luo, J. Villella, Y. Yang, D. Rusu, J. Miao, W. Zhang, M. Alban, I. Fadakar, Z. Chen, et al., SMARTS: An open-source scalable multi-agent RL training school for autonomous driving, in Proc. 4th Conf. Robot Learning (CoRL), Cambridge, MA, USA, 2020, pp. 264–285.
[5]
L. Wang, W. Zhang, X. He, and H. Zha, Supervised reinforcement learning with recurrent neural network for dynamic treatment recommendation, in Proc. 24th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, London, UK, 2018, pp. 2447–2456.
[6]
S. Levine, A. Kumar, G. Tucker, and J. Fu, Offline reinforcement learning: Tutorial, review, and perspectives on open problems, arXiv preprint arXiv: 2005.01643, 2020.
[7]
S. Fujimoto, D. Meger, and D. Precup, Off-policy deep reinforcement learning without exploration, in Proc. 36th Int. Conf. Machine Learning (ICML), Long Beach, CA, USA, 2019, pp. 2052–2062.
[8]
A. Kumar, J. Fu, M. Soh, G. Tucker, and S. Levine, Stabilizing off-policy Q-learning via bootstrapping error reduction, in Proc. 33rd Conf. Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 2019, pp. 11761–11771.
[9]
Y. Wu, G. Tucker, and O. Nachum, Behavior regularized offline reinforcement learning, arXiv preprint arXiv: 1911.11361, 2019.
[10]
N. Y. Siegel, J. T. Springenberg, F. Berkenkamp, A. Abdolmaleki, M. Neunert, T. Lampe, R. Hafner, and M. Riedmiller, Keep doing what worked: Behavioral modelling priors for offline reinforcement learning, arXiv preprint arXiv: 2002.08396, 2020.
[11]
A. Kumar, A. Zhou, G. Tucker, and S. Levine, Conservative Q-learning for offline reinforcement learning, in Proc. 34th Conf. Neural Information Processing Systems (NeurIPS 2020), virtual, 2020.
[12]
T. Yu, A. Kumar, R. Rafailov, A. Rajeswaran, S. Levine, and C. Finn, COMBO: Conservative offline model-based policy optimization, arXiv preprint arXiv: 2102.08363, 2021.
[13]
J. Li, C. Tang, M. Tomizuka, and W. Zhan, Dealing with the unknown: Pessimistic offline reinforcement learning, in Proc. 5th Conf. Robot Learning (CoRL), London, UK, 2021, pp. 1455–1464.
[14]
J. Fu, A. Kumar, O. Nachum, G. Tucker, and S. Levine, D4RL: Datasets for deep data-driven reinforcement learning, arXiv preprint arXiv: 2004.07219, 2020.
[15]
Y. Wu, S. Zhai, N. Srivastava, J. M. Susskind, J. Zhang, R. Salakhutdinov, and H. Goh, Uncertainty weighted actor-critic for offline reinforcement learning, in Proc. 38th Int. Conf. Machine Learning (ICML), virtual, 2021, pp. 11319–11328.
[16]
G. An, S. Moon, J. H. Kim, and H. O. Song, Uncertainty-based offline reinforcement learning with diversified Q-ensemble, in Proc. 35th Conf. Neural Information Processing Systems (NeurIPS 2021), virtual, 2021.
[17]
S. Vernekar, A. Gaurav, V. Abdelzad, T. Denouden, R. Salay, and K. Czarnecki, Out-of-distribution detection in classifiers via generation, arXiv preprint arXiv: 1910.04241, 2019.
[18]
S. Fujimoto and S. S. Gu, A minimalist approach to offline reinforcement learning, in Proc. 35th Conf. Neural Information Processing Systems (NeurIPS 2021), virtual, 2021, pp. 20132–20145.
[19]
S. Verma, J. Fu, S. Yang, and S. Levine, CHAI: A CHatbot AI for task-oriented dialogue with offline reinforcement learning, in Proc. 2022 Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 2022, pp. 4471–4491.
[20]
N. Jaques, A. Ghandeharioun, J. Shen, C. Ferguson, À. Lapedriza, N. J. Jones, S. Gu, and R. W. Picard, Way off-policy batch deep reinforcement learning of implicit human preferences in dialog, arXiv preprint arXiv: 1907.00456, 2019.
[21]
W. Zhou, S. Bajracharya, and D. Held, PLAS: Latent action space for offline reinforcement learning, arXiv preprint arXiv: 2011.07213, 2020.
[22]
X. B. Peng, A. Kumar, G. Zhang, and S. Levine, Advantage-weighted regression: Simple and scalable off-policy reinforcement learning, arXiv preprint arXiv: 1910.00177, 2019.
[23]
R. S. Sutton, A. R. Mahmood, and M. White, An emphatic approach to the problem of off-policy temporal-difference learning, arXiv preprint arXiv: 1503.04269, 2015.
[24]
O. Nachum, B. Dai, I. Kostrikov, Y. Chow, L. Li, and D. Schuurmans, AlgaeDICE: Policy gradient from arbitrary experience, arXiv preprint arXiv: 1912.02074, 2019.
[25]
R. Agarwal, D. Schuurmans, and M. Norouzi, An optimistic perspective on offline reinforcement learning, in Proc. 37th Int. Conf. Machine Learning (ICML), virtual, 2020, pp. 104–114.
[26]
R. Kidambi, A. Rajeswaran, P. Netrapalli, and T. Joachims, MOReL: Model-based offline reinforcement learning, in Proc. 34th Conf. Neural Information Processing Systems (NeurIPS 2020), virtual, 2020.
[27]
T. Yu, G. Thomas, L. Yu, S. Ermon, J. Y. Zou, S. Levine, C. Finn, and T. Ma, MOPO: Model-based offline policy optimization, in Proc. 34th Conf. Neural Information Processing Systems (NeurIPS 2020), virtual, 2020.
[28]
T. Matsushima, H. Furuta, Y. Matsuo, O. Nachum, and S. Gu, Deployment-efficient reinforcement learning via model-based offline optimization, in Proc. 9th Int. Conf. Learning Representations (ICLR), virtual, 2021.
[29]
A. Argenson and G. Dulac-Arnold, Model based offline planning, in Proc. 9th Int. Conf. Learning Representations (ICLR), virtual, 2021.
[30]
J. Buckman, C. Gelada, and M. G. Bellemare, The importance of pessimism in fixed-dataset policy optimization, in Proc. 9th Int. Conf. Learning Representations (ICLR), virtual, 2021.
[31]
T. Seno and M. Imai, d3rlpy: An offline deep reinforcement learning library, in Proc. Offline Reinforcement Learning Workshop at Neural Information Processing Systems, virtual, 2021.
[32]
Z. Wang, A. Novikov, K. Zolna, J. Merel, J. T. Springenberg, S. E. Reed, B. Shahriari, N. Y. Siegel, Ç. Gülçehre, N. Heess, et al., Critic regularized regression, in Proc. 34th Conf. Neural Information Processing Systems (NeurIPS 2020), virtual, 2020.
[33]
I. Kostrikov, A. Nair, and S. Levine, Offline reinforcement learning with implicit Q-learning, in Proc. Offline Reinforcement Learning Workshop at Neural Information Processing Systems, virtual, 2021.
[34]
L. Chen, K. Lu, A. Rajeswaran, K. Lee, A. Grover, M. Laskin, P. Abbeel, A. Srinivas, and I. Mordatch, Decision transformer: Reinforcement learning via sequence modeling, in Proc. 35th Conf. Neural Information Processing Systems (NeurIPS 2021), virtual, 2021, pp. 15084–15097.
[35]
M. Janner, Q. Li, and S. Levine, Offline reinforcement learning as one big sequence modeling problem, in Proc. 35th Conf. Neural Information Processing Systems (NeurIPS 2021), virtual, 2021, pp. 1273–1286.
[36]
C. Bai, L. Wang, Z. Yang, Z. Deng, A. Garg, P. Liu, and Z. Wang, Pessimistic bootstrapping for uncertainty-driven offline reinforcement learning, in Proc. 10th Int. Conf. Learning Representations (ICLR), virtual, 2022.