
On Realization of Intelligent Decision Making in the Real World: A Foundation Decision Model Perspective

Ying Wen¹, Ziyu Wan¹, Ming Zhou¹, Shufang Hou², Zhe Cao¹, Chenyang Le¹, Jingxiao Chen¹, Zheng Tian³, Weinan Zhang¹ (✉), and Jun Wang²,⁴
1 SEIEE, Shanghai Jiao Tong University, Shanghai 200240, China
2 Digital Brain Laboratory, Shanghai 201306, China
3 School of Creativity and Art, ShanghaiTech University, Shanghai 201210, China
4 Department of Computer Science, University College London, London WC1E 6BT, UK

Abstract

The pervasive uncertainty and dynamic nature of real-world environments present significant challenges for the widespread implementation of machine-driven Intelligent Decision-Making (IDM) systems. Consequently, IDM should possess the ability to continuously acquire new skills and effectively generalize across a broad range of applications. The advancement of Artificial General Intelligence (AGI) that transcends task and application boundaries is critical for enhancing IDM. Recent studies have extensively investigated the Transformer neural architecture as a foundation model for various tasks, including computer vision, natural language processing, and reinforcement learning. We propose that a Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence decoding tasks using the Transformer architecture, offering a promising solution for expanding IDM applications in complex real-world situations. In this paper, we discuss the efficiency and generalization improvements offered by a foundation decision model for IDM and explore its potential applications in multi-agent game AI, production scheduling, and robotics tasks. Lastly, we present a case study of our FDM implementation, DigitalBrain (DB1), a 1.3-billion-parameter model that achieves human-level performance on 870 tasks, including text generation, image captioning, video game playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 represents an initial step toward more autonomous and efficient real-world IDM applications.

Keywords: artificial intelligence, Transformer, intelligent decision making, foundation decision model
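
To make the abstract's central idea concrete, the sketch below illustrates, in the spirit of Decision-Transformer-style trajectory modeling, how a decision-making task can be cast as sequence decoding: a trajectory is flattened into a single token stream and a causal Transformer predicts the next token, from which action choices are read off. This is a minimal illustrative sketch, not the authors' DB1 implementation; the class name FDMSketch, the discrete token vocabulary, and the (return-to-go, observation, action) triple layout are assumptions made for the example.

```python
import torch
import torch.nn as nn


class FDMSketch(nn.Module):
    """Toy causal-Transformer sequence decoder over trajectory tokens (illustrative only)."""

    def __init__(self, vocab_size: int, d_model: int = 128, n_heads: int = 4,
                 n_layers: int = 2, max_len: int = 256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) ids for an interleaved
        # (return-to-go, observation, action) trajectory stream.
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device).unsqueeze(0)
        x = self.token_emb(tokens) + self.pos_emb(pos)
        # Causal mask: each position attends only to earlier tokens, so the
        # encoder stack behaves as an autoregressive decoder.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(tokens.device)
        x = self.backbone(x, mask=mask)
        return self.head(x)  # next-token logits; action predictions are read off here


if __name__ == "__main__":
    model = FDMSketch(vocab_size=512)
    # Two dummy trajectories, each 10 (return-to-go, observation, action) triples long.
    trajectories = torch.randint(0, 512, (2, 30))
    logits = model(trajectories)
    print(logits.shape)  # torch.Size([2, 30, 512])
```

In practice, a shared tokenizer maps text, images, proprioception, and actions from many tasks into one vocabulary, so a single model of this form can be trained across tasks and then decoded autoregressively to act in each of them.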


Publication history

Received: 02 July 2023
Revised: 21 September 2023
Accepted: 03 November 2023
Published: 10 January 2024
Issue date: December 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was completed when Z. Wan, M. Zhou, Z. Cao, C. Le, and J. Chen were interns at Digital Brain Laboratory.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
