
On Realization of Intelligent Decision Making in the Real World: A Foundation Decision Model Perspective

Ying Wen¹, Ziyu Wan¹, Ming Zhou¹, Shufang Hou², Zhe Cao¹, Chenyang Le¹, Jingxiao Chen¹, Zheng Tian³, Weinan Zhang¹ (✉), and Jun Wang²,⁴
1 SEIEE, Shanghai Jiao Tong University, Shanghai 200240, China
2 Digital Brain Laboratory, Shanghai 201306, China
3 School of Creativity and Art, ShanghaiTech University, Shanghai 201210, China
4 Department of Computer Science, University College London, London WC1E 6BT, UK

Abstract

The pervasive uncertainty and dynamic nature of real-world environments present significant challenges for the widespread implementation of machine-driven Intelligent Decision-Making (IDM) systems. Consequently, IDM should possess the ability to continuously acquire new skills and effectively generalize across a broad range of applications. The advancement of Artificial General Intelligence (AGI) that transcends task and application boundaries is critical for enhancing IDM. Recent studies have extensively investigated the Transformer neural architecture as a foundation model for various tasks, including computer vision, natural language processing, and reinforcement learning. We propose that a Foundation Decision Model (FDM) can be developed by formulating diverse decision-making tasks as sequence decoding tasks using the Transformer architecture, offering a promising solution for expanding IDM applications in complex real-world situations. In this paper, we discuss the efficiency and generalization improvements offered by a foundation decision model for IDM and explore its potential applications in multi-agent game AI, production scheduling, and robotics tasks. Lastly, we present a case study of our FDM implementation, DigitalBrain (DB1), a 1.3-billion-parameter model that achieves human-level performance on 870 tasks, including text generation, image captioning, video game playing, robotic control, and traveling salesman problems. As a foundation decision model, DB1 represents an initial step toward more autonomous and efficient real-world IDM applications.

Keywords: artificial intelligence, Transformer, intelligent decision making, foundation decision model
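
To make the abstract's central idea concrete, the sketch below illustrates, in the spirit of Decision-Transformer-style trajectory modeling, how a decision-making task can be cast as sequence decoding: a trajectory is flattened into a single token stream and a causal Transformer predicts the next token, from which action choices are read off. This is a minimal illustrative sketch, not the authors' DB1 implementation; the class name FDMSketch, the discrete token vocabulary, and the (return-to-go, observation, action) triple layout are assumptions made for the example.

```python
import torch
import torch.nn as nn


class FDMSketch(nn.Module):
    """Toy causal-Transformer sequence decoder over trajectory tokens (illustrative only)."""

    def __init__(self, vocab_size: int, d_model: int = 128, n_heads: int = 4,
                 n_layers: int = 2, max_len: int = 256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) ids for an interleaved
        # (return-to-go, observation, action) trajectory stream.
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device).unsqueeze(0)
        x = self.token_emb(tokens) + self.pos_emb(pos)
        # Causal mask: each position attends only to earlier tokens, so the
        # encoder stack behaves as an autoregressive decoder.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(tokens.device)
        x = self.backbone(x, mask=mask)
        return self.head(x)  # next-token logits; action predictions are read off here


if __name__ == "__main__":
    model = FDMSketch(vocab_size=512)
    # Two dummy trajectories, each 10 (return-to-go, observation, action) triples long.
    trajectories = torch.randint(0, 512, (2, 30))
    logits = model(trajectories)
    print(logits.shape)  # torch.Size([2, 30, 512])
```

In practice, a shared tokenizer maps text, images, proprioception, and actions from many tasks into one vocabulary, so a single model of this form can be trained across tasks and then decoded autoregressively to act in each of them.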


Publication history

Received: 02 July 2023
Revised: 21 September 2023
Accepted: 03 November 2023
Published: 10 January 2024
Issue date: December 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was completed when Z. Wan, M. Zhou, Z. Cao, C. Le, and J. Chen were interns at Digital Brain Laboratory.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
