[1]
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, Attention is all you need, in Proc. 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 6000–6010.
[2]
R. Al-Rfou, D. Choe, N. Constant, M. Guo, and L. Jones, Character-level language modeling with deeper self-attention, arXiv preprint arXiv: 1808.04444, 2018.
[3]
N. Elhage, N. Nanda, C. Olsson, T. Henighan, N. Joseph, B. Mann, A. Askell, Y. Bai, A. Chen, T. Conerly, et al., A mathematical framework for transformer circuits, https://transformer-circuits.pub/2021/framework/index.html, 2021.
[4]
C. Olsson, N. Elhage, N. Nanda, N. Joseph, N. DasSarma, T. Henighan, B. Mann, A. Askell, Y. Bai, A. Chen, et al., In-context learning and induction heads, https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html#argument-architectural-requirements, 2022.
[5]
L. Wang, J. Huang, K. Huang, Z. Hu, G. Wang, and Q. Gu, Improving neural language generation with spectrum control, presented at the International Conference on Learning Representations, Addis Ababa, Ethiopia, 2020.
[6]
K. Ethayarajh, How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings, in Proc. 2019 Conf. Empirical Methods in Natural Language Processing and the 9th Int. Joint Conf. Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 2019, pp. 55–65.
[7]
J. Gao, D. He, X. Tan, T. Qin, L. Wang, and T. Liu, Representation degeneration problem in training natural language generation models, presented at the International Conference on Learning Representations, New Orleans, LA, USA, 2019.
[8]
D. Biś, M. Podkorytov, and X. Liu, Too much in common: Shifting of embeddings in transformer language models and its implications, in Proc. 2021 Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 2021, pp. 5117–5130.
[9]
N. Godey, E. de la Clergerie, and B. Sagot, Is anisotropy inherent to transformers? arXiv preprint arXiv: 2306.07656, 2023.
[10]
W. Rudman and C. Eickhoff, Stable anisotropic regularization, arXiv preprint arXiv: 2305.19358, 2023.
[11]
Y. Lakretz, G. Kruszewski, T. Desbordes, D. Hupkes, S. Dehaene, and M. Baroni, The emergence of number and syntax units in LSTM language models, arXiv preprint arXiv: 1903.07435, 2019.
[12]
C. Olah, Understanding LSTM networks, https://colah.github.io/posts/2015-08-Understanding-LSTMs/, 2015.
[14]
M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, Deep contextualized word representations, in Proc. 2018 Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA, USA, 2018, pp. 2227–2237.
[15]
J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in Proc. 2019 Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2019, pp. 4171–4186.
[16]
A. Karpathy, J. Johnson, and L. Fei-Fei, Visualizing and understanding recurrent networks, arXiv preprint arXiv: 1506.02078, 2015.
[17]
G. Weiss, Y. Goldberg, and E. Yahav, Thinking like transformers, in Proc. 38th International Conference on Machine Learning, Virtual Event, 2021, pp. 11080–11090.
[18]
A. Rogers, O. Kovaleva, and A. Rumshisky, A primer in BERTology: What we know about how BERT works, Trans. Assoc. Comput. Linguist., vol. 8, pp. 842–866, 2020.
[19]
J. Bastings, Y. Belinkov, Y. Elazar, D. Hupkes, N. Saphra, and S. Wiegreffe, BlackboxNLP: Analyzing and interpreting neural networks for NLP, workshop held at EMNLP 2022, Abu Dhabi, United Arab Emirates (Hybrid), 2022.
[20]
M. T. Ribeiro, T. Wu, C. Guestrin, and S. Singh, Beyond accuracy: Behavioral testing of NLP models with CheckList, in Proc. 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020, pp. 4902–4912.
[21]
T. Linzen, G. Chrupała, Y. Belinkov, and D. Hupkes, Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, https://aclanthology.org/W19-4800/, 2019.
[22]
Y. Belinkov and Y. Bisk, Synthetic and natural noise both break neural machine translation, arXiv preprint arXiv: 1711.02173, 2017.
[23]
I. Provilkov, D. Emelianenko, and E. Voita, BPE-dropout: Simple and effective subword regularization, in Proc. 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020, pp. 1882–1892.
[25]
T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, arXiv preprint arXiv: 2005.14165v1, 2020.
[26]
A. Venigalla and L. Li, Mosaic LLMs: GPT-3 quality for <$500k, https://www.mosaicml.com/blog/gpt-3-quality-for-500k, 2023.
[27]
J. Schulman, B. Zoph, C. Kim, J. Hilton, J. Menick, J. Weng, J. F. C. Uribe, L. Fedus, L. Metz, M. Pokorny, et al., Introducing ChatGPT, https://openai.com/index/chatgpt/, 2022.
[28]
OpenAI, GPT-4 API general availability and deprecation of older models in the completions API, https://openai.com/blog/gpt-4-api-general-availability, 2023.
[29]
J. Yang, H. Jin, R. Tang, X. Han, Q. Feng, H. Jiang, B. Yin, and X. Hu, Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond, arXiv preprint arXiv: 2304.13712, 2023.
[30]
C. Zhang, C. Zhang, C. Li, Y. Qiao, S. Zheng, S. K. Dam, M. Zhang, J. U. Kim, S. T. Kim, J. Choi, et al., One small step for generative AI, one giant leap for AGI: A complete survey on ChatGPT in AIGC era, arXiv preprint arXiv: 2304.06488, 2023.
[31]
T. Eloundou, S. Manning, P. Mishkin, and D. Rock, GPTs are GPTs: An early look at the labor market impact potential of large language models, arXiv preprint arXiv: 2303.10130, 2023.
[32]
A. S. George and A. S. H. George, A review of ChatGPT AI’s impact on several business sectors, Partners Universal International Innovation Journal, vol. 1, no. 1, pp. 9–23, 2023.
[34]
Y. Liu, D. Iter, Y. Xu, S. Wang, R. Xu, and C. Zhu, G-eval: NLG evaluation using GPT-4 with better human alignment, arXiv preprint arXiv: 2303.16634, 2023.
[35]
L. Zheng, W. L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing, et al., Judging LLM-as-a-judge with MT-Bench and chatbot arena, arXiv preprint arXiv: 2306.05685v4, 2023.
[36]
A. Perry, OpenAI updates GPT-4 with new features, https://mashable.com/article/openai-chatgpt-gpt-4-function-calling-update, 2023.
[37]
M. G. Southern, OpenAI’s ChatGPT update brings improved accuracy, https://www.searchenginejournal.com/openai-chatgpt-update/476116/, 2023.
[38]
Y. Deng, OpenAI watch, https://openaiwatch.com/, 2023.
[39]
M. Alizadeh, M. Kubli, Z. Samei, S. Dehghani, J. D. Bermeo, M. Korobeynikova, and F. Gilardi, Open-source large language models outperform crowd workers and approach ChatGPT in text-annotation tasks, arXiv preprint arXiv: 2307.02179, 2023.
[40]
S. Mukherjee, A. Mitra, G. Jawahar, S. Agarwal, H. Palangi, and A. Awadallah, Orca: Progressive learning from complex explanation traces of GPT-4, https://www.microsoft.com/en-us/research/publication/orca-progressive-learning-from-complex-explanation-traces-of-gpt-4/?locale=zh-cn, 2023.
[41]
S. Gunasekar, Y. Zhang, J. Aneja, C. C. T. Mendes, A. Del Giorno, S. Gopi, M. Javaheripi, P. Kauffmann, G. de Rosa, O. Saarikivi, et al., Textbooks are all you need, arXiv preprint arXiv: 2306.11644, 2023.
[42]
H. Touvron, T. Lavril, G. Izacard, X. Martinet, M. A. Lachaux, T. Lacroix, B. Roziere, N. Goyal, E. Hambro, F. Azhar, et al., LLaMA: Open and efficient foundation language models, arXiv preprint arXiv: 2302.13971v1, 2023.
[43]
E. Almazrouei, H. Alobeidli, A. Alshamsi, A. Cappelli, R. Cojocaru, M. Debbah, E. Goffinet, D. Heslow, J. Launay, Q. Malartic, et al., Falcon-40B: An open large language model with state-of-the-art performance, 2023.
[44]
Stability AI, StableLM: Stability AI language models, 2023.
[45]
R. Taori, I. Gulrajani, T. Zhang, Y. Dubois, X. Li, C. Guestrin, P. Liang, and T. B. Hashimoto, Stanford alpaca: An instruction-following LLaMA model, https://github.com/tatsu-lab/stanford_alpaca, 2023.
[46]
The Vicuna Team, Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality, https://lmsys.org/blog/2023-03-30-vicuna/, 2023.
[47]
T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, QLoRA: Efficient finetuning of quantized LLMs, arXiv preprint arXiv: 2305.14314, 2023.
[48]
R. C. Fong and A. Vedaldi, Interpretable explanations of black boxes by meaningful perturbation, in Proc. IEEE Int. Conf. Computer Vision (ICCV), Venice, Italy, 2017, pp. 3449–3457.
[49]
A. Holtzman, J. Buys, L. Du, M. Forbes, and Y. Choi, The curious case of neural text degeneration, arXiv preprint arXiv: 1904.09751, 2019.
[50]
C. Olah, Mechanistic interpretability, variables, and the importance of interpretable bases, https://transformer-circuits.pub/2022/mech-interp-essay/index.html, 2022.
[52]
A. Jacovi and Y. Goldberg, Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness? in Proc. 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020, pp. 4198–4205.
[53]
C. Chen, S. Feng, A. Sharma, and C. Tan, Machine explanations and human understanding, in Proc. 2023 ACM Conf. Fairness, Accountability, and Transparency, Chicago, IL, USA, 2023, p. 1.
[54]
T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, NY, USA: John Wiley & Sons, 1999.
[55]
P. D. Grünwald, The Minimum Description Length Principle. Cambridge, MA, USA: The MIT Press, 2007.
[56]
C. M. Barry, Who sharpened Occam’s razor? https://www.irishphilosophy.com/2014/05/27/who-sharpened-occams-razor/, 2014.
[59]
A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, Language models are unsupervised multitask learners, https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf, 2019.
[60]
P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig, Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing, arXiv preprint arXiv: 2107.13586, 2021.
[61]
A. Radford, J. Wu, D. Amodei, D. Amodei, J. Clark, M. Brundage, and I. Sutskever, Better language models and their implications, https://openai.com/index/better-language-models/, 2019.
[62]
B. Z. Li, J. Yu, M. Khabsa, L. Zettlemoyer, A. Halevy, and J. Andreas, Quantifying adaptability in pre-trained language models with 500 tasks, arXiv preprint arXiv: 2112.03204, 2021.
[63]
A. Rohrbach, L. A. Hendricks, K. Burns, T. Darrell, and K. Saenko, Object hallucination in image captioning, in Proc. 2018 Conf. Empirical Methods in Natural Language Processing, Brussels, Belgium, 2018, pp. 4035–4045.
[64]
T. Liao, R. Taori, I. D. Raji, and L. Schmidt, Are we learning yet? A meta review of evaluation failures across machine learning, in 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
[65]
National Association for College Admission Counseling, Report of the commission on the use of standardized tests in undergraduate admission, https://files.eric.ed.gov/fulltext/ED502721.pdf, 2008.
[66]
Baidu Baike, Nationwide Unified Examination for Admissions to General Universities and Colleges, (in Chinese), https://baike.baidu.com/item/普通高等学校招生全国统一考试/2567351, 2022.
[67]
C. Wang and R. Sennrich, On exposure bias, hallucination and domain shift in neural machine translation, in Proc. 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020, pp. 3544–3552.
[68]
M. Müller, A. Rios, and R. Sennrich, Domain robustness in neural machine translation, arXiv preprint arXiv: 1911.03109, 2020.
[69]
A. See, P. J. Liu, and C. D. Manning, Get to the point: Summarization with pointer-generator networks, arXiv preprint arXiv: 1704.04368, 2017.
[70]
A. Poliak, J. Naradowsky, A. Haldar, R. Rudinger, and B. Van Durme, Hypothesis only baselines in natural language inference, in Proc. Seventh Joint Conf. Lexical and Computational Semantics, New Orleans, LA, USA, 2018, pp. 180–191.
[71]
S. Gururangan, S. Swayamdipta, O. Levy, R. Schwartz, S. Bowman, and N. A. Smith, Annotation artifacts in natural language inference data, in Proc. 2018 Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, LA, USA, 2018, pp. 107–112.
[72]
T. A. Chang and B. K. Bergen, Language model behavior: A comprehensive survey, arXiv preprint arXiv: 2303.11504, 2023.
[73]
N. Jain, K. Saifullah, Y. Wen, J. Kirchenbauer, M. Shu, A. Saha, M. Goldblum, J. Geiping, and T. Goldstein, Bring your own data! Self-supervised evaluation for large language models, arXiv preprint arXiv: 2306.13651, 2023.
[74]
S. Pichai, An important next step on our AI journey, https://blog.google/technology/ai/bard-google-ai-search-updates/, 2023.
[75]
BigScience, BigScience model training launched, BigScience Blog, 2022.
[77]
S. Bubeck, V. Chandrasekaran, R. Eldan, J. Gehrke, E. Horvitz, E. Kamar, P. Lee, Y. T. Lee, Y. Li, S. Lundberg, et al., Sparks of artificial general intelligence: Early experiments with GPT-4, arXiv preprint arXiv: 2303.12712, 2023.
[78]
J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, et al., Emergent abilities of large language models, arXiv preprint arXiv: 2206.07682, 2022.
[79]
R. Teehan, M. Clinciu, O. Serikov, E. Szczechla, N. Seelam, S. Mirkin, and A. Gokaslan, Emergent structures and training dynamics in large language models, in Proc. BigScience Episode #5—Workshop on Challenges & Perspectives in Creating Large Language Models, Virtual Event, 2022, pp. 146–159.
[81]
J. H. Holland, Complexity: A Very Short Introduction. Oxford, UK: Oxford University Press, 2014.
[82]
U. Khandelwal, K. Clark, D. Jurafsky, and L. Kaiser, Sample efficient text summarization using a single pre-trained transformer, arXiv preprint arXiv: 1905.08836, 2019.
[83]
U. Khandelwal, H. He, P. Qi, and D. Jurafsky, Sharp nearby, fuzzy far away: How neural language models use context, in Proc. 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 2018, pp. 284–294.
[84]
L. Yu, D. Simig, C. Flaherty, A. Aghajanyan, L. Zettlemoyer, and M. Lewis, MEGABYTE: Predicting million-byte sequences with multiscale transformers, arXiv preprint arXiv: 2305.07185, 2023.
[85]
I. Beltagy, M. E. Peters, and A. Cohan, Longformer: The long-document transformer, arXiv preprint arXiv: 2004.05150, 2020.
[86]
R. Child, S. Gray, A. Radford, and I. Sutskever, Generating long sequences with sparse transformers, arXiv preprint arXiv: 1904.10509, 2019.
[88]
S. Sun, K. Krishna, A. Mattarella-Micke, and M. Iyyer, Do long-range language models actually use long-range context? in Proc. 2021 Conf. Empirical Methods in Natural Language Processing, Online, 2021, pp. 807–822.
[89]
O. Press, N. A. Smith, and M. Lewis, Shortformer: Better language modeling using shorter inputs, arXiv preprint arXiv: 2012.15832v2, 2021.
[90]
F. Petroni, T. Rocktäschel, S. Riedel, P. Lewis, A. Bakhtin, Y. Wu, and A. Miller, Language models as knowledge bases? in Proc. 2019 Conf. Empirical Methods in Natural Language Processing and the 9th Int. Joint Conf. Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 2019, pp. 2463–2473.
[91]
K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio, On the properties of neural machine translation: Encoder–decoder approaches, in Proc. SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 2014, pp. 103–111.
[92]
N. S. Keskar, B. McCann, L. R. Varshney, C. Xiong, and R. Socher, CTRL: A conditional transformer language model for controllable generation, arXiv preprint arXiv: 1909.05858, 2019.
[93]
R. Zellers, A. Holtzman, H. Rashkin, Y. Bisk, A. Farhadi, F. Roesner, and Y. Choi, Defending against neural fake news, arXiv preprint arXiv: 1905.12616, 2019.
[94]
A. Aghajanyan, D. Okhonko, M. Lewis, M. Joshi, H. Xu, G. Ghosh, and L. Zettlemoyer, HTLM: Hyper-text pre-training and prompting of language models, arXiv preprint arXiv: 2107.06955, 2021.
[95]
S. Mishra, D. Khashabi, C. Baral, and H. Hajishirzi, Cross-task generalization via natural language crowdsourcing instructions, in Proc. 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 2022, pp. 3470–3487.
[96]
H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, Y. Li, X. Wang, M. Dehghani, S. Brahma, et al., Scaling instruction-finetuned language models, arXiv preprint arXiv: 2210.11416, 2022.
[97]
L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al., Training language models to follow instructions with human feedback, arXiv preprint arXiv: 2203.02155, 2022.
[98]
C. Zhou, P. Liu, P. Xu, S. Iyer, J. Sun, Y. Mao, X. Ma, A. Efrat, P. Yu, L. Yu, et al., LIMA: Less is more for alignment, arXiv preprint arXiv: 2305.11206, 2023.
[99]
S. Mittal, S. Diallo, and A. Tolk, Emergent Behavior in Complex Systems Engineering: A Modeling and Simulation Approach. New York, NY, USA: John Wiley & Sons, 2018.
[100]
G. Ilharco, R. Zellers, A. Farhadi, and H. Hajishirzi, Probing contextual language models for common ground with visual representations, in Proc. 2021 Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 2021, pp. 5367–5377.
[101]
L. Parcalabescu, A. Gatt, A. Frank, and I. Calixto, Seeing past words: Testing the cross-modal capabilities of pretrained V&L models on counting tasks, arXiv preprint arXiv: 2012.12352, 2020.
[102]
I. Tenney, D. Das, and E. Pavlick, BERT rediscovers the classical NLP pipeline, in Proc. 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019, pp. 4593–4601.
[103]
G. Daras and A. G. Dimakis, Discovering the hidden vocabulary of DALLE-2, arXiv preprint arXiv: 2206.00169, 2022.
[104]
B. Hilton, No, DALL-E doesn’t have a secret language (or at least, we haven’t found one yet), https://twitter.com/benjamin_hilton/status/1531780892972175361?lang=en, 2022.
[106]
N. Carlini, D. Ippolito, M. Jagielski, K. Lee, F. Tramer, and C. Zhang, Quantifying memorization across neural language models, arXiv preprint arXiv: 2202.07646, 2022.
[107]
K. Lee, D. Ippolito, A. Nystrom, C. Zhang, D. Eck, C. Callison-Burch, and N. Carlini, Deduplicating training data makes language models better, in Proc. 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 2022, pp. 8424–8445.
[108]
K. Tirumala, A. H. Markosyan, L. Zettlemoyer, and A. Aghajanyan, Memorization without overfitting: Analyzing the training dynamics of large language models, arXiv preprint arXiv: 2205.10770v2, 2022.
[109]
T. Blevins and L. Zettlemoyer, Language contamination helps explain the cross-lingual capabilities of English pretrained models, arXiv preprint arXiv: 2204.08110, 2022.
[110]
X. V. Lin, T. Mihaylov, M. Artetxe, T. Wang, S. Chen, D. Simig, M. Ott, N. Goyal, S. Bhosale, J. Du, et al., Few-shot learning with multilingual language models, arXiv preprint arXiv: 2112.10668, 2021.
[111]
N. Kandpal, E. Wallace, and C. Raffel, Deduplicating training data mitigates privacy risks in language models, arXiv preprint arXiv: 2202.06539, 2022.
[112]
L. Kiho, ChatGPT_DAN: ChatGPT DAN, jailbreak prompts.
[113]
F. Stahlberg, I. Kulikov, and S. Kumar, Uncertainty determines the adequacy of the mode and the tractability of decoding in sequence-to-sequence models, in Proc. 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 2022, pp. 8634–8645.
[114]
X. Shi, Y. Xiao, and K. Knight, Why neural machine translation prefers empty outputs, arXiv preprint arXiv: 2012.13454, 2020.
[115]
F. Stahlberg and B. Byrne, On NMT search errors and model errors: Cat got your tongue? in Proc. 2019 Conf. Empirical Methods in Natural Language Processing and the 9th Int. Joint Conf. Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 2019, pp. 3356–3362.
[116]
Z. Xie, T. Cohn, and J. H. Lau, Can very large pretrained language models learn storytelling with a few examples? 2023.
[117]
A. See, A. Pappu, R. Saxena, A. Yerukola, and C. D. Manning, Do massively pretrained language models make better storytellers? in Proc. 23rd Conf. Computational Natural Language Learning (CoNLL), Hong Kong, China, 2019, pp. 843–861.
[118]
A. Lazaridou and M. Baroni, Emergent multi-agent communication in the deep learning era, arXiv preprint arXiv: 2006.02419, 2020.
[119]
S. Steinert-Threlkeld, X. Zhou, Z. Liu, and C. M. Downey, Emergent communication fine-tuning (EC-FT) for pretrained language models, presented at the ICLR 2022 EmeCom Workshop, 2022.
[120]
A. Warstadt, L. Choshen, A. Mueller, A. Williams, E. Wilcox, and C. Zhuang, Call for papers: The BabyLM challenge: Sample-efficient pretraining on a developmentally plausible corpus, arXiv preprint arXiv: 2301.11796, 2023.
[121]
D. Hafner, T. Lillicrap, I. S. Fischer, R. Villegas, D. R. Ha, H. Lee, and J. Davidson, Learning latent dynamics for planning from pixels, in Proc. 36th International Conference on Machine Learning, Long Beach, CA, USA, 2019, pp. 2555–2565.
[122]
M. Morin and M. Willetts, Non-determinism in TensorFlow ResNets, arXiv preprint arXiv: 2001.11396, 2020.
[124]
R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, High-resolution image synthesis with latent diffusion models, in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 2022, pp. 10674–10685.
[125]
T. McCoy, E. Pavlick, and T. Linzen, Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference, in Proc. 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019, pp. 3428–3448.
[126]
M. Caro, Caro’s Book of Poker Tells. Washington, DC, USA: Cardoza Publishing, 2003.
[127]
M. Mori, The uncanny valley: The original essay by Masahiro Mori, https://spectrum.ieee.org/the-uncanny-valley, 2012.
[128]
A. Wilson, How to jailbreak ChatGPT to unlock its full potential, https://approachableai.com/how-to-jailbreak-chatgpt/, 2023.
[129]
E. Eliaçık, Playing with fire: The leaked plugin DAN unchains ChatGPT from its moral and ethical restrictions, https://dataconomy.com/2023/03/31/chatgpt-dan-prompt-how-to-jailbreak-chatgpt/, 2023.
[130]
M. Le, A. Vyas, B. Shi, B. Karrer, L. Sari, R. Moritz, M. Williamson, V. Manohar, Y. Adi, J. Mahadeokar, et al., Voicebox: Text-guided multilingual universal speech generation at scale, arXiv preprint arXiv: 2306.15687, 2023.
[131]
Z. Luo, D. Chen, Y. Zhang, Y. Huang, L. Wang, Y. Shen, D. Zhao, J. Zhou, and T. Tan, VideoFusion: Decomposed diffusion models for high-quality video generation, arXiv preprint arXiv: 2303.08320, 2023.