Center for Linguistics and Applied Linguistics, and Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510006, China
Guangdong Provincial Key Laboratory of Nanophotonic Functional Materials and Devices, South China Normal University, Guangzhou 510006, China
School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou 510006, China
Modern Education Technology Center, Guangdong University of Foreign Studies, Guangzhou 510006, China
Abstract
Emotion classification in textual conversations aims to classify the emotion of each utterance in a conversation, and has become one of the most important tasks in natural language processing in recent years. However, it is challenging for machines because emotions rely heavily on the textual context. To address this challenge, we propose a method named DBL that integrates the advantages of deep learning and broad learning to classify emotions in textual conversations. Based on Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (Bi-LSTM), and broad learning, it aims to more effectively capture the local contextual information (i.e., utterance-level) within an utterance as well as the global contextual information (i.e., speaker-level) within a conversation. Extensive experiments on three public textual conversation datasets show that context at both the utterance level and the speaker level is consistently beneficial to the performance of emotion classification. In addition, the results show that the proposed method outperforms the baseline methods on most of the test datasets in weighted-average F1.
Peng S, Zeng R, Liu H, et al. Deep Broad Learning for Emotion Classification in Textual Conversations. Tsinghua Science and Technology, 2024, 29(2): 481-491. https://doi.org/10.26599/TST.2023.9010021
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
Framework of the proposed method DBL.
Emotion classifier
After the utterance-level and speaker-level encodings are obtained in Sections 3.3 and 3.4, respectively, BL is adopted to calculate the weight of each emotion label and to predict the emotion label of each utterance.
Since BL is composed of feature nodes, enhancement nodes, and an output layer, the feature embeddings are first linearly mapped into groups of feature nodes, and the feature nodes are then nonlinearly mapped into groups of enhancement nodes. Finally, the feature nodes and enhancement nodes are input into the output layer to obtain the probability distribution over emotions. During the training of BL, the weights of the feature nodes and enhancement nodes are randomly generated and fixed, and the weights of the output layer are optimized by the ridge regression method.
However, in DBL, the deep features of utterances are already extracted by the CNN and Bi-LSTM, so they do not need to be linearly transformed into feature nodes of BL. Thus, in BL, they are directly treated as the feature nodes, which are nonlinearly transformed into the enhancement nodes. Finally, the feature nodes and enhancement nodes are concatenated and input into the output layer to calculate the weight of each label.
Thus, the utterance-level features and speaker-level features in a conversation are concatenated into the feature matrix $F$ and nonlinearly mapped into groups of enhancement nodes. The $j$-th group of enhancement nodes is represented as follows:

$$H_j = \xi(F W_j + \beta_j), \quad j = 1, 2, \ldots, m$$

where $q$ denotes the number of enhancement nodes in each group (so $W_j$ and $\beta_j$ have $q$ columns), $W_j$ and $\beta_j$ are randomly generated and denote the weight matrix and the bias matrix, respectively, and $\xi(\cdot)$ denotes a nonlinear activation function.
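As a concrete illustration, the enhancement-node mapping above can be sketched in NumPy as follows. This is a minimal sketch, not the paper's implementation: the feature dimension, group sizes, number of utterances, and the tanh activation are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, q, m = 8, 5, 3                  # feature dim, nodes per group, groups (illustrative)
F = rng.standard_normal((4, d))    # deep features of 4 utterances, used as feature nodes

# Randomly generated, fixed weight and bias matrices for each group
Ws = [rng.standard_normal((d, q)) for _ in range(m)]
betas = [rng.standard_normal((1, q)) for _ in range(m)]

# H_j = xi(F W_j + beta_j), with tanh as the nonlinear activation (an assumption)
H = [np.tanh(F @ W + b) for W, b in zip(Ws, betas)]
Hm = np.hstack(H)                  # H^m: all m groups of enhancement nodes

A = np.hstack([F, Hm])             # actual input of BL: [F | H^m]
print(A.shape)                     # (4, d + m*q)
```

Note that only the output-layer weights are trained; the matrices `Ws` and `betas` stay fixed after being drawn, which is what makes BL training fast.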
Denoting the $m$ groups of enhancement nodes as $H^m \equiv [H_1, H_2, \ldots, H_m]$, the output $\hat{Y}$ can be represented as follows:

$$\hat{Y} = [F \mid H^m] W = A W$$

where $W$ denotes the output weight matrix of BL, and $A = [F \mid H^m]$ denotes the actual input of BL.
To shorten the computation time and to prevent over-fitting, ridge regression is adopted as the objective function in the general BL, which is represented as follows:

$$\mathop{\arg\min}_{W} \; \| A W - Y \|_2^2 + \lambda \| W \|_2^2$$

where $\lambda$ denotes the regularization parameter.
Finally, according to the regularized least squares method, $W$ can be represented as follows:

$$W = (\lambda I + A^{\mathrm{T}} A)^{-1} A^{\mathrm{T}} Y$$

where $I$ denotes an identity matrix and $Y$ denotes the ground-truth label matrix of the utterances.
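The closed-form ridge solution can likewise be sketched in NumPy. The dimensions, the value of $\lambda$, and the one-hot label encoding below are illustrative assumptions; the key point is that $W$ is obtained in one linear solve rather than by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)

n, p, c = 4, 23, 3                     # utterances, BL input dim, emotion classes (illustrative)
A = rng.standard_normal((n, p))        # actual input of BL: [F | H^m]
Y = np.eye(c)[[0, 2, 1, 0]]            # one-hot ground-truth labels (assumed encoding)

lam = 1e-2                             # regularization parameter lambda
# W = (lambda*I + A^T A)^{-1} A^T Y, via a linear solve instead of an explicit inverse
W = np.linalg.solve(lam * np.eye(p) + A.T @ A, A.T @ Y)

Y_hat = A @ W                          # predicted label scores for each utterance
pred = Y_hat.argmax(axis=1)            # predicted emotion index per utterance
```

Using `np.linalg.solve` rather than forming the inverse explicitly is the standard numerically stable way to evaluate this expression.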