N. Xu, W. J. Mao, and G. D. Chen, Multi-interactive memory network for aspect based multimodal sentiment analysis, Proc. AAAI Conf. Artif. Intell., vol. 33, no. 1, pp. 371-378, 2019.
Z. Yu, J. Yu, J. P. Fan, and D. C. Tao, Multi-modal factorized bilinear pooling with co-attention learning for visual question answering, in Proc. 2017 IEEE Int. Conf. Computer Vision, Venice, Italy, 2017, pp. 1839-1848.
Z. Yu, J. Yu, C. C. Xiang, J. P. Fan, and D. C. Tao, Beyond bilinear: Generalized multimodal factorized high-order pooling for visual question answering, IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 12, pp. 5947-5959, 2018.
D. Hazarika, S. Poria, R. Mihalcea, E. Cambria, and R. Zimmermann, ICON: Interactive conversational memory network for multimodal emotion detection, in Proc. 2018 Conf. Empirical Methods in Natural Language Processing, Brussels, Belgium, 2018, pp. 2594-2604.
A. Hu and S. Flaxman, Multimodal sentiment analysis to explore the structure of emotions, in Proc. 24th ACM SIGKDD Int. Conf. Knowledge Discovery & Data Mining, London, UK, 2018, pp. 350-358.
P. Anderson, X. D. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang, Bottom-up and top-down attention for image captioning and visual question answering, in Proc. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 6077-6086.
L. Zhang, S. Wang, and B. Liu, Deep learning for sentiment analysis: A survey, WIREs Data Min. Knowl. Discov., vol. 8, no. 4, p. e1253, 2018.
S. C. Zhao, S. F. Wang, M. Soleymani, D. Joshi, and Q. Ji, Affective computing for large-scale heterogeneous multimedia data: A survey, ACM Trans. Multimed. Comput. Commun. Appl., vol. 15, no. 3s, p. 93, 2020.
T. Niu, S. A. Zhu, L. Pang, and A. El Saddik, Sentiment analysis on multi-view social data, in Proc. 22nd Int. Conf. MultiMedia Modeling, Miami, FL, USA, 2016, pp. 15-27.
K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, in Proc. 3rd Int. Conf. Learning Representations, San Diego, CA, USA, 2015, arXiv preprint arXiv:1409.1556v6.
G. R. Wang, K. Z. Wang, and L. Lin, Adaptively connected neural networks, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019, pp. 1781-1790.
R. Cadene, C. Dancette, H. Ben-younes, M. Cord, and D. Parikh, RUBi: Reducing unimodal biases for visual question answering, in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 841-852.
J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in Proc. 2019 Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2019, pp. 4171-4186.
N. Xu and W. J. Mao, MultiSentiNet: A deep semantic network for multimodal sentiment analysis, in Proc. 2017 ACM Conf. Information and Knowledge Management, Singapore, 2017, pp. 2399-2402.
N. Xu, W. J. Mao, and G. D. Chen, A co-memory network for multimodal sentiment analysis, in Proc. 41st Int. ACM SIGIR Conf. Research & Development in Information Retrieval, Ann Arbor, MI, USA, 2018, pp. 929-932.
J. C. Xu, D. L. Chen, X. P. Qiu, and X. J. Huang, Cached long short-term memory neural networks for document-level sentiment classification, in Proc. 2016 Conf. Empirical Methods in Natural Language Processing, Austin, TX, USA, 2016, pp. 1660-1669.
A. Mishra, K. Dey, and P. Bhattacharyya, Learning cognitive features from gaze data for sentiment and sarcasm classification using convolutional neural network, in Proc. 55th Annu. Meeting of the Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 377-387.
D. H. Ma, S. J. Li, X. D. Zhang, and H. F. Wang, Interactive attention networks for aspect-level sentiment classification, in Proc. 26th Int. Joint Conf. Artificial Intelligence, Melbourne, Australia, 2017, pp. 4068-4074.
A. Gaspar and L. A. Alexandre, A multimodal approach to image sentiment analysis, in Proc. 20th Int. Conf. Intelligent Data Engineering and Automated Learning, Manchester, UK, 2019, pp. 302-309.
Q. T. Truong and H. W. Lauw, VistaNet: Visual aspect attention network for multimodal sentiment analysis, Proc. AAAI Conf. Artif. Intell., vol. 33, no. 1, pp. 305-312, 2019.
B. Liu, S. J. Tang, X. J. Sun, Q. Y. Chen, J. X. Cao, J. Z. Luo, and S. S. Zhao, Context-aware social media user sentiment analysis, Tsinghua Science and Technology, vol. 25, no. 4, pp. 528-541, 2020.
E. J. Barezi and P. Fung, Modality-based factorization for multimodal fusion, in Proc. 4th Workshop on Representation Learning for NLP, Florence, Italy, 2019, pp. 260-269.
S. Poria, N. Majumder, D. Hazarika, E. Cambria, A. Gelbukh, and A. Hussain, Multimodal sentiment analysis: Addressing key issues and setting up the baselines, IEEE Intell. Syst., vol. 33, no. 6, pp. 17-25, 2018.
M. H. Chen, S. Wang, P. P. Liang, T. Baltrušaitis, A. Zadeh, and L. P. Morency, Multimodal sentiment analysis with word-level fusion and reinforcement learning, in Proc. 19th ACM Int. Conf. Multimodal Interaction, Glasgow, UK, 2017, pp. 163-171.
N. Majumder, D. Hazarika, A. Gelbukh, E. Cambria, and S. Poria, Multimodal sentiment analysis using hierarchical fusion with context modeling, Knowl.-Based Syst., vol. 161, pp. 124-133, 2018.
E. Cambria, D. Hazarika, S. Poria, A. Hussain, and R. B. V. Subramanyam, Benchmarking multimodal sentiment analysis, in Proc. 18th Int. Conf. Computational Linguistics and Intelligent Text Processing, Budapest, Hungary, 2017, pp. 166-179.
D. Zhang, S. S. Li, Q. M. Zhu, and G. D. Zhou, Multi-modal sentiment classification with independent and interactive knowledge via semi-supervised learning, IEEE Access, vol. 8, pp. 22945-22954, 2020.
Z. L. Wang, Z. H. Wan, and X. J. Wan, TransModality: An End2End fusion method with transformer for multimodal sentiment analysis, in Proc. Web Conf., Taipei, China, 2020, pp. 2514-2520.
C. Yang, X. C. Wang, and B. Jiang, Sentiment enhanced multi-modal hashtag recommendation for micro-videos, IEEE Access, vol. 8, pp. 78252-78264, 2020.
F. R. Huang, K. M. Wei, J. Weng, and Z. J. Li, Attention-based modality-gated networks for image-text sentiment analysis, ACM Trans. Multimed. Comput. Commun. Appl., vol. 16, no. 3, p. 79, 2020.
K. Xu, J. L. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio, Show, attend and tell: Neural image caption generation with visual attention, in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 2048-2057.
D. Borth, R. R. Ji, T. Chen, T. Breuel, and S. F. Chang, Large-scale visual sentiment ontology and detectors using adjective noun pairs, in Proc. 21st ACM Int. Conf. Multimedia, Barcelona, Spain, 2013, pp. 223-232.
C. Baecchi, T. Uricchio, M. Bertini, and A. Del Bimbo, A multimodal feature learning approach for sentiment analysis of social network multimedia, Multimed. Tools Appl., vol. 75, no. 5, pp. 2507-2525, 2016.
G. Y. Cai and B. B. Xia, Convolutional neural networks for multimedia sentiment analysis, in Proc. 4th CCF Conf. Natural Language Processing and Chinese Computing, Nanchang, China, 2015, pp. 159-167.
Y. H. Yu, H. F. Lin, J. N. Meng, and Z. H. Zhao, Visual and textual sentiment analysis of a microblog using deep convolutional neural networks, Algorithms, vol. 9, no. 2, p. 41, 2016.
N. Xu, Analyzing multimodal public sentiment based on hierarchical semantic attentional network, in Proc. 2017 IEEE Int. Conf. Intelligence and Security Informatics, Beijing, China, 2017, pp. 152-154.
K. Zhang, Y. S. Geng, J. Zhao, J. X. Liu, and W. X. Li, Sentiment analysis of social media via multimodal feature fusion, Symmetry, vol. 12, no. 12, p. 2010, 2020.
X. C. Yang, S. Feng, D. L. Wang, and Y. F. Zhang, Image-text multimodal emotion classification via multi-view attentional network, IEEE Trans. Multimed.
N. Vo, J. Lu, S. Chen, K. Murphy, and J. Hays, Composing text and image for image retrieval-an empirical odyssey, in Proc. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019, pp. 6432-6441.
J. Arevalo, T. Solorio, M. Montes-y-Gómez, and F. A. González, Gated multimodal units for information fusion, in Proc. 5th Int. Conf. Learning Representations, Toulon, France, 2017, https://arxiv.org/abs/1702.01992v1.
Y. Q. Wang, M. L. Huang, X. Y. Zhu, and L. Zhao, Attention-based LSTM for aspect-level sentiment classification, in Proc. 2016 Conf. Empirical Methods in Natural Language Processing, Austin, TX, USA, 2016, pp. 606-615.
D. Y. Tang, B. Qin, and T. Liu, Aspect level sentiment classification with deep memory network, in Proc. 2016 Conf. Empirical Methods in Natural Language Processing, Austin, TX, USA, 2016, pp. 214-224.
P. Chen, Z. Q. Sun, L. D. Bing, and W. Yang, Recurrent attention network on memory for aspect sentiment analysis, in Proc. 2017 Conf. Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 2017, pp. 452-461.
K. M. He, X. Y. Zhang, S. Q. Ren, and J. Sun, Deep residual learning for image recognition, in Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 2016, pp. 770-778.