Image-text sentiment analysis has attracted increasing attention in recent years because of the surge of reviews on social media networks. Although previous works have made significant progress with feature fusion between the image and text modalities, how to effectively obtain intra-modality and inter-modality features remains an open research issue in image-text sentiment analysis. To address this problem, we propose a novel method called Modality Adaptation Multi-Broad Learning (MAMBL). Specifically, we use the Vision Transformer (ViT) and Robustly optimized Bidirectional Encoder Representations from Transformers approach (RoBERTa) pre-trained models to extract image and text features, respectively. Then, we adopt a Multi-Layer Perceptron (MLP) unit to learn modality-invariant and modality-specific representations, providing a comprehensive view for understanding image-text data. Furthermore, we introduce two Dual Broad Learning (DBL) modules to fuse the multi-modal features for sentiment classification. Extensive experiments have been conducted on three benchmark image-text sentiment analysis datasets, namely MVSA-Single, MVSA-Multiple, and HFM. The experimental results demonstrate that our proposed method achieves higher performance than the baseline models.
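The pipeline described in the abstract (pre-trained encoders, an MLP unit producing modality-invariant and modality-specific views, then fusion for classification) can be sketched structurally as follows. This is a minimal illustration, not the authors' implementation: the random vectors stand in for real ViT and RoBERTa embeddings, all dimensions and the simple concatenation-based fusion are hypothetical, and the DBL classifier itself is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for pre-trained encoder outputs (MAMBL uses ViT and RoBERTa);
# here they are random vectors with an assumed dimension of 768.
img_feat = rng.standard_normal(768)   # placeholder ViT image embedding
txt_feat = rng.standard_normal(768)   # placeholder RoBERTa text embedding

def mlp(x, w1, b1, w2, b2):
    """One-hidden-layer perceptron with ReLU: a generic 'MLP unit'."""
    h = np.maximum(0.0, x @ w1 + b1)
    return h @ w2 + b2

def init_mlp(d_in, d_hid, d_out):
    """Randomly initialized MLP parameters (illustrative only)."""
    return (rng.standard_normal((d_in, d_hid)) * 0.02, np.zeros(d_hid),
            rng.standard_normal((d_hid, d_out)) * 0.02, np.zeros(d_out))

d = 128  # assumed shared representation size
# A shared projection applied to both modalities yields modality-invariant
# views; one private projection per modality yields modality-specific views.
shared = init_mlp(768, 256, d)
img_private = init_mlp(768, 256, d)
txt_private = init_mlp(768, 256, d)

img_inv, txt_inv = mlp(img_feat, *shared), mlp(txt_feat, *shared)
img_spec, txt_spec = mlp(img_feat, *img_private), mlp(txt_feat, *txt_private)

# Concatenate the four views into a fused multi-modal representation that a
# downstream classifier (the DBL modules in the paper) would consume.
fused = np.concatenate([img_inv, txt_inv, img_spec, txt_spec])
print(fused.shape)  # (512,)
```

In the actual method the invariant and specific representations are learned with training objectives and fused by the two DBL modules; this sketch only shows how the four views arise from shared versus per-modality projections.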
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).