From computer vision to short text understanding: Applying similar approaches into different disciplines

Jiayin Lin¹, Geng Sun², Jun Shen², David E. Pritchard³, Ping Yu², Tingru Cui⁴, Dongming Xu⁵, Li Li⁶, Ghassan Beydoun⁷

1. College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350108, China
2. School of Computing and Information Technology, University of Wollongong, Wollongong 2500, Australia
3. Research Lab of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
4. School of Computing and Information Systems, University of Melbourne, Melbourne 3010, Australia
5. UQ Business School, University of Queensland, Brisbane 4000, Australia
6. Faculty of Computer and Information Science, Southwest University, Chongqing 400715, China
7. School of Information System and Modelling, University of Technology Sydney, Sydney 2007, Australia

Abstract

With the development of Internet of Things (IoT) and 5G technologies, a growing share of online resources is presented in multimodal form over the Internet. Effectively processing multimodal information is therefore important for the development of various online applications, including e-learning and digital health, to name just a few. However, most AI-driven systems or models can handle only limited forms of information. In this study, we investigate the correlation between natural language processing (NLP) and pattern recognition, applying mainstream approaches and models from computer vision (CV) to NLP tasks. Based on two different Twitter datasets, we propose a convolutional neural network (CNN) based model that interprets the content of short text under different goals and application backgrounds. The experiments demonstrate that the proposed model achieves competitive performance compared with mainstream recurrent neural network based NLP models such as the bidirectional long short-term memory (Bi-LSTM) and bidirectional gated recurrent unit (Bi-GRU). Moreover, the experimental results also show that the proposed model can precisely locate the key information in the given text.

Keywords: natural language processing, deep learning, neural network
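The core idea behind applying a CNN to short text can be sketched in plain Python, without any deep learning framework: a convolutional filter acts as an n-gram detector slid along the token sequence, and max-over-time pooling keeps only its strongest response, which is what lets such a model locate a key phrase regardless of its position. All names, dimensions, and values below are illustrative, not taken from the paper:

```python
import random

def conv1d_max_pool(embeddings, kernel, width):
    """Slide one n-gram filter over token embeddings and max-pool over time.

    embeddings: list of d-dimensional vectors, one per token
    kernel:     flat weight list of length width * d (a single filter)
    Returns the strongest ReLU-activated filter response across positions.
    """
    responses = []
    for i in range(len(embeddings) - width + 1):
        # Flatten a window of `width` consecutive token vectors.
        window = [v for tok in embeddings[i:i + width] for v in tok]
        # Dot product with the filter, then ReLU.
        responses.append(max(0.0, sum(w * x for w, x in zip(kernel, window))))
    return max(responses)

random.seed(0)
d, width = 4, 3  # embedding dimension and n-gram width (illustrative)
sentence = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(10)]
kernel = [random.uniform(-1, 1) for _ in range(width * d)]
feature = conv1d_max_pool(sentence, kernel, width)
print(round(feature, 4))
```

In a full text-CNN (e.g., Kim-style sentence classification), many such filters of several widths run in parallel, and the pooled responses are concatenated and fed to a linear classifier.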


Publication history

Received: 05 December 2021
Revised: 16 March 2022
Accepted: 06 May 2022
Published: 06 September 2022
Issue date: June 2022

Copyright

© ITU and TUP. All articles included in the journal are copyrighted by the ITU and TUP.

Acknowledgements


This work was supported by the Australian Research Council Discovery Project (No. DP180101051) and the National Natural Science Foundation of China (No. 61877051).

Rights and permissions

This work is available under the CC BY-NC-ND 3.0 IGO license: https://creativecommons.org/licenses/by-nc-nd/3.0/igo/
