From computer vision to short text understanding: Applying similar approaches into different disciplines

Jiayin Lin¹, Geng Sun², Jun Shen², David E. Pritchard³, Ping Yu², Tingru Cui⁴, Dongming Xu⁵, Li Li⁶, Ghassan Beydoun⁷

1. College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350108, China
2. School of Computing and Information Technology, University of Wollongong, Wollongong 2500, Australia
3. Research Lab of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
4. School of Computing and Information Systems, University of Melbourne, Melbourne 3010, Australia
5. UQ Business School, University of Queensland, Brisbane 4000, Australia
6. Faculty of Computer and Information Science, Southwest University, Chongqing 400715, China
7. School of Information System and Modelling, University of Technology Sydney, Sydney 2007, Australia

Abstract

With the development of Internet of Things (IoT) and 5G technologies, a growing share of online resources is presented in multimodal form over the Internet. Effectively processing multimodal information is therefore important for the development of various online applications, including e-learning and digital health, to name just a few. However, most AI-driven systems or models can handle only limited forms of information. In this study, we investigate the correlation between natural language processing (NLP) and pattern recognition, applying mainstream approaches and models from computer vision (CV) to NLP tasks. Based on two different Twitter datasets, we propose a convolutional neural network (CNN) based model that interprets the content of short text under different goals and application backgrounds. The experiments demonstrate that the proposed model achieves competitive performance compared with mainstream recurrent neural network based NLP models such as the bidirectional long short-term memory (Bi-LSTM) and bidirectional gated recurrent unit (Bi-GRU). Moreover, the experimental results also show that the proposed model can precisely locate the key information in the given text.

Keywords: natural language processing, deep learning, neural network
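The core idea behind applying a CNN to short text can be sketched in plain Python, without any deep learning framework: a convolutional filter acts as an n-gram detector slid along the token sequence, and max-over-time pooling keeps only its strongest response, which is what lets such a model locate a key phrase regardless of its position. All names, dimensions, and values below are illustrative, not taken from the paper:

```python
import random

def conv1d_max_pool(embeddings, kernel, width):
    """Slide one n-gram filter over token embeddings and max-pool over time.

    embeddings: list of d-dimensional vectors, one per token
    kernel:     flat weight list of length width * d (a single filter)
    Returns the strongest ReLU-activated filter response across positions.
    """
    responses = []
    for i in range(len(embeddings) - width + 1):
        # Flatten a window of `width` consecutive token vectors.
        window = [v for tok in embeddings[i:i + width] for v in tok]
        # Dot product with the filter, then ReLU.
        responses.append(max(0.0, sum(w * x for w, x in zip(kernel, window))))
    return max(responses)

random.seed(0)
d, width = 4, 3  # embedding dimension and n-gram width (illustrative)
sentence = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(10)]
kernel = [random.uniform(-1, 1) for _ in range(width * d)]
feature = conv1d_max_pool(sentence, kernel, width)
print(round(feature, 4))
```

In a full text-CNN (e.g., Kim-style sentence classification), many such filters of several widths run in parallel, and the pooled responses are concatenated and fed to a linear classifier.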


Publication history

Received: 05 December 2021
Revised: 16 March 2022
Accepted: 06 May 2022
Published: 06 September 2022
Issue date: June 2022

Copyright

© ITU and TUP. All articles included in the journal are copyrighted by the ITU and TUP.

Acknowledgements


This work was supported by the Australian Research Council Discovery Project (No. DP180101051) and the National Natural Science Foundation of China (No. 61877051).

Rights and permissions

This work is available under the CC BY-NC-ND 3.0 IGO license: https://creativecommons.org/licenses/by-nc-nd/3.0/igo/
