Brain-inspired multimodal learning based on neural networks

Chang Liu, Fuchun Sun (corresponding author), Bo Zhang
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract

Modern computational models have drawn on advances in human brain research. This study addresses the problem of multimodal learning with the help of brain-inspired models. Specifically, a unified multimodal learning architecture is proposed, based on deep neural networks inspired by the biology of the human visual cortex. The unified framework is validated on two practical multimodal learning tasks: image captioning, which combines visual and natural-language signals, and visual-haptic fusion, which combines visual and haptic signals. Extensive experiments are conducted under the framework, and competitive results are achieved.
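Although the abstract only outlines the approach, the core idea of a unified multimodal architecture, modality-specific encoders feeding a shared fusion stage, can be made concrete with a short sketch. The following PyTorch example is a hypothetical illustration, not the authors' implementation; all module names, layer sizes, and input shapes are assumptions chosen for brevity.

```python
# Hypothetical sketch (not the paper's code): modality-specific encoders map
# visual and haptic inputs into a shared embedding space; a fusion head
# operates on the concatenated embeddings. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class VisualEncoder(nn.Module):
    """Small CNN standing in for a cortex-inspired visual feature hierarchy."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, out_dim)

    def forward(self, x):                # x: (B, 3, H, W)
        h = self.conv(x).flatten(1)      # (B, 64)
        return self.fc(h)                # (B, out_dim)

class HapticEncoder(nn.Module):
    """1-D conv encoder for haptic time series (illustrative)."""
    def __init__(self, in_ch=4, out_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(32, out_dim)

    def forward(self, x):                # x: (B, in_ch, T)
        h = self.conv(x).flatten(1)      # (B, 32)
        return self.fc(h)                # (B, out_dim)

class FusionHead(nn.Module):
    """Concatenate modality embeddings and map them to task outputs."""
    def __init__(self, dim=256, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, n_classes))

    def forward(self, v, h):
        return self.net(torch.cat([v, h], dim=1))

# Example forward pass on dummy data
v = VisualEncoder()(torch.randn(2, 3, 64, 64))   # visual embedding
h = HapticEncoder()(torch.randn(2, 4, 100))      # haptic embedding
logits = FusionHead()(v, h)                      # (2, 10)
```

For the image-captioning task, the same pattern would apply with a sequence decoder in place of the classification head: the visual embedding conditions a recurrent decoder that generates the caption token by token.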

Keywords: deep learning, multimodal learning, brain-inspired learning, neural networks


Publication history

Received: 15 July 2018
Revised: 06 August 2018
Accepted: 10 August 2018
Published: 25 November 2018
Issue date: September 2018

Copyright

© The authors 2018

Rights and permissions

This article is published with open access at journals.sagepub.com/home/BSA

Creative Commons Non Commercial CC BY-NC: This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
