Deep Model Compression for Mobile Platforms: A Survey

Kaiming Nan; Sicong Liu; Junzhao Du; Hui Liu

doi:10.26599/TST.2018.9010103

Open Access

School of Computer Science and Technology, Xidian University, Xi’an 710071, China.

School of Software and Institute of Software Engineering, Xidian University, Xi’an 710071, China.

Abstract

Despite the rapid development of mobile and embedded hardware, directly executing computation-expensive and storage-intensive deep learning algorithms locally on these devices for sensory data analysis remains constrained. In this paper, we first summarize layer compression techniques for state-of-the-art deep learning models in three categories: weight factorization and pruning, convolution decomposition, and special layer architecture design. For each category, we quantify the storage and computation that the techniques can tune, and discuss their practical challenges and possible improvements. Then, we implement Android projects using TensorFlow Mobile to test 10 compression methods and compare their practical performance in terms of accuracy, parameter size, intermediate feature size, computation, processing latency, and energy consumption. To further discuss their advantages and bottlenecks, we test their performance on four standard recognition tasks across six resource-constrained Android smartphones. Finally, we survey two types of run-time Neural Network (NN) compression techniques that are orthogonal to the layer compression techniques: run-time resource management and cost optimization with special NN architectures.

Keywords

deep learning; model compression; run-time resource management; cost optimization

Tsinghua Science and Technology

Volume 24, Issue 6, December 2019

Pages 677-693

DOI: 10.26599/TST.2018.9010103

Cite this article:

Nan K, Liu S, Du J, et al. Deep Model Compression for Mobile Platforms: A Survey. Tsinghua Science and Technology, 2019, 24(6): 677-693. https://doi.org/10.26599/TST.2018.9010103