Volume 24, Issue 6

Deep Model Compression for Mobile Platforms: A Survey

Kaiming Nan, Sicong Liu, Junzhao Du, and Hui Liu*
School of Computer Science and Technology, Xidian University, Xi’an 710071, China.
School of Software and Institute of Software Engineering, Xidian University, Xi’an 710071, China.

Abstract

Despite the rapid development of mobile and embedded hardware, directly executing computation-expensive and storage-intensive deep learning algorithms locally on these devices for sensory data analysis remains challenging. In this paper, we first summarize layer compression techniques for state-of-the-art deep learning models in three categories: weight factorization and pruning, convolution decomposition, and special layer architecture design. For each category, we quantify the storage and computation costs that the technique can tune, and discuss its practical challenges and possible improvements. Then, we implement Android projects using TensorFlow Mobile to test 10 compression methods and compare their practical performance in terms of accuracy, parameter size, intermediate feature size, computation, processing latency, and energy consumption. To further examine their advantages and bottlenecks, we test their performance on four standard recognition tasks using six resource-constrained Android smartphones. Finally, we survey two types of run-time Neural Network (NN) compression techniques that are orthogonal to layer compression: run-time resource management and cost optimization with special NN architectures.
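The storage and computation trade-offs quantified for each category can be illustrated with simple counting. The following Python sketch is not the authors' implementation or measurement code. It assumes biases are ignored, every output map keeps the stated h × w size (stride 1), and sparse-index overhead is not counted. The function names and layer sizes are hypothetical, and the convolution decomposition and special-architecture cases are modeled on the well-known MobileNets depthwise separable convolution and the SqueezeNet fire module.

```python
"""Back-of-the-envelope storage and computation counts for the three
layer compression families. Biases are ignored; "params" counts stored
weights and "macs" counts multiply-accumulate operations for one input."""

from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    params: int  # number of stored weights
    macs: int    # multiply-accumulates per inference


def dense(m: int, n: int) -> Cost:
    """Fully connected layer with an m x n weight matrix W."""
    return Cost(params=m * n, macs=m * n)


def low_rank_dense(m: int, n: int, k: int) -> Cost:
    """Weight factorization: W (m x n) ~= U (m x k) @ V (k x n).
    Saves storage only when k * (m + n) < m * n."""
    return Cost(params=k * (m + n), macs=k * (m + n))


def pruned_dense(m: int, n: int, sparsity: float) -> Cost:
    """Weight pruning: keep the (1 - sparsity) fraction of weights.
    Index overhead of the sparse format (e.g. CSR) is not counted."""
    nnz = round(m * n * (1.0 - sparsity))
    return Cost(params=nnz, macs=nnz)


def conv(k: int, c_in: int, c_out: int, h: int, w: int) -> Cost:
    """Standard k x k convolution with an h x w x c_out output map."""
    p = k * k * c_in * c_out
    return Cost(params=p, macs=p * h * w)


def depthwise_separable(k: int, c_in: int, c_out: int, h: int, w: int) -> Cost:
    """Convolution decomposition: k x k depthwise conv, then 1 x 1 pointwise."""
    p = k * k * c_in + c_in * c_out
    return Cost(params=p, macs=p * h * w)


def fire_module(c_in: int, s1: int, e1: int, e3: int, h: int, w: int) -> Cost:
    """Special layer architecture: SqueezeNet-style fire module, i.e. a
    1 x 1 squeeze (s1 filters) feeding 1 x 1 (e1) and 3 x 3 (e3) expands.
    Assumes every sub-layer keeps the h x w spatial size (stride 1)."""
    p = c_in * s1 + s1 * e1 + 9 * s1 * e3
    return Cost(params=p, macs=p * h * w)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    base = conv(3, 256, 256, 14, 14)
    sep = depthwise_separable(3, 256, 256, 14, 14)
    print("conv 3x3:", base)
    print("depthwise separable:", sep)
    print("MAC ratio:", sep.macs / base.macs)  # equals 1/c_out + 1/k**2
    print("dense 4096x4096:", dense(4096, 4096))
    print("rank-256 factorized:", low_rank_dense(4096, 4096, 256))
    print("90% pruned:", pruned_dense(4096, 4096, 0.9))
    print("fire module:", fire_module(96, 16, 64, 64, 56, 56))
```

With these illustrative sizes, the depthwise separable layer stores 67,840 weights instead of 589,824, matching the ratio 1/C_out + 1/k² (about 0.115), and a rank-256 factorization reduces a 4096 × 4096 fully connected layer from 16,777,216 to 2,097,152 weights. Such counts capture only parameter size and arithmetic; measured latency and energy on a phone also depend on memory access patterns and library support for sparse or depthwise operations.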

Keywords: deep learning, model compression, run-time resource management, cost optimization

Publication history

Received: 31 January 2018
Revised: 18 May 2018
Accepted: 25 May 2018
Published: 05 December 2019
Issue date: December 2019

Copyright

© The author(s) 2019

Acknowledgements

This work was partially supported by the National Key Research and Development Program of China (No. 2018YFB1003605), Foundations of CARCH (No. CARCH201704), the National Natural Science Foundation of China (No. 61472312), Foundations of Shaanxi Province and Xi’an Science and Technology Plan (Nos. B018230008 and BD34017020001), and the Foundations of Xidian University (No. JBZ171002).
