Volume 28, Issue 1


Thermal-Aware On-Device Inference Using Single-Layer Parallelization with Heterogeneous Processors

Jinghui Zhang (1), Yuchen Wang (1), Tianyu Huang (1), Fang Dong (1), Wei Zhao (2,4), Dian Shen (1,3)

1. School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
2. Hefei Comprehensive National Science Center, Hefei 231299, China
3. Department of Computer Science & Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China
4. School of Computer Science and Technology, Anhui University of Technology, Hefei 230026, China

Abstract

Numerous neural network (NN) applications are now deployed on mobile devices. These applications typically involve heavy computation and large volumes of data while requiring low inference latency, which challenges the computing capability of mobile devices. Moreover, a device's lifetime and performance depend on its temperature, so in scenarios where ambient temperatures are usually high, such as industrial production and automotive systems, controlling device temperature is essential for maintaining steady operation. In this paper, we propose a thermal-aware channel-wise heterogeneous NN inference algorithm. It contains two parts: the thermal-aware dynamic frequency (TADF) algorithm and the heterogeneous-processor single-layer workload distribution (HSWD) algorithm. Based on a mobile device's architectural characteristics and the environmental temperature, TADF adjusts the running speeds of the central processing unit (CPU) and graphics processing unit (GPU); HSWD then distributes the workload of each layer in the NN model according to each processor's running speed and the characteristics of both the layers and the heterogeneous processors. Experimental results on representative NNs and mobile devices show that the proposed method improves on-device inference speed by 21%–43% over the traditional inference method.
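As an illustration only (not the authors' implementation), the two steps described in the abstract can be sketched in Python: a hypothetical temperature-derating rule stands in for TADF, and a proportional channel split per layer stands in for HSWD. All constants, thresholds, and function names below are invented for the example.

```python
# Sketch of the two ideas in the abstract: pick processor speed targets from
# the ambient temperature (TADF stand-in), then split one layer's output
# channels between CPU and GPU in proportion to those speeds (HSWD stand-in).
# The clock rates, threshold, and derating slope are hypothetical.

def thermal_aware_speeds(ambient_temp_c, cpu_max_ghz=2.8, gpu_max_ghz=1.3,
                         safe_temp_c=40.0, derate_per_deg=0.02):
    """Scale down CPU/GPU clock targets as the ambient temperature rises
    above a safe threshold; never derate below 20% of the maximum."""
    excess = max(0.0, ambient_temp_c - safe_temp_c)
    scale = max(0.2, 1.0 - derate_per_deg * excess)
    return cpu_max_ghz * scale, gpu_max_ghz * scale

def split_channels(total_channels, cpu_speed, gpu_speed):
    """Divide one layer's output channels between the two processors in
    proportion to their current speeds."""
    cpu_share = round(total_channels * cpu_speed / (cpu_speed + gpu_speed))
    return cpu_share, total_channels - cpu_share

# Example: at 55 °C ambient, both clocks are derated to 70% of maximum,
# and a 64-channel layer is split proportionally between CPU and GPU.
cpu_ghz, gpu_ghz = thermal_aware_speeds(ambient_temp_c=55.0)
cpu_ch, gpu_ch = split_channels(64, cpu_ghz, gpu_ghz)
print(cpu_ch, gpu_ch)
```

In the paper's actual method, the split additionally accounts for layer characteristics and processor architecture, not just raw clock speed; this sketch shows only the proportional-partitioning idea.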

Keywords: mobile device, neural network inference, temperature adjustment, channel-wise parallelization

References (26)

[1]
R. Lippmann, An introduction to computing with neural nets, IEEE ASSP Mag., vol. 4, no. 2, pp. 4–22, 1987.
[2]
A. B. Nassif, I. Shahin, I. Attili, M. Azzeh, and K. Shaalan, Speech recognition using deep neural networks: A systematic review, IEEE Access, vol. 7, pp. 19143–19165, 2019.
[3]
L. Coheur, From Eliza to Siri and beyond, in Proc. 18th Int. Conf. Information Processing and Management of Uncertainty in Knowledge-Based Systems, Lisbon, Portugal, 2020, pp. 29–41.
[4]
C. Szegedy, W. Liu, Y. Q. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, Going deeper with convolutions, presented at the 2015 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 1–9.
[5]
L. K. Zeng, E. Li, Z. Zhou, and X. Chen, Boomerang: On-demand cooperative deep neural network inference for edge intelligence on the industrial internet of things, IEEE Network, vol. 33, no. 5, pp. 96–103, 2019.
[6]
R. Bi, R. Liu, J. K. Ren, and G. Z. Tan, Utility aware offloading for mobile-edge computing, Tsinghua Science and Technology, vol. 26, no. 2, pp. 239–250, 2021.
[7]
Q. C. Cao, W. L. Zhang, and Y. H. Zhu, Deep learning-based classification of the polar emotions of “moe”-style cartoon pictures, Tsinghua Science and Technology, vol. 26, no. 3, pp. 275–286, 2021.
[8]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, in Proc. 25th Int. Conf. Neural Information Processing Systems, Lake Tahoe, NV, USA, 2012, pp. 1097–1105.
[9]
J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, et al., Large scale distributed deep networks, in Proc. 25th Int. Conf. Neural Information Processing Systems, Lake Tahoe, NV, USA, 2012, pp. 1223–1231.
[10]
S. Han, H. C. Shen, M. Philipose, S. Agarwal, A. Wolman, and A. Krishnamurthy, MCDNN: An approximation-based execution framework for deep stream processing under resource constraints, in Proc. 14th Annu. Int. Conf. Mobile Systems, Applications, and Services, Singapore, 2016, pp. 123–136.
[11]
N. D. Lane, S. Bhattacharya, P. Georgiev, C. Forlivesi, L. Jiao, L. Qendro, and F. Kawsar, DeepX: A software accelerator for low-power deep learning inference on mobile devices, presented at the 2016 15th ACM/IEEE Int. Conf. Information Processing in Sensor Networks (IPSN), Vienna, Austria, 2016, pp. 1–12.
[12]
Y. Kim, J. Kim, D. Chae, D. Kim, and J. Kim, μLayer: Low latency on-device inference using cooperative single-layer acceleration and processor-friendly quantization, in Proc. 14th EuroSys Conf. 2019, Dresden, Germany, pp. 1–15.
[13]
F. Zhang, J. D. Zhai, B. Wu, B. S. He, W. G. Chen, and X. Y. Du, Automatic irregularity-aware fine-grained workload partitioning on integrated architectures, IEEE Trans. Knowl. Data Eng., vol. 33, no. 3, pp. 867–881, 2021.
[14]
C. Wang, Y. Y. Yang, and P. Z. Zhou, Towards efficient scheduling of federated mobile devices under computational and statistical heterogeneity, IEEE Trans. Parallel Distrib. Syst., vol. 32, no. 2, pp. 394–410, 2021.
[15]
Q. S. Zeng, Y. Q. Du, K. B. Huang, and K. K. Leung, Energy-efficient resource management for federated edge learning with CPU-GPU heterogeneous computing, IEEE Trans. Wirel. Commun.
[16]
Y. Lee, H. S. Chwa, K. G. Shin, and S. G. Wang, Thermal-aware resource management for embedded real-time systems, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 37, no. 11, pp. 2857–2868, 2018.
[17]
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation, vol. 1, no. 4, pp. 541–551, 1989.
[18]
K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, presented at the 3rd Int. Conf. Learning Representations, San Diego, CA, USA, 2015, pp. 1–14.
[19]
J. Redmon and A. Farhadi, YOLO9000: Better, faster, stronger, presented at the 2017 IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 6517–6525.
[20]
C. Liu, J. Li, W. Huang, J. Rubio, E. Speight, and F. X. Lin, Power-efficient time-sensitive mapping in heterogeneous systems, presented at the 2012 21st Int. Conf. Parallel Architectures and Compilation Techniques (PACT), Minneapolis, MN, USA, 2012, pp. 23–32.
[21]
Y. P. Liu, R. P. Dick, L. Shang, and H. Z. Yang, Accurate temperature-dependent integrated circuit leakage power estimation is easy, presented at the 2007 Design, Automation and Test in Europe Conference and Exposition, Nice, France, 2007, pp. 1526–1531.
[22]
F. Zhang, J. D. Zhai, B. S. He, S. H. Zhang, and W. G. Chen, Understanding co-running behaviors on integrated CPU/GPU architectures, IEEE Trans. Parallel Distrib. Syst., vol. 28, no. 3, pp. 905–918, 2017.
[23]
A. F. Agarap, Deep learning using rectified linear units (ReLU), arXiv preprint arXiv: 1803.08375, 2019.
[24]
S. Cass, Nvidia makes it easy to embed AI: The Jetson nano packs a lot of machine-learning power into DIY projects-[Hands on], IEEE Spectrum, vol. 57, no. 7, pp. 14–16, 2020.
[25]
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. M. Lin, N. Gimelshein, L. Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, in Proc. 33rd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, pp. 8026–8037.
[26]

Publication history

Received: 01 October 2021
Accepted: 13 October 2021
Published: 21 July 2022
Issue date: February 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was supported by the National Key R&D Program of China (No. 2018AAA0100500), the National Natural Science Foundation of China (Nos. 61972085, 61872079, and 61632008), the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201), the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China (No. 93K-9), the Southeast University China Mobile Research Institute Joint Innovation Center (No. R21701010102018), and the University Synergy Innovation Program of Anhui Province (No. GXXT-2020-012). It was partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization, the Fundamental Research Funds for the Central Universities, the CCF-Baidu Open Fund (No. 2021PP15002000), and the Future Network Scientific Research Fund Project (No. FNSRFP-2021-YB-02). We also thank the Big Data Computing Center of Southeast University for providing the experiment environment and computing facility.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
