Numerous neural network (NN) applications are now being deployed on mobile devices. These applications typically involve heavy computation and large volumes of data while requiring low inference latency, which challenges the computing capability of mobile devices. Moreover, a device’s lifetime and performance depend on its temperature. Hence, in many scenarios, such as industrial production and automotive systems, where environmental temperatures are usually high, it is important to control device temperature to maintain steady operation. In this paper, we propose a thermal-aware channel-wise heterogeneous NN inference algorithm. It consists of two parts: the thermal-aware dynamic frequency (TADF) algorithm and the heterogeneous-processor single-layer workload distribution (HSWD) algorithm. Based on a mobile device’s architectural characteristics and the environmental temperature, TADF adjusts the running speeds of the central processing unit and graphics processing unit; HSWD then distributes the workload of each layer in the NN model across the heterogeneous processors according to each processor’s running speed and the characteristics of the layer. Experimental results on representative NNs and mobile devices show that the proposed method improves on-device inference speed by 21%–43% over the traditional inference method.
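The two-stage idea in the abstract can be sketched in a few lines. The snippet below is an illustrative toy model, not the paper's actual TADF/HSWD implementations: it (1) derates a processor's frequency above a hypothetical temperature threshold, and (2) splits a layer's output channels between CPU and GPU in proportion to the resulting speeds. All function names, thresholds, and scaling factors are assumptions for illustration.

```python
def thermal_aware_speed(base_freq_ghz: float, temp_c: float,
                        throttle_temp_c: float = 70.0) -> float:
    """Scale a processor's frequency down linearly above a temperature threshold.

    The 2%-per-degree derating and the 50% floor are illustrative values,
    standing in for whatever policy TADF derives for a given device.
    """
    if temp_c <= throttle_temp_c:
        return base_freq_ghz
    factor = max(0.5, 1.0 - 0.02 * (temp_c - throttle_temp_c))
    return base_freq_ghz * factor


def split_channels(total_channels: int, cpu_speed: float, gpu_speed: float):
    """Distribute a layer's output channels proportionally to processor speeds,
    mimicking (very roughly) a channel-wise workload split like HSWD's."""
    cpu_share = round(total_channels * cpu_speed / (cpu_speed + gpu_speed))
    return cpu_share, total_channels - cpu_share


# At 80 °C both processors are throttled to 80% of base frequency,
# and the 64 output channels of a layer are split accordingly.
cpu = thermal_aware_speed(2.8, temp_c=80.0)
gpu = thermal_aware_speed(0.7, temp_c=80.0)
print(split_channels(64, cpu, gpu))  # (51, 13)
```

In a real system, the speed estimates would come from per-layer profiling of each processor rather than raw clock frequency, since CPU/GPU efficiency varies with layer type (e.g., convolution vs. fully connected).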
This work was supported by the National Key R&D Program of China (No. 2018AAA0100500), the National Natural Science Foundation of China (Nos. 61972085, 61872079, and 61632008), the Jiangsu Provincial Key Laboratory of Network and Information Security (No. BM2003201), the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China (No. 93K-9), the Southeast University China Mobile Research Institute Joint Innovation Center (No. R21701010102018), and the University Synergy Innovation Program of Anhui Province (No. GXXT-2020-012). It was partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization, the Fundamental Research Funds for the Central Universities, the CCF-Baidu Open Fund (No. 2021PP15002000), and the Future Network Scientific Research Fund Project (No. FNSRFP-2021-YB-02). We also thank the Big Data Computing Center of Southeast University for providing the experimental environment and computing facilities.
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).