Regular Paper

Approximate Processing Element Design and Analysis for the Implementation of CNN Accelerators

Department of Micro-Nano Electronics, Shanghai Jiao Tong University, Shanghai 200240, China
School of Integrated Circuits, Tsinghua University, Beijing 100084, China
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada

Abstract

As a primary computation unit, a processing element (PE) is key to the energy efficiency of a convolutional neural network (CNN) accelerator. Taking advantage of the inherent error tolerance of CNNs, approximate computing with high hardware efficiency has been considered for implementing the computation units of CNN accelerators. However, individual approximate designs such as multipliers and adders can only achieve limited accuracy and hardware improvements. In this paper, an approximate PE is designed specifically for CNN accelerators by jointly considering the data representation, multiplication, and accumulation. An approximate data format is defined for the weights using stochastic rounding. This data format enables a simple implementation of multiplication using small lookup tables, an adder, and a shifter. Two approximate accumulators are further proposed for the product accumulation in the PE. Compared with the exact 8-bit fixed-point design, the proposed PE saves more than 29% and 20% in power-delay product for 3 × 3 and 5 × 5 sums of products, respectively. In addition, compared with PEs consisting of state-of-the-art approximate multipliers, the proposed design shows a significantly smaller error bias with lower hardware overhead. Moreover, the application of the approximate PEs in CNN accelerators is analyzed by implementing a multi-task CNN for face detection and alignment. We conclude that 1) an approximate PE is more effective for face detection than for alignment, 2) an approximate PE with high statistically measured accuracy does not necessarily result in good quality in face detection, and 3) properly increasing the number of PEs in a CNN accelerator can improve its power and energy efficiency.
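The abstract mentions stochastic rounding without giving details of the weight format. As a rough illustration only, the sketch below shows generic stochastic rounding to a uniform grid: a value is rounded up with probability equal to its fractional remainder, so the rounded result is unbiased on average. The function names, the uniform grid `step`, and the test values are assumptions made for this example; this is not the paper's approximate data format or its lookup-table multiplier.

```python
import math
import random


def stochastic_round(x, step, rng):
    """Round x to a multiple of step (hypothetical uniform grid).

    The value is rounded up with probability equal to its fractional
    remainder and rounded down otherwise.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower  # in [0, 1)
    round_up = 1 if rng.random() < frac else 0
    return (lower + round_up) * step


def mean_rounded(x, step, trials, seed=0):
    """Average many stochastic roundings of x to show the low bias."""
    rng = random.Random(seed)
    total = sum(stochastic_round(x, step, rng) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Each call returns 0.25 or 0.5; the long-run average approaches 0.3.
    print(mean_rounded(0.3, 0.25, 20000))
```

Deterministic round-to-nearest would always map 0.3 to 0.25 on this grid and so carries a fixed error, whereas the stochastic version trades that bias for variance. This is consistent with the abstract's emphasis on small error bias, although the mechanism actually used in the paper may differ.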

Electronic Supplementary Material

JCST-2205-12548-Highlights.pdf (1.8 MB)

Journal of Computer Science and Technology
Pages 309-327


Cite this article:
Li T, Jiang H-L, Mo H, et al. Approximate Processing Element Design and Analysis for the Implementation of CNN Accelerators. Journal of Computer Science and Technology, 2023, 38(2): 309-327. https://doi.org/10.1007/s11390-023-2548-8

Views: 1029
Crossref: 6
Web of Science: 2
Scopus: 6
CSCD: 0

Received: 03 June 2022
Accepted: 23 March 2023
Published: 30 March 2023
© Institute of Computing Technology, Chinese Academy of Sciences 2023