Volume 26, Issue 5


Design and Tool Flow of a Reconfigurable Asynchronous Neural Network Accelerator

Jilin Zhang, Hui Wu, Weijia Chen, Shaojun Wei, Hong Chen
Institute of Microelectronics, Tsinghua National Laboratory for Information Science and Technology, and Beijing Engineering Center of Technology and research on Wireless Medical and Health System, Tsinghua University, Beijing 100084, China

Abstract

Convolutional Neural Networks (CNNs) are widely used in computer vision, natural language processing, and related fields, where real applications generally demand low power and high efficiency. Energy efficiency has therefore become a critical metric for CNN accelerators. Because asynchronous circuits offer low power consumption, high speed, and freedom from clock distribution problems, we design and implement an energy-efficient asynchronous CNN accelerator in a 65 nm Complementary Metal Oxide Semiconductor (CMOS) process. Given the absence of a commercial design tool flow for asynchronous circuits, we develop a novel design flow that takes Click-based asynchronous bundled-data circuits to mask layout efficiently using conventional Electronic Design Automation (EDA) tools. We also introduce an adaptive delay matching method and perform accurate static timing analysis on the circuits to ensure correct timing. An accelerator for a handwriting recognition network (the LeNet-5 model) is implemented. Silicon test results show that the computing array of the asynchronous accelerator consumes 30% less power than that of the synchronous one, and that the asynchronous accelerator achieves an energy efficiency of 1.538 TOPS/W, which is 12% higher than that of the synchronous chip.

Keywords: energy efficiency, Convolutional Neural Network (CNN) accelerator, asynchronous circuit, adaptive delay matching, asynchronous design flow
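
The efficiency figures quoted in the abstract can be put in more tangible units with a little arithmetic. The sketch below is illustrative only and not taken from the paper: it converts TOPS/W to energy per operation and estimates the synchronous chip's efficiency, under the assumption that "12% higher" is measured relative to the synchronous baseline. The function and constant names are hypothetical.

```python
# Figures quoted in the abstract.
ASYNC_TOPS_PER_W = 1.538   # energy efficiency of the asynchronous accelerator
RELATIVE_GAIN = 0.12       # "12% higher than that of the synchronous chip"


def energy_per_op_pj(tops_per_watt: float) -> float:
    """Convert TOPS/W to picojoules per operation.

    1 TOPS/W is 1e12 operations per joule, i.e. one operation per picojoule,
    so the energy per operation in pJ is the reciprocal of the TOPS/W value.
    """
    return 1.0 / tops_per_watt


def implied_baseline(tops_per_watt: float, gain: float) -> float:
    """Baseline efficiency, ASSUMING `gain` is relative to the baseline."""
    return tops_per_watt / (1.0 + gain)


# Derived value, not a measurement reported in the abstract.
sync_tops_per_w = implied_baseline(ASYNC_TOPS_PER_W, RELATIVE_GAIN)

print(f"async: {energy_per_op_pj(ASYNC_TOPS_PER_W):.3f} pJ/op")
print(f"sync (implied): {sync_tops_per_w:.3f} TOPS/W, "
      f"{energy_per_op_pj(sync_tops_per_w):.3f} pJ/op")
```

Under these assumptions, the asynchronous accelerator spends roughly 0.65 pJ per operation, and the implied synchronous figure is roughly 1.37 TOPS/W. The latter is a derived estimate, not a number stated in the abstract.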


Publication history

Received: 31 August 2020
Accepted: 09 October 2020
Published: 20 April 2021
Issue date: October 2021

Copyright

© The author(s) 2021

Acknowledgements

This work was supported by the National Science and Technology Major Project of the Ministry of Science and Technology, China (No. 2018AAA0103100) and the National Natural Science Foundation of China (No. 61674090), and was partly supported by the Beijing National Research Center for Information Science and Technology (No. 042003266) and the Beijing Engineering Research Center (No. BG0149).

Rights and permissions

© The author(s) 2021. The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
