Volume 26, Issue 3

XB-SIM*: A Simulation Framework for Modeling and Exploration of ReRAM-Based CNN Acceleration Design

Xiang Fei, Youhui Zhang, and Weimin Zheng
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Beijing National Research Center for Information Science and Technology, Beijing 100084, China

Abstract

Resistive Random Access Memory (ReRAM)-based neural network accelerators have the potential to surpass their digital counterparts in computational efficiency and performance. However, the design of these accelerators faces several challenges, including ReRAM device imperfections and the large amount of computation required to simulate such devices accurately. We present XB-SIM*, a simulation framework for ReRAM-crossbar-based Convolutional Neural Network (CNN) accelerators. XB-SIM* can be flexibly configured to simulate the accelerator’s structure and clock-driven behaviors at the architecture level. The framework also includes a ReRAM-aware Neural Network (NN) training algorithm and a CNN-oriented mapper to train an NN and map it onto the simulated design efficiently. The simulator’s behavior has been verified against the corresponding circuit simulation of a real chip. Furthermore, a batch processing mode for the massive calculations required to mimic the behavior of ReRAM-crossbar circuits is proposed to fully exploit the computational concurrency of the mapping strategy. This batch processing mode improves simulation speed by up to 5.02× on CPU and 34.29× on GPGPU. Within this framework, comprehensive architectural exploration and end-to-end evaluation have been performed, providing insights for systematic optimization.
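To make the batch processing idea concrete, the following is a minimal sketch (not the XB-SIM* implementation) of what simulating a ReRAM crossbar's analog matrix-vector multiplication can look like: weights are stored as conductances perturbed by an assumed log-normal device-variation model, and many input vectors are grouped into a single matrix product instead of being simulated one at a time, which is the kind of concurrency a batch mode exploits on CPU/GPGPU. All sizes and the variation model here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

rows, cols, batch = 128, 128, 64            # crossbar dimensions and input batch size (assumed)
weights = rng.uniform(0.0, 1.0, (rows, cols))

# Assumed device-imperfection model: multiplicative log-normal conductance variation.
variation = rng.lognormal(mean=0.0, sigma=0.05, size=weights.shape)
conductance = weights * variation

inputs = rng.uniform(0.0, 1.0, (batch, rows))  # one row per input-voltage vector

# Per-vector simulation (slow path): one matrix-vector product per input.
out_loop = np.stack([v @ conductance for v in inputs])

# Batch mode (fast path): a single matrix-matrix product over the whole batch,
# letting the BLAS/GPU backend exploit the available concurrency.
out_batch = inputs @ conductance

assert np.allclose(out_loop, out_batch)
```

Both paths compute the same crossbar outputs; the speedup comes purely from replacing many small products with one large one, which maps far better onto vectorized hardware.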

Keywords: deep neural network, Resistive Random Access Memory (ReRAM), simulation, accelerator, processing in memory


Publication history

Received: 09 October 2019
Accepted: 19 November 2019
Published: 12 October 2020
Issue date: June 2021

Copyright

© The author(s) 2021.

Acknowledgements

This work was supported in part by Beijing Academy of Artificial Intelligence (BAAI) (No. BAAI2019ZD0403), Beijing Innovation Center for Future Chip, Tsinghua University, and the Science and Technology Innovation Special Zone Project, China.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
