Volume 26, Issue 3

XB-SIM*: A Simulation Framework for Modeling and Exploration of ReRAM-Based CNN Acceleration Design

Xiang Fei, Youhui Zhang, and Weimin Zheng
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Beijing National Research Center for Information Science and Technology, Beijing 100084, China

Abstract

Resistive Random Access Memory (ReRAM)-based neural network accelerators have the potential to surpass their digital counterparts in computational efficiency and performance. However, the design of these accelerators faces several challenges, including ReRAM device imperfections and the large amount of computation required to simulate such devices accurately. We present XB-SIM*, a simulation framework for ReRAM-crossbar-based Convolutional Neural Network (CNN) accelerators. XB-SIM* can be flexibly configured to simulate the accelerator’s structure and clock-driven behaviors at the architecture level. The framework also includes a ReRAM-aware Neural Network (NN) training algorithm and a CNN-oriented mapper to train an NN and map it onto the simulated design efficiently. The simulator’s behavior has been verified against the corresponding circuit simulation of a real chip. Furthermore, a batch processing mode for the massive calculations required to mimic the behavior of ReRAM-crossbar circuits is proposed to fully exploit the computational concurrency of the mapping strategy. This batch processing mode improves simulation speed by up to 5.02× on CPU and 34.29× on GPGPU. Within this framework, comprehensive architectural exploration and end-to-end evaluation have been performed, providing insights for systematic optimization.
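To make the batch processing idea concrete, the following is a minimal sketch (not the XB-SIM* implementation) of what simulating a ReRAM crossbar's analog matrix-vector multiplication can look like: weights are stored as conductances perturbed by an assumed log-normal device-variation model, and many input vectors are grouped into a single matrix product instead of being simulated one at a time, which is the kind of concurrency a batch mode exploits on CPU/GPGPU. All sizes and the variation model here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

rows, cols, batch = 128, 128, 64            # crossbar dimensions and input batch size (assumed)
weights = rng.uniform(0.0, 1.0, (rows, cols))

# Assumed device-imperfection model: multiplicative log-normal conductance variation.
variation = rng.lognormal(mean=0.0, sigma=0.05, size=weights.shape)
conductance = weights * variation

inputs = rng.uniform(0.0, 1.0, (batch, rows))  # one row per input-voltage vector

# Per-vector simulation (slow path): one matrix-vector product per input.
out_loop = np.stack([v @ conductance for v in inputs])

# Batch mode (fast path): a single matrix-matrix product over the whole batch,
# letting the BLAS/GPU backend exploit the available concurrency.
out_batch = inputs @ conductance

assert np.allclose(out_loop, out_batch)
```

Both paths compute the same crossbar outputs; the speedup comes purely from replacing many small products with one large one, which maps far better onto vectorized hardware.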

Keywords: deep neural network, Resistive Random Access Memory (ReRAM), simulation, accelerator, processing in memory


Publication history

Received: 09 October 2019
Accepted: 19 November 2019
Published: 12 October 2020
Issue date: June 2021

Copyright

© The author(s) 2021.

Acknowledgements

This work was supported in part by Beijing Academy of Artificial Intelligence (BAAI) (No. BAAI2019ZD0403), Beijing Innovation Center for Future Chip, Tsinghua University, and the Science and Technology Innovation Special Zone Project, China.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
