Survey Issue
Research on General-Purpose Brain-Inspired Computing Systems
Journal of Computer Science and Technology 2024, 39 (1): 4-21
Published: 25 January 2024

Brain-inspired computing is a new technology that draws on the principles of brain science and is oriented to the efficient development of artificial general intelligence (AGI), and a brain-inspired computing system is a hierarchical system composed of neuromorphic chips, basic software and hardware, and algorithms/applications that embody this technology. While the system is developing rapidly, it faces various challenges and opportunities brought by interdisciplinary research, including the issue of software and hardware fragmentation. This paper analyzes the status quo of brain-inspired computing systems. Inspired by some design principles and methodologies of general-purpose computers, we propose constructing “general-purpose” brain-inspired computing systems. A general-purpose brain-inspired computing system refers to a brain-inspired computing hierarchy constructed based on the design philosophy of decoupling software and hardware, which can flexibly support various brain-inspired computing applications and neuromorphic chips with different architectures. Further, this paper introduces our recent work in these aspects, including the ANN (artificial neural network)/SNN (spiking neural network) development tools, the hardware-agnostic compilation infrastructure, and the chip micro-architecture with high programming flexibility and high performance; these studies show that the “general-purpose” system can remarkably improve the efficiency of application development and enhance the productivity of basic software, thereby being conducive to accelerating the advancement of various brain-inspired algorithms and applications. We believe that this is the key to the collaborative research and development, and the evolution, of applications, basic software, and chips in this field, and that it is conducive to building a favorable software/hardware ecosystem for brain-inspired computing.

Open Access Issue
AIPerf: Automated Machine Learning as an AI-HPC Benchmark
Big Data Mining and Analytics 2021, 4 (3): 208-220
Published: 12 May 2021

The plethora of complex Artificial Intelligence (AI) algorithms and available High-Performance Computing (HPC) power stimulates the expeditious development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems has rapidly emerged. In particular, the de facto HPC benchmark, LINPACK, cannot reflect the AI computing power and input/output performance without a representative workload. Current popular AI benchmarks, such as MLPerf, have a fixed problem size and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite utilizing automated machine learning, which not only represents real AI scenarios, but also is auto-adaptively scalable to various scales of machines. We implement the algorithms in a highly parallel and flexible way to ensure the efficiency and optimization potential on diverse systems with customizable configurations. We utilize Operations Per Second (OPS), which is measured in an analytical and systematic approach, as a major metric to quantify the AI performance. We perform evaluations on various systems to ensure the benchmark’s stability and scalability, from 4 nodes with 32 NVIDIA Tesla T4 (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and single metric, AIPerf can easily scale on and rank AI-HPC, providing a powerful benchmark suite for the coming supercomputing era.
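The two measurements quoted above can be normalized per accelerator with a few lines of arithmetic. The sketch below is only an illustration: the figures come from the abstract, while the helper names and the weak-scaling efficiency formula (per-device throughput at scale divided by per-device throughput at the baseline) are assumptions, not AIPerf's own analytical method. Because the two runs use different accelerators, their per-device values are not directly comparable as a scaling curve.

```python
# Figures quoted in the abstract; everything else here is illustrative.
TERA = 1e12
PETA = 1e15

t4_run = {"nodes": 4, "devices": 32, "ops": 56.1 * TERA}          # NVIDIA Tesla T4
ascend_run = {"nodes": 512, "devices": 4096, "ops": 194.53 * PETA}  # Huawei Ascend 910


def per_device_tera_ops(run):
    """Average measured throughput per accelerator, in Tera-OPS."""
    return run["ops"] / run["devices"] / TERA


def weak_scaling_efficiency(base_ops, base_devices, ops, devices):
    """Hypothetical helper: ratio of per-device throughput at scale to the
    per-device throughput of a baseline run on the SAME hardware."""
    return (ops / devices) / (base_ops / base_devices)


if __name__ == "__main__":
    for name, run in (("T4", t4_run), ("Ascend 910", ascend_run)):
        print(f"{name}: {per_device_tera_ops(run):.2f} Tera-OPS per device")
```

A value of `weak_scaling_efficiency` close to 1.0 would correspond to the near-linear weak scalability the abstract reports; the abstract itself does not give two data points on the same hardware, so no efficiency figure is derived here.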

Open Access Issue
Towards "General Purpose" Brain-Inspired Computing System
Tsinghua Science and Technology 2021, 26 (5): 664-673
Published: 20 April 2021

Brain-inspired computing refers to computational models, methods, and systems that are mainly inspired by the processing mode or structure of the brain. A recent study proposed the concept of "neuromorphic completeness" and the corresponding system hierarchy, which helps to determine the capability boundary of a brain-inspired computing system and to judge whether the hardware and software of brain-inspired computing are compatible with each other. As a position paper, this article analyzes the design characteristics of existing brain-inspired chips and the current so-called "general purpose" application development frameworks for brain-inspired computing, and introduces the background and the potential of this proposal. Further, some key features of this concept are presented through a comparison with Turing completeness and approximate computation, and through an analysis of its relationship with "general-purpose" brain-inspired computing systems (that is, computing systems that can support all computable applications). Finally, a promising technical approach to realizing such computing systems is introduced, together with the ongoing research and the existing groundwork. We believe that this work is conducive to the design of extensible neuromorphic-complete hardware primitives and the corresponding chips. On this basis, we expect to gradually realize a "general purpose" brain-inspired computing system that takes into account both functional completeness and application efficiency.

Open Access Issue
XB-SIM*: A Simulation Framework for Modeling and Exploration of ReRAM-Based CNN Acceleration Design
Tsinghua Science and Technology 2021, 26 (3): 322-334
Published: 12 October 2020

Resistive Random Access Memory (ReRAM)-based neural network accelerators have the potential to surpass their digital counterparts in computational efficiency and performance. However, the design of these accelerators faces a number of challenges, including imperfections of the ReRAM device and the large amount of calculation required to simulate those imperfections accurately. We present XB-SIM*, a simulation framework for ReRAM-crossbar-based Convolutional Neural Network (CNN) accelerators. XB-SIM* can be flexibly configured to simulate the accelerator’s structure and clock-driven behaviors at the architecture level. This framework also includes a ReRAM-aware Neural Network (NN) training algorithm and a CNN-oriented mapper to train an NN and map it onto the simulated design efficiently. The behavior of the simulator has been verified against the corresponding circuit simulation of a real chip. Furthermore, a batch processing mode for the massive calculations required to mimic the behavior of ReRAM-crossbar circuits is proposed to fully exploit the computational concurrency of the mapping strategy. On CPU and GPGPU, this batch processing mode can improve the simulation speed by up to 5.02× and 34.29×, respectively. Within this framework, comprehensive architectural exploration and end-to-end evaluation have been achieved, which provide some insights for systemic optimization.

Open Access Issue
Hardware Implementation of Spiking Neural Networks on FPGA
Tsinghua Science and Technology 2020, 25 (4): 479-486
Published: 13 January 2020

Inspired by real biological neural models, Spiking Neural Networks (SNNs) process information with discrete spikes and show great potential for building low-power neural network systems. This paper proposes a hardware implementation of an SNN based on Field-Programmable Gate Arrays (FPGAs). It features a hybrid updating algorithm, which combines the advantages of existing algorithms to simplify hardware design and improve performance. The proposed design supports up to 16,384 neurons and 16.8 million synapses but requires minimal hardware resources and achieves a very low power consumption of 0.477 W. A test platform is built based on the proposed design using a Xilinx FPGA evaluation board, upon which we deploy a classification task on the MNIST dataset. The evaluation results show an accuracy of 97.06% and a frame rate of 161 frames per second.
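As a rough reading aid, the reported figures can be combined into two derived quantities. This is a back-of-the-envelope sketch, not a result from the paper: it assumes the 0.477 W figure applies while the design runs at the reported 161 frames per second, and it treats "16.8 million" as an exact synapse count even though it is a rounded value.

```python
# Values quoted in the abstract.
power_w = 0.477          # reported power consumption, watts
frames_per_s = 161       # reported MNIST frame rate
max_neurons = 16384      # reported neuron capacity
max_synapses = 16.8e6    # reported synapse capacity (rounded in the abstract)

# ASSUMPTION: power and frame rate were measured under the same workload.
energy_per_frame_mj = power_w / frames_per_s * 1e3

# Average synapse capacity per neuron implied by the two capacity figures.
synapses_per_neuron = max_synapses / max_neurons

if __name__ == "__main__":
    print(f"Energy per frame: about {energy_per_frame_mj:.2f} mJ")
    print(f"Synapses per neuron: about {synapses_per_neuron:.0f}")
```

Under these assumptions the design spends roughly 3 mJ per classified frame and provides on the order of a thousand synapses per neuron.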

Open Access Issue
HW/SW Co-optimization for Stencil Computation: Beginning with a Customizable Core
Tsinghua Science and Technology 2016, 21 (5): 570-580
Published: 18 October 2016

Energy efficiency is one of the most important issues for High Performance Computing (HPC) today. A heterogeneous HPC platform with some energy-efficient customizable cores (as application-specific accelerators) is believed to be one of the promising solutions to meet ever-increasing computing needs and to overcome power density limitations. In this paper, we focus on using customizable processor cores to optimize typical stencil computations, the kernel of many high-performance applications. We develop a series of effective software/hardware co-optimization strategies to exploit instruction-level and memory-computation parallelism, as well as to decrease energy consumption. These optimizations include loop tiling, prefetching, cache customization, Single Instruction Multiple Data (SIMD), and Direct Memory Access (DMA), as well as the necessary ISA extensions. Detailed tests of power efficiency are given to evaluate the effect of all these optimizations comprehensively. The results are impressive: the combination of these optimizations improved application performance by 341% while decreasing energy consumption by 35%; a preliminary comparison with X86, GPU, and FPGA platforms also showed that the design could achieve an order of magnitude higher performance efficiency. We believe this work can help in understanding sources of inefficiency in general-purpose chips and can serve as a starting point for customizing an energy-efficient CMP for further improvement.
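Of the optimizations listed above, loop tiling is the easiest to show in isolation. The following is a minimal sketch, assuming a 2D five-point Jacobi-style stencil with equal weights; the stencil shape, the weights, the function names, and the tile size are all illustrative choices, not details taken from the paper. Tiling changes only the order in which grid points are visited, so that each block's neighbours stay in cache; the result is identical to the untiled loop.

```python
def stencil_naive(src):
    """One sweep of a five-point stencil over the interior of a 2D grid."""
    n, m = len(src), len(src[0])
    dst = [row[:] for row in src]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            dst[i][j] = 0.2 * (src[i][j] + src[i - 1][j] + src[i + 1][j]
                               + src[i][j - 1] + src[i][j + 1])
    return dst


def stencil_tiled(src, tile=4):
    """Same sweep, visiting the interior in tile x tile blocks."""
    n, m = len(src), len(src[0])
    dst = [row[:] for row in src]
    for ii in range(1, n - 1, tile):          # block row origin
        for jj in range(1, m - 1, tile):      # block column origin
            for i in range(ii, min(ii + tile, n - 1)):
                for j in range(jj, min(jj + tile, m - 1)):
                    dst[i][j] = 0.2 * (src[i][j] + src[i - 1][j] + src[i + 1][j]
                                       + src[i][j - 1] + src[i][j + 1])
    return dst


if __name__ == "__main__":
    grid = [[float(i * 7 + j) for j in range(10)] for i in range(9)]
    print(stencil_tiled(grid, tile=3) == stencil_naive(grid))
```

Because the sweep reads only from `src` and writes only to `dst`, blocks can be processed in any order, which is what makes this reordering safe. In pure Python the tiled version is not faster; the benefit appears on hardware where the tile is sized to fit a cache or scratchpad.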
