Regular Paper Issue
Area Efficient Pattern Representation of Binary Neural Networks on RRAM
Journal of Computer Science and Technology 2021, 36 (5): 1155-1166
Published: 30 September 2021

Resistive random access memory (RRAM) has been demonstrated to implement multiply-and-accumulate (MAC) operations in a highly parallel analog fashion, which dramatically accelerates convolutional neural networks (CNNs). Since CNNs require a considerable number of converters between analog crossbars and digital peripheral circuits, recent studies map binary neural networks (BNNs) onto RRAM and binarize the weights to {+1, −1}. However, the two mainstream representations for BNN weights introduce patterns of redundant 0s and 1s when dealing with negative weights. In this work, we reduce the area occupied by redundant 0s and 1s by proposing a BNN weight representation framework based on a novel pattern representation and a corresponding architecture. First, we split the weight matrix into several small matrices by clustering adjacent columns together. Second, we extract 1s' patterns, i.e., submatrices containing only 1s, from each small weight matrix, such that each final output can be represented as the sum of several patterns. Third, we map these patterns onto RRAM crossbars, including pattern computation crossbars (PCCs) and pattern accumulation crossbars (PACs). Finally, we compare the pattern representation with the two mainstream representations and adopt the more area-efficient one. The evaluation results demonstrate that our framework can effectively save over 20% of crossbar area compared with the two mainstream representations.
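The abstract's three steps (cluster adjacent columns, extract all-1s patterns, reuse each pattern's partial sum across the columns it covers) can be illustrated with a small sketch. The code below is a minimal, hypothetical rendering of that idea, not the paper's actual extraction algorithm or crossbar mapping; the function names, the greedy partitioning strategy, and the group width of 4 are all assumptions made for illustration.

```python
# Hypothetical sketch of the pattern idea: partition each column block's
# 1-entries into all-1s rectangles ("patterns"), then reconstruct every
# column's MAC result as a sum of per-pattern row sums.
import numpy as np

def extract_patterns(block):
    """Greedily partition the 1-entries of a 0/1 block into all-1s rectangles.

    Every 1 in `block` ends up in exactly one returned (rows, cols) rectangle,
    so per-column sums are never double-counted.
    """
    covered = np.zeros_like(block, dtype=bool)
    patterns = []
    while True:
        uncovered = np.argwhere((block == 1) & ~covered)
        if uncovered.size == 0:
            break
        r0, c0 = uncovered[0]
        # Columns that are 1 (and still uncovered) in the seed row.
        cols = [c for c in range(block.shape[1])
                if block[r0, c] == 1 and not covered[r0, c]]
        # Rows that are 1 and uncovered in *all* of those columns.
        rows = [r for r in range(block.shape[0])
                if all(block[r, c] == 1 and not covered[r, c] for c in cols)]
        for r in rows:
            for c in cols:
                covered[r, c] = True
        patterns.append((rows, cols))
    return patterns

def mac_via_patterns(block, x, patterns):
    """Column-wise MAC (x @ block) reconstructed from shared pattern sums."""
    out = np.zeros(block.shape[1])
    for rows, cols in patterns:
        partial = x[rows].sum()   # one shared accumulation per pattern
        out[cols] += partial      # reused by every column the pattern covers
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.integers(0, 2, size=(8, 16))   # toy 0/1 weight matrix
    x = rng.standard_normal(8)
    group_width = 4                        # cluster 4 adjacent columns together
    result = np.concatenate([
        mac_via_patterns(b, x, extract_patterns(b))
        for b in np.hsplit(W, W.shape[1] // group_width)
    ])
    assert np.allclose(result, x @ W)      # matches the direct MAC
```

Because each pattern's row sum is computed once and shared by all columns in the pattern, wide all-1s rectangles translate into fewer stored values, which is the intuition behind the reported crossbar area savings.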

Regular Paper Issue
Bigflow: A General Optimization Layer for Distributed Computing Frameworks
Journal of Computer Science and Technology 2020, 35 (2): 453-467
Published: 27 March 2020

As data volumes grow rapidly, distributed computation is widely employed in data centers to provide cheap and efficient methods for processing large-scale parallel datasets. Various computation models have been proposed to improve the abstraction of distributed datasets and hide the details of parallelism. However, most of them follow a single-layer partitioning method, which prevents developers from expressing multi-level partitioning operations succinctly. To overcome this problem, we present the NDD (Nested Distributed Dataset) data model. It is a more compact and expressive extension of the Spark RDD (Resilient Distributed Dataset), designed to remove the burden on developers of manually writing the logic for multi-level partitioning cases. Based on the NDD model, we develop an open-source framework called Bigflow, which serves as an optimization layer over the computation engines of the most widely used processing frameworks. With the help of Bigflow, advanced optimization techniques, which would otherwise only be applied manually by experienced programmers, are enabled automatically in a distributed data processing job. Currently, Bigflow processes about 3 PB of data daily in the data centers of Baidu. According to customer experience, it significantly reduces code length and improves performance over an intuitive programming style.
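To make the contrast between single-layer and nested partitioning concrete, the single-node toy below sketches what a nested dataset abstraction buys: grouping yields values that are themselves datasets, so a second level of partitioning is expressed by reusing the same operations one level down. This is not Bigflow's or Spark's real API; the class and method names (ToyDataset, group_by_key, apply_values) and the page-view example are invented purely for illustration.

```python
# Hypothetical single-node stand-in for a nested distributed dataset.
from collections import defaultdict

class ToyDataset:
    def __init__(self, records):
        self._records = list(records)

    def map(self, fn):
        return ToyDataset(fn(r) for r in self._records)

    def group_by_key(self, key_fn):
        groups = defaultdict(list)
        for r in self._records:
            groups[key_fn(r)].append(r)
        # Each group's value is again a ToyDataset: the "nested" layer.
        return ToyDataset((k, ToyDataset(v)) for k, v in groups.items())

    def apply_values(self, pipeline):
        # Run an arbitrary dataset-level pipeline inside every group.
        return ToyDataset((k, pipeline(v)) for k, v in self._records)

    def collect(self):
        out = []
        for r in self._records:
            if isinstance(r, tuple) and isinstance(r[1], ToyDataset):
                out.append((r[0], r[1].collect()))
            else:
                out.append(r)
        return out

# Two-level partitioning: group page views by site, then by user within each
# site, and count views per (site, user) without hand-written nesting logic.
views = ToyDataset([("a.com", "u1"), ("a.com", "u1"),
                    ("a.com", "u2"), ("b.com", "u3")])
per_site_user = views.group_by_key(lambda r: r[0]).apply_values(
    lambda site_ds: site_ds.group_by_key(lambda r: r[1]).apply_values(
        lambda user_ds: ToyDataset([len(user_ds.collect())])))
print(per_site_user.collect())
# [('a.com', [('u1', [2]), ('u2', [1])]), ('b.com', [('u3', [1])])]
```

In a single-layer model, the inner grouping would have to be encoded manually inside a per-group function; a nested model lets an optimization layer such as Bigflow see both levels and plan the whole job.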

Regular Paper Issue
Preface
Journal of Computer Science and Technology 2020, 35 (2): 379-381
Published: 27 March 2020