Scholar - SciOpen

Agile hardware design is gaining increasing momentum and bringing new chips in larger quantities to the market faster. However, it also takes new challenges for compiler developers to retarget existing compilers to these new chips in shorter time than ever before. Currently, retargeting a compiler backend, e.g., an LLVM backend to a new target, requires compiler developers to write manually a set of target description files (totalling 10300+ lines of code (LOC) for RISC-V in LLVM), which is error-prone and time-consuming. In this paper, we introduce a new approach, Automatic Target Description File Generation (ATG), which accelerates the generation of a compiler backend for a new target by generating its target description files automatically. Given a new target, ATG proceeds in two stages. First, ATG synthesizes a small list of target-specific properties and a list of code-layout templates from the target description files of a set of existing targets with similar instruction set architectures (ISAs). Second, ATG requests compiler developers to fill in the information for each instruction in the new target in tabular form according to the list of target-specific properties synthesized and then generates its target description files automatically according to the list of code-layout templates synthesized. The first stage can often be reused by different new targets sharing similar ISAs. We evaluate ATG using nine RISC-V instruction sets drawn from a total of 1029 instructions in LLVM 12.0. ATG enables compiler developers to generate compiler backends for these ISAs that emit the same assembly code as the existing compiler backends for RISC-V but with significantly less development effort (by specifying each instruction in terms of up to 61 target-specific properties only).

Regular Paper Issue

VTensor: Using Virtual Tensors to Build a Layout-Oblivious AI Programming Framework

Feng Yu, Jia-Cheng Zhao, Hui-Min Cui, Xiao-Bing Feng, Jingling Xue

Journal of Computer Science and Technology 2023, 38 (5): 1074-1097

Published: 30 September 2023

Abstract

Download citation

GB/T 7714-2015

EndNote(RIS)

BibTeX

NoteExpress

Refworks

Collect Collected

Tensors are a popular programming interface for developing artificial intelligence (AI) algorithms. Layout refers to the order of placing tensor data in the memory and will affect performance by affecting data locality; therefore the deep neural network library has a convention on the layout. Since AI applications can use arbitrary layouts, and existing AI systems do not provide programming abstractions to shield the layout conventions of libraries, operator developers need to write a lot of layout-related code, which reduces the efficiency of integrating new libraries or developing new operators. Furthermore, the developer assigns the layout conversion operation to the internal operator to deal with the uncertainty of the input layout, thus losing the opportunity for layout optimization. Based on the idea of polymorphism, we propose a layout-agnostic virtual tensor programming interface, namely the VTensor framework, which enables developers to write new operators without caring about the underlying physical layout of tensors. In addition, the VTensor framework performs global layout inference at runtime to transparently resolve the required layout of virtual tensors, and runtime layout-oriented optimizations to globally minimize the number of layout transformation operations. Experimental results demonstrate that with VTensor, developers can avoid writing layout-dependent code. Compared with TensorFlow, for the 16 operations used in 12 popular networks, VTensor can reduce the lines of code (LOC) of writing a new operation by 47.82% on average, and improve the overall performance by 18.65% on average.

Regular Paper Issue

Cacheap: Portable and Collaborative I/O Optimization for Graph Processing

Peng Zhao, Chen Ding, Lei Liu, Jiping Yu, Wentao Han, Xiao-Bing Feng

Journal of Computer Science and Technology 2019, 34 (3): 690-706

Published: 10 May 2019

Abstract

Download citation

GB/T 7714-2015

EndNote(RIS)

BibTeX

NoteExpress

Refworks

Collect Collected

Increasingly there is a need to process graphs that are larger than the available memory on today’s machines. Many systems have been developed with graph representations that are efficient and compact for out-of-core processing. A necessary task in these systems is memory management. This paper presents a system called Cacheap which automatically and efficiently manages the available memory to maximize the speed of graph processing, minimize the amount of disk access, and maximize the utilization of memory for graph data. It has a simple interface that can be easily adopted by existing graph engines. The paper describes the new system, uses it in recent graph engines, and demonstrates its integer factor improvements in the speed of large-scale graph processing.

total 3