Scholar - SciOpen

Agile hardware design is gaining increasing momentum and bringing new chips in larger quantities to the market faster. However, it also takes new challenges for compiler developers to retarget existing compilers to these new chips in shorter time than ever before. Currently, retargeting a compiler backend, e.g., an LLVM backend to a new target, requires compiler developers to write manually a set of target description files (totalling 10300+ lines of code (LOC) for RISC-V in LLVM), which is error-prone and time-consuming. In this paper, we introduce a new approach, Automatic Target Description File Generation (ATG), which accelerates the generation of a compiler backend for a new target by generating its target description files automatically. Given a new target, ATG proceeds in two stages. First, ATG synthesizes a small list of target-specific properties and a list of code-layout templates from the target description files of a set of existing targets with similar instruction set architectures (ISAs). Second, ATG requests compiler developers to fill in the information for each instruction in the new target in tabular form according to the list of target-specific properties synthesized and then generates its target description files automatically according to the list of code-layout templates synthesized. The first stage can often be reused by different new targets sharing similar ISAs. We evaluate ATG using nine RISC-V instruction sets drawn from a total of 1029 instructions in LLVM 12.0. ATG enables compiler developers to generate compiler backends for these ISAs that emit the same assembly code as the existing compiler backends for RISC-V but with significantly less development effort (by specifying each instruction in terms of up to 61 target-specific properties only).

Regular Paper Issue

VTensor: Using Virtual Tensors to Build a Layout-Oblivious AI Programming Framework

Feng Yu, Jia-Cheng Zhao, Hui-Min Cui, Xiao-Bing Feng, Jingling Xue

Journal of Computer Science and Technology 2023, 38 (5): 1074-1097

Published: 30 September 2023

Abstract

Download citation

GB/T 7714-2015

EndNote(RIS)

BibTeX

NoteExpress

Refworks

Collect Collected

Tensors are a popular programming interface for developing artificial intelligence (AI) algorithms. Layout refers to the order of placing tensor data in the memory and will affect performance by affecting data locality; therefore the deep neural network library has a convention on the layout. Since AI applications can use arbitrary layouts, and existing AI systems do not provide programming abstractions to shield the layout conventions of libraries, operator developers need to write a lot of layout-related code, which reduces the efficiency of integrating new libraries or developing new operators. Furthermore, the developer assigns the layout conversion operation to the internal operator to deal with the uncertainty of the input layout, thus losing the opportunity for layout optimization. Based on the idea of polymorphism, we propose a layout-agnostic virtual tensor programming interface, namely the VTensor framework, which enables developers to write new operators without caring about the underlying physical layout of tensors. In addition, the VTensor framework performs global layout inference at runtime to transparently resolve the required layout of virtual tensors, and runtime layout-oriented optimizations to globally minimize the number of layout transformation operations. Experimental results demonstrate that with VTensor, developers can avoid writing layout-dependent code. Compared with TensorFlow, for the 16 operations used in 12 popular networks, VTensor can reduce the lines of code (LOC) of writing a new operation by 47.82% on average, and improve the overall performance by 18.65% on average.

Survey Issue

Reinvent Cloud Software Stacks for Resource Disaggregation

Chen-Xi Wang, Yi-Zhou Shan, Peng-Fei Zuo, Hui-Min Cui

Journal of Computer Science and Technology 2023, 38 (5): 949-969

Published: 30 September 2023

Abstract

Download citation

GB/T 7714-2015

EndNote(RIS)

BibTeX

NoteExpress

Refworks

Collect Collected

Due to the unprecedented development of low-latency interconnect technology, building large-scale disaggregated architecture is drawing more and more attention from both industry and academia. Resource disaggregation is a new way to organize the hardware resources of datacenters, and has the potential to overcome the limitations, e.g., low resource utilization and low reliability, of conventional datacenters. However, the emerging disaggregated architecture brings severe performance and latency problems to the existing cloud systems. In this paper, we take memory disaggregation as an example to demonstrate the unique challenges that the disaggregated datacenter poses to the existing cloud software stacks, e.g., programming interface, language runtime, and operating system, and further discuss the possible ways to reinvent the cloud systems.

total 3