Regular Paper Issue
FlexPDA: A Flexible Programming Framework for Deep Learning Accelerators
Journal of Computer Science and Technology 2022, 37 (5): 1200-1220
Published: 30 September 2022

A wide variety of intelligence accelerators with promising performance and energy efficiency are deployed in a broad range of applications, such as computer vision and speech recognition. However, low programming productivity hinders the deployment of deep learning accelerators. The low-level library, which is invoked by a high-level deep learning framework that supports end-to-end execution of a given model, is designed to reduce the programming burden on intelligence accelerators. Unfortunately, this approach is inflexible: developers must build a network model for every deep learning application, which is likely to cause unnecessary repetitive implementation. In this paper, we propose FlexPDA, a flexible and efficient programming framework for deep learning accelerators. It provides more optimization opportunities than the low-level library and enables applications to be ported quickly to intelligence accelerators for fast upgrades. We evaluate FlexPDA using 10 representative operators selected from deep learning algorithms and an end-to-end network. The experimental results validate the effectiveness of FlexPDA, which achieves an end-to-end performance improvement of 1.620x over the low-level library.

Regular Paper Issue
Cacheap: Portable and Collaborative I/O Optimization for Graph Processing
Journal of Computer Science and Technology 2019, 34 (3): 690-706
Published: 10 May 2019

There is an increasing need to process graphs that are larger than the available memory on today’s machines. Many systems have been developed with graph representations that are efficient and compact for out-of-core processing. Memory management is a necessary task in these systems. This paper presents Cacheap, a system that automatically and efficiently manages the available memory to maximize the speed of graph processing, minimize the amount of disk access, and maximize the utilization of memory for graph data. It has a simple interface that existing graph engines can easily adopt. The paper describes the new system, uses it in recent graph engines, and demonstrates integer-factor improvements in the speed of large-scale graph processing.
