HW/SW Co-optimization for Stencil Computation: Beginning with a Customizable Core

Yanhua Li; Youhui Zhang; Weiming Zheng

doi:10.1109/TST.2016.7590326

AI Chat Paper

Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.

Chat more with AI

| Sign up

Browse by Subject

Search for peer-reviewed journals with full access.

Journals A - Z

About Us

Discover the SciOpen Platform and Achieve Your Research Goals with Ease.

About Us

Publish with Us

Support

Search articles, authors, keywords, DOl and etc.

Published Date

Reset Search

{{expandStatus?'Exit ':''}}Advanced Search

Journals A - Z

About Us

Publish with Us

Support

PDF (517.7 KB)

Cite

EndNote(RIS) BibTeX

Collect

Submit Manuscript

AI Chat Paper

Show Outline

Outline

Show full outline

Hide outline

Outline

Show full outline

Hide outline

Open Access

HW/SW Co-optimization for Stencil Computation: Beginning with a Customizable Core

Yanhua Li, Youhui Zhang(

), Weiming Zheng

Department of Computer Science, Tsinghua University, Beijing 100084, China.

Show Author Information

Abstract

Energy efficiency is one of the most important issues for High Performance Computing (HPC) today. Heterogeneous HPC platform with some energy-efficient customizable cores (as application-specific accelerators) is believed as one of the promising solutions to meet ever-increasing computing needs and to overcome power density limitations. In this paper, we focus on using customizable processor cores to optimize the typical stencil computations — the kernel of many high-performance applications. We develop a series of effective software/hardware co-optimization strategies to exploit the instruction-level and memory-computation parallelism, as well as to decrease the energy consumption. These optimizations include loop tiling, prefetching, cache customization, Single Instruction Multiple Data (SIMD), and Direct Memory Access (DMA), as well as necessary ISA extensions. Detailed tests of power-efficiency are given to evaluate the effect of all these optimizations comprehensively. The results are impressive: the combination of these optimizations has improved the application performance by 341% while the energy consumption has been decreased by 35%; a preliminary comparison with X86, GPU, and FPGA platforms also showed that the design could achieve an order of magnitude higher performance efficiency. We believe this work can help understand sources of inefficiency in general-purpose chips and can be used as a beginning to customize an energy efficient CMP for further improvement.

Keywords

energy efficiency customizable processor stencil computation software and hardware co-optimization

References

【1】

Crossref Google Scholar

Tsinghua Science and Technology

Volume 21 Issue 5,
October 2016

Pages 570-580

DOI: 10.1109/TST.2016.7590326

	{{item.num}}
{{version.versionName}} Author Response
{{version.versionName}} Review comment

Comments on this article

Go to comment

< Back to all reports

Review Status: {{reviewData.commendedNum}} Commended , {{reviewData.revisionRequiredNum}} Revision Required , {{reviewData.notCommendedNum}} Not Commended Under Peer Review

Review Comment

Cite this Report

. . , , {{reviewData.reportCite.doi}}

Cite this article:

Li Y, Zhang Y, Zheng W. HW/SW Co-optimization for Stencil Computation: Beginning with a Customizable Core. Tsinghua Science and Technology, 2016, 21(5): 570-580. https://doi.org/10.1109/TST.2016.7590326

940

Views

Downloads

Crossref

N/A

Web of Science

Scopus

CSCD

Google Scholar
Citation

Received: 07 September 2015

Accepted: 03 February 2016

Published: 18 October 2016