Open Access

A Novel Parallel Processing Element Architecture for Accelerating ODE and AI

Department of Electronic and Electrical Engineering, The University of Sheffield, Sheffield S1 3JD, United Kingdom

Abstract

Transforming complex problems into simpler computational tasks, such as recasting ordinary differential equations (ODEs) in matrix form, is key to AI advancement and paves the way for more efficient computing architectures. Systolic arrays, known for their computational efficiency, low power consumption, and ease of implementation, address AI's computational challenges and are central to mainstream industry AI accelerators; improvements to the Processing Element (PE) significantly boost systolic array performance and streamline the overall architecture. This research presents a novel PE design, and its integration into a systolic array, based on a novel computing theory: bit-level mathematics for the Multiply-Accumulate (MAC) operation. We present three different PE architectures and provide a comprehensive comparison between them and state-of-the-art technologies, focusing on power, area, and throughput. We also demonstrate the integration of the proposed MAC unit with systolic arrays, highlighting significant improvements in computational efficiency. Compared to the state-of-the-art design, our implementation achieves 2380952.38 times lower latency while using 64.19 times fewer DSP48E1 slices, 1.26 times fewer Look-Up Tables (LUTs), and 10.76 times fewer Flip-Flops (FFs), with 99.63 times lower power consumption and 15.19 times higher performance per PE.
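The abstract's core idea, recasting an ODE as a matrix computation and then evaluating that computation on a grid of MAC-based Processing Elements, can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the paper's bit-level MAC design (which the abstract does not reproduce): it models an output-stationary array in plain Python/NumPy, with one accumulator per PE, and applies it to a forward-Euler step x_{k+1} = (I + hA)x_k.

```python
import numpy as np

def euler_step_matrix(A, h):
    """Rewrite the ODE x' = A x as one matrix: forward Euler gives
    x_{k+1} = (I + h*A) x_k, so each time step is a matrix-vector multiply."""
    return np.eye(A.shape[0]) + h * A

def systolic_matmul(A, B):
    """Software stand-in for an output-stationary systolic array computing C = A @ B.
    Each (i, j) pair plays the role of one Processing Element (PE) holding a single
    accumulator; the innermost statement is the MAC a PE would perform per cycle."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(n):            # PE row
        for j in range(m):        # PE column
            acc = 0.0             # the PE's local accumulator
            for t in range(k):    # one operand pair streams in per cycle
                acc += A[i, t] * B[t, j]   # the MAC operation
            C[i, j] = acc
    return C

# Demo: integrate x' = A x (a harmonic oscillator) with Euler steps,
# where every step is carried out by the modeled MAC array.
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
M = euler_step_matrix(A, h=0.01)
x = np.array([[1.0],
              [0.0]])
for _ in range(100):              # integrate to t = 1.0
    x = systolic_matmul(M, x)
print(x.ravel())                  # approx. (cos 1, -sin 1) = (0.5403, -0.8415)
```

In hardware, each inner-loop MAC corresponds to one PE cycle; the paper's contribution lies in how that MAC is realized at the bit level, which this high-level sketch deliberately does not model.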


Tsinghua Science and Technology
Pages 1954–1964
Cite this article:
Yang K, Liu L, Liu H, et al. A Novel Parallel Processing Element Architecture for Accelerating ODE and AI. Tsinghua Science and Technology, 2025, 30(5): 1954-1964. https://doi.org/10.26599/TST.2024.9010090


Received: 30 March 2024
Revised: 02 May 2024
Accepted: 09 May 2024
Published: 29 April 2025
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
