Journal Home > Volume 26 , Issue 5

Row Parallel Coarse-Grained Reconfigurable Architecture (RPCGRA) has the advantages of maximum parallelism and programmable flexibility. Designing an efficient algorithm to map the diverse applications onto RPCGRA is difficult due to a number of RPCGRA hardware constraints. To solve this problem, the nodes of the data flow graph must be partitioned and scheduled onto the RPCGRA. In this paper, we present a Depth-First Greedy Mapping (DFGM) algorithm that simultaneously considers the communication costs and the use times of the Reconfigurable Cell Array (RCA). Compared with level breadth mapping, the performance of DFGM is better. The percentage of maximum improvement in the use times of RCA is 33% and the percentage of maximum improvement in non-original input and output times is 64.4% (Given Discrete Cosine Transfor 8 (DCT8), and the area of reconfigurable processing unit is 56). Compared with level-based depth mapping, DFGM also obtains the lowest averages of use times of RCA, non-original input and output times, and the reconfigurable time.


menu
Abstract
Full text
Outline
About this article

Efficient Scheduling Mapping Algorithm for Row Parallel Coarse-Grained Reconfigurable Architecture

Show Author's information Naijin ChenZhen Wang( )Ruixiang HeJianhui JiangFei ChengChenghao Han
School of Computer and Information Science, Anhui Polytechnic University, Wuhu 241000, China
School of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China
School of Software Engineering, Tongji University, Shanghai 201804, China

Abstract

Row Parallel Coarse-Grained Reconfigurable Architecture (RPCGRA) has the advantages of maximum parallelism and programmable flexibility. Designing an efficient algorithm to map the diverse applications onto RPCGRA is difficult due to a number of RPCGRA hardware constraints. To solve this problem, the nodes of the data flow graph must be partitioned and scheduled onto the RPCGRA. In this paper, we present a Depth-First Greedy Mapping (DFGM) algorithm that simultaneously considers the communication costs and the use times of the Reconfigurable Cell Array (RCA). Compared with level breadth mapping, the performance of DFGM is better. The percentage of maximum improvement in the use times of RCA is 33% and the percentage of maximum improvement in non-original input and output times is 64.4% (Given Discrete Cosine Transfor 8 (DCT8), and the area of reconfigurable processing unit is 56). Compared with level-based depth mapping, DFGM also obtains the lowest averages of use times of RCA, non-original input and output times, and the reconfigurable time.

Keywords: temporal mapping, Reconfigurable Cell Array (RCA), listed scheduling, communication costs

References(19)

[1]
J. M. P. Cardoso, P. C. Diniz, and M. Weinhardt, Compiling for reconfigurable computing: A survey, ACM Computing Surveys, vol. 42, no. 4, pp. 1301–1365, 2010.
[2]
J. W. Yoon, A. Shrivastava, S. Park, M. Ahn, and Y. Paek, A graph drawing based spatial algorithm for coarse-grained reconfigurable architectures, IEEE Transactions on Very Large Scale Integration Systems, vol. 17, no. 11, pp. 1565–1578, 2009.
[3]
M. Berekovic, A. Kanstein, B. Mei, and B. D. Sutter, Mapping of nomadic multimedia applications on the ADRES reconfigurable array processor, Microprocessors and Microsystems, vol. 33, no. 4, pp. 290–294, 2009.
[4]
R. S. Ferreira, J. M. P. Cardoso, A. Damiany, J. Vendramini, and T. Teixeira, Fast placement and routing by extending coarse grained reconfigurable arrays with Omega networks, Journal of Systems Architecture, vol. 57, no. 8, pp. 761–777, 2011.
[5]
R. Krishnamoorthy, K. Varadarajan, and S. K. Nandy, Interconnect-topology independent mapping algorithm for a coarse grained reconfigurable architecture, in Proc. of 2011 International Conference on Field Programmable Technology, New Delhi, India, 2011, pp. 1–5.
[6]
M. Ahn, J. W. Yoon, Y. Paek, Y. Kim, M. Kiemb, and K. Choi, A spatial mapping algorithm for heterogeneous coarse grained reconfigurable architectures, in Proc. of the Conference on Design, Automation and Test in Europe, Munich, Germany, 2006, pp. 363–368.
[7]
G. Ansaloni, K. Tanimura, L. Pozzi, and N. Dutt, Integrated kernel partitioning and scheduling for coarse grained reconfigurable arrays, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 12, pp. 1803–1816, 2012.
[8]
G. Lee, K. Choi, and N. D. Dutt, Mapping multi-domain applications onto coarse grained reconfigurable architectures, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 5, pp. 637–650, 2011.
[9]
M. Jo, D. Lee, K. Han, and K. Choi, Design of a coarse-grained reconfigurable architecture with floating-point support and comparative study, Integration, the VLSI Journal, vol. 47, no. 2, pp. 232–241, 2014.
[10]
W. Kim, Y. Choi, and H. park, Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures, ACM Transactions on Architecture and Code Optimization, vol. 10, no. 4, pp. 1–24, 2013.
[11]
N. J. Chen and J. H. Jiang, Mapping algorithm for coarse-grained reconfigurable multimedia architectures, in Proc. of 19th IEEE International Parallel & Distributed Processing Symposium (IPDPS) Workshop, Shanghai, China, 2012, pp. 281–286.
[12]
S. D. Yu, Research on the software/hardware co-design for reconfigurable processor, PhD dissertation, School of Information Science and Technology, Tsinghua University, Beijing, China, 2009.
[13]
H. Singh, M.-H. Lee, G. Lu, F. J. Kurdahi, and E. M. C. Filho, MorphoSys: An integrated reconfigurable system for data parallel and computation intensive applications, IEEE Transactions on Computers, vol. 49, no. 5, pp. 465–481, 2000.
[14]
N. J. Chen, J. H. Jiang, X. Chen, Z. Zhou, and Y. Xu, An improved level partitioning algorithm considering minimum execution delay and resource restraints, Acta Electronica Sinica, vol. 40, no. 5, pp. 1055–1066, 2012.
[15]
J. Xiao, Z. H. Shi, W. D. Zhu, J. H. Jiang, Q. W. Zhou, J. Lou, Y. Huang, Q. Ji, and Z. Sun, Uniform non-Bernoulli sequences oriented locating method for reliability-critical gates, Tsinghua Science and Technology, vol. 26, no. 1, pp. 24–35, 2021.
[16]
Y. M. Ouyang, Q. Wang, Z. Li, H. G. Liang, and J. Li, Fault-tolerant design for data efficient retransmission in WiNoC, Tsinghua Science and Technology, vol. 26, no.1, pp. 85–94, 2021.
[17]
O. Sangyun, L. Hongsik, and L. Jongeun, Efficient execution of stream graphs on coarse-grained reconfigurable architectures, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 36, no. 12, pp. 1978–1988, 2017.
[18]
I. Bae, B. Harris, H. Min, and B. Egger, Auto-tuning CNNs for coarse-grained reconfigurable array-based accelerators, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 11, pp. 2301–2310, 2018.
[19]
J. Y. Gu, S. Y. Yin, L. B. liu, and S. J. Wei, Stress-aware loops mapping on CGRAs with dynamic multi-map reconfiguration, IEEE Transactions on Parallel and Distributed Systems, vol. 29, no. 9, pp. 2105–2120, 2018.
Publication history
Copyright
Acknowledgements
Rights and permissions

Publication history

Received: 30 July 2020
Accepted: 08 September 2020
Published: 20 April 2021
Issue date: October 2021

Copyright

© The author(s) 2021

Acknowledgements

This research was supported by the Natural Science Foundation of Anhui Province (No. 1808085MF203) and the National Natural Science Foundation of China (No. 61432017).

Rights and permissions

© The author(s) 2021. The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return