Journal Home > Volume 28 , Issue 2

To solve the problem of grid coarse-grained reconfigurable array task mapping under multiple constraints, we propose a Loop Subgraph-Level Greedy Mapping (LSLGM) algorithm using parallelism and processing element fragmentation. Under the constraint of a reconfigurable array, the LSLGM algorithm schedules node from a ready queue to the current reconfigurable cell array block. After mapping a node, its successor’s indegree value will be dynamically updated. If its successor’s indegree is zero, it will be directly scheduled to the ready queue; otherwise, the predecessor must be dynamically checked. If the predecessor cannot be mapped, it will be scheduled to a blocking queue. To dynamically adjust the ready node scheduling order, the scheduling function is constructed by exploiting factors, such as node number, node level, and node dependency. Compared with the loop subgraph-level mapping algorithm, experimental results show that the total cycles of the LSLGM algorithm decreases by an average of 33.0 % ( PEA4×4) and 33.9 % ( PEA7×7). Compared with the epimorphism map algorithm, the total cycles of the LSLGM algorithm decrease by an average of 38.1 % ( PEA4×4) and 39.0 % ( PEA7×7). The feasibility of LSLGM is verified.


menu
Abstract
Full text
Outline
About this article

Loop Subgraph-Level Greedy Mapping Algorithm for Grid Coarse-Grained Reconfigurable Array

Show Author's information Naijin Chen1( )Fei Cheng1( )Chenghao Han1Jianhui Jiang2Xiaoqing Wen3
School of Computer and Information Science, Anhui Polytechnic University, Wuhu 241000, China
School of Software Engineering, Tongji University, Shanghai 201804, China
Department of Computer Science and Networks, Kyushu Institute of Technology, Fukuoka 820-8502, Japan

Abstract

To solve the problem of grid coarse-grained reconfigurable array task mapping under multiple constraints, we propose a Loop Subgraph-Level Greedy Mapping (LSLGM) algorithm using parallelism and processing element fragmentation. Under the constraint of a reconfigurable array, the LSLGM algorithm schedules node from a ready queue to the current reconfigurable cell array block. After mapping a node, its successor’s indegree value will be dynamically updated. If its successor’s indegree is zero, it will be directly scheduled to the ready queue; otherwise, the predecessor must be dynamically checked. If the predecessor cannot be mapped, it will be scheduled to a blocking queue. To dynamically adjust the ready node scheduling order, the scheduling function is constructed by exploiting factors, such as node number, node level, and node dependency. Compared with the loop subgraph-level mapping algorithm, experimental results show that the total cycles of the LSLGM algorithm decreases by an average of 33.0 % ( PEA4×4) and 33.9 % ( PEA7×7). Compared with the epimorphism map algorithm, the total cycles of the LSLGM algorithm decrease by an average of 38.1 % ( PEA4×4) and 39.0 % ( PEA7×7). The feasibility of LSLGM is verified.

Keywords: scheduling, mapping, Grid Coarse-Grained Reconfigurable Array (GCGRA), loop subgraph

References(15)

[1]
M. Brandalero, L. Carro, A. C. S. Beck, and M. Shafique, Multi-target adaptive reconfigurable acceleration for low-power IoT processing, IEEE Trans. Comput., vol. 70, no. 1, pp. 83–98, 2021.
[2]
I. Bae, B. Harris, H. Min, and B. Egger, Auto-tuning CNNs for coarse-grained reconfigurable array-based accelerators, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 11, pp. 2301–2310, 2018.
[3]
J. M. P. Cardoso, P. C. Diniz, and M. Weinhardt, Compiling for reconfigurable computing: A survey, ACM Comput. Surv., vol. 42, no. 4, p. 13, 2010.
[4]
G. Charitopoulos, D. N. Pnevmatikatos, and G. Gaydadjiev, MC-DeF: Creating customized CGRAs for dataflow applications, ACM Trans. Archit. Code Optim., vol. 18, no. 3, p. 26, 2021.
[5]
L. B. Liu, J. F. Zhu, Z. S. Li, Y. N. Lu, Y. D. Deng, J. Han, S. Y. Yin, and S. J. Wei, A survey of coarse-grained reconfigurable architecture and design: Taxonomy, challenges, and applications, ACM Comput. Surv., vol. 52, no. 6, p. 118, 2020.
[6]
N. J. Chen, Z. Wang, R. X. He, J. H. Jiang, F. Cheng, and C. H. Han, Efficient scheduling mapping algorithm for row parallel coarse-grained reconfigurable architecture, Tsinghua Science and Technology, vol. 26, no. 5, pp. 724–735, 2021.
[7]
N. J. Chen and J. H. Jiang, Mapping algorithm of coarse grained reconfigurable cell array for multi-branch tree data flow graph, (in Chinese), J. Comput. -Aided Des. Comput. Graphics, vol. 28, no. 7, pp. 1180–1187, 2016.
[8]
N. J. Chen and Z. Y. Feng, Interconnect delay performance evaluation for non-crossing level and row operands parallel RCA, (in Chinese), J. Tianjin Univ. (Sci. Technol.), vol. 50, no. 4, pp. 429–436, 2017.
[9]
M. Balasubramanian and A. Shrivastava, CRIMSON: Compute-intensive loop acceleration by randomized iterative modulo scheduling and optimized mapping on CGRAs, IEEE Trans. Comput. -Aided Des. Integr. Circuits Syst., vol. 39, no. 11, pp. 3300–3310, 2020.
[10]
G. Lee, E. Cetin, and O. Diessel, Fault recovery time analysis for coarse-grained reconfigurable architectures, ACM Trans. Embedded Comput. Syst., vol. 17, no. 2, p. 42, 2018.
[11]
T. Kojima, N. A. V. Doan, and H. Amano, GenMap: A genetic algorithmic approach for optimizing spatial mapping of coarse-grained reconfigurable architectures, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 28, no. 11, pp. 2383–2396, 2020.
[12]
M. Hamzeh, A. Shrivastava, and S. Vrudhula, EPIMap: Using Epimorphism to map applications on CGRAs, in Proc. 2012 DAC Design Automation Conf., San Francisco, CA, USA, 2012, pp. 1280–1287.
[13]
M. Hamzeh, A. Shrivastava, and S. Vrudhula, REGIMap: Register-aware application mapping on Coarse-Grained Reconfigurable Architectures (CGRAs), in Proc.50th ACM/EDAC/IEEE Design Automation Conf. (DAC), Austin, TX, USA, 2013, pp. 1–10.
[14]
N. J. Chen, J. H. Jiang, X. Chen, Z. Zhou, and Y. Xu, An improved level partitioning algorithm considering minimum execution delay and resource restraints, (in Chinese), Acta Electron. Sin., vol. 40, no. 5, pp. 1055–1066, 2012.
[15]
N. J. Chen, Z. Y. Feng, and J. H. Jiang, Bypass node non-redundant adding algorithm for crossing-level data transmission in two-dimension reconfigurable cell array, (in Chinese), J. Commun., vol. 36, no. 4, p. 2015132, 2015.
Publication history
Copyright
Acknowledgements
Rights and permissions

Publication history

Received: 29 October 2021
Revised: 27 December 2021
Accepted: 24 January 2022
Published: 29 September 2022
Issue date: April 2023

Copyright

© The author(s) 2023.

Acknowledgements

This research was supported by the Natural Science Foundation of Anhui Province (No. 1808085MF203) and the Natural Science Foundation of China (Nos. 61972438 and 61432017).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return