Open Access

Simultaneous Accelerator Parallelization and Point-to-Point Interconnect Insertion for Bus-Based Embedded SoCs

Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.
Department of Electronic and Computer Engineering, University of California, Santa Barbara, CA 93106, USA.

Abstract

As performance requirements for bus-based embedded Systems-on-Chip (SoCs) increase, a growing number of on-chip application-specific hardware accelerators (e.g., filters, FFTs, JPEG encoders, GSMs, and AES encoders) are being integrated into their designs. Integrating these accelerators requires system-level tradeoffs among performance, area, and scalability. Accelerator parallelization and Point-to-Point (P2P) interconnect insertion are two effective system-level adjustments. The former boosts computing performance at the cost of area, while the latter provides higher bandwidth at the cost of routability. Moreover, the two adjustments interact with each other. This paper proposes a design flow that optimizes accelerator parallelization and P2P interconnect insertion simultaneously. To explore the large optimization space, we develop an algorithm that reduces total SoC latency under constraints on SoC area and total P2P wire length. Experimental results show that the performance difference between the proposed algorithm and the optimal results is only 2.33% on average, while the running time of the algorithm is less than 17 s.

Tsinghua Science and Technology
Pages 644-660
Cite this article:
Zhang D, Liu Y, Li S, et al. Simultaneous Accelerator Parallelization and Point-to-Point Interconnect Insertion for Bus-Based Embedded SoCs. Tsinghua Science and Technology, 2015, 20(6): 644-660. https://doi.org/10.1109/TST.2015.7350017

Received: 08 November 2015
Accepted: 16 November 2015
Published: 17 December 2015
© The author(s) 2015