More Bang for Your Buck: Boosting Performance with Capped Power Consumption

Juan Chen; Xinxin Qi; Feihao Wu; Jianbin Fang; Yong Dong; Yuan Yuan; Zheng Wang; Keqin Li

doi:10.26599/TST.2020.9010012

Tsinghua Science and Technology 2021, 26(3): 370-383 https://doi.org/10.26599/TST.2020.9010012

Open Access | Issue | Published: 12 October 2020

College of Computer, National University of Defense Technology, Changsha 410073, China.

College of Computer, University of Leeds, Leeds LS2 9JT, UK.

School of Science and Engineering, State University of New York, New York, NY 12561, USA.

Keywords:

energy efficiency, high-performance computing, performance boost, power control, processor frequency scaling

Cite this article:

Chen J, Qi X, Wu F, et al. More Bang for Your Buck: Boosting Performance with Capped Power Consumption. Tsinghua Science and Technology, 2021, 26(3): 370-383. https://doi.org/10.26599/TST.2020.9010012

Abstract

Achieving faster performance without increasing power and energy consumption for computing systems is an outstanding challenge. This paper develops a novel resource allocation scheme for memory-bound applications running on High-Performance Computing (HPC) clusters, aiming to improve application performance without breaching peak power constraints or increasing total energy consumption. Our scheme estimates how the number of processor cores and the CPU frequency setting affect application performance. It then uses the estimate to provide additional compute nodes to memory-bound applications if it is profitable to do so. We implement our algorithm, apply it to 12 representative benchmarks from the NAS Parallel Benchmarks and HPC Challenge (HPCC) benchmark suites, and evaluate it on a representative HPC cluster. Experimental results show that our approach can effectively mitigate memory contention to improve application performance, and it achieves this without significantly increasing the peak power and overall energy consumption. Our approach obtains on average a 12.69% performance improvement over the default resource allocation strategy while using 7.06% less total power, which translates into 17.77% energy savings.
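As a rough sanity check on how the three reported figures relate, recall that energy is average power multiplied by runtime. The sketch below is illustrative only and is not taken from the paper: it assumes that "total power" means average power over the run, and it tries two possible readings of "performance improvement" (a 12.69% shorter runtime, or a 1.1269x speedup). The function and variable names are hypothetical.

```python
def energy_ratio(power_ratio: float, runtime_ratio: float) -> float:
    """Energy relative to the baseline, using E = P_avg * T."""
    return power_ratio * runtime_ratio


# Figures quoted in the abstract.
POWER_REDUCTION = 0.0706   # 7.06% less total power
PERF_IMPROVEMENT = 0.1269  # 12.69% performance improvement

# Assumption: "total power" is average power over the whole run.
power_ratio = 1.0 - POWER_REDUCTION

# Reading A (assumption): runtime shrinks by 12.69%.
runtime_ratio_a = 1.0 - PERF_IMPROVEMENT

# Reading B (assumption): throughput rises by 12.69%, i.e. a 1.1269x speedup.
runtime_ratio_b = 1.0 / (1.0 + PERF_IMPROVEMENT)

savings_a = 1.0 - energy_ratio(power_ratio, runtime_ratio_a)
savings_b = 1.0 - energy_ratio(power_ratio, runtime_ratio_b)

print(f"Reading A energy savings: {savings_a:.2%}")
print(f"Reading B energy savings: {savings_b:.2%}")
```

Both readings give savings between 17% and 19%, close to but not identical to the reported 17.77%. An exact match is not expected if the reported figures are averages over the 12 benchmarks, since the product of averages generally differs from the average of per-benchmark products.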

About this article

Publication history

Received: 25 March 2020

Accepted: 02 April 2020

Published: 12 October 2020

Issue date: June 2021

Acknowledgements

This work was supported in part by the Advanced Research Project of China (No. 31511010203) and the Research Program of NUDT (No. ZK18-03-10).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).