Open Access

A Holistic Energy-Efficient Approach for a Processor-Memory System

Feihao Wu, Juan Chen, Yong Dong, Wenxu Zheng, Xiaodong Pan, Yuan Yuan, Zhixin Ou, Yuyang Sun
College of Computer, National University of Defense Technology, Changsha 410073, China

Abstract

Component overclocking, whether of the processor or of the memory, speeds up system components to achieve higher program performance. However, overclocking inevitably increases power consumption. Our goal is to improve the performance of scientific computing applications as much as possible without increasing the total power consumption of a processor-memory system. We built a processor-memory energy efficiency model for multicore-based systems that coordinates the performance and power of the processor and memory. Depending on the scientific application, the model exploits performance boost opportunities through four techniques: processor overclocking, processor Dynamic Voltage and Frequency Scaling (DVFS), memory active ratio adjustment, and memory overclocking. The model also provides a total power control method based on the same four techniques. We propose a processor and memory Coordination-based holistic Energy-Efficient (CEE) algorithm, which improves performance without increasing total power consumption. Experimental results show an average performance improvement of 9.3% across all 14 benchmarks, with no increase in total power consumption. The maximum improvement, 13.1%, was obtained on the dedup benchmark. Our experiments validate the effectiveness of our holistic energy-efficient model and technique.
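
The idea of trading power between processor and memory under a fixed total budget can be illustrated with a minimal sketch. This is not the CEE algorithm itself: the knob values, the power model `toy_power`, and the runtime model `toy_time` below are all hypothetical stand-ins chosen for illustration, and the sketch assumes that the memory active ratio scales effective memory throughput. It simply enumerates processor frequency, memory frequency, and memory active ratio settings, and keeps the fastest configuration whose modeled power does not exceed that of the baseline.

```python
from itertools import product

# Hypothetical knob settings (not taken from the paper).
CPU_FREQS_GHZ = [2.0, 2.4, 2.8, 3.2]   # below baseline ~ DVFS, above ~ overclocking
MEM_FREQS_MHZ = [1333, 1600, 1866]     # above baseline ~ memory overclocking
MEM_ACTIVE_RATIOS = [0.5, 0.75, 1.0]   # fraction of time memory is active
BASELINE = (2.4, 1600, 1.0)            # assumed default configuration


def toy_power(cpu_ghz, mem_mhz, active_ratio):
    """Toy total power in watts (assumed model, not the authors')."""
    cpu_power = 20.0 + 1.0 * cpu_ghz ** 3          # static + cubic dynamic term
    mem_power = 5.0 + 0.02 * mem_mhz * active_ratio
    return cpu_power + mem_power


def toy_time(cpu_ghz, mem_mhz, active_ratio, mem_bound):
    """Toy runtime normalized to the baseline (assumed model).

    mem_bound is the fraction of baseline runtime limited by memory.
    """
    base_cpu, base_mem, base_ratio = BASELINE
    cpu_part = (1.0 - mem_bound) * (base_cpu / cpu_ghz)
    mem_part = mem_bound * (base_mem * base_ratio) / (mem_mhz * active_ratio)
    return cpu_part + mem_part


def best_config(mem_bound):
    """Return the fastest configuration whose power stays within the baseline power."""
    budget = toy_power(*BASELINE)
    feasible = [
        cfg
        for cfg in product(CPU_FREQS_GHZ, MEM_FREQS_MHZ, MEM_ACTIVE_RATIOS)
        if toy_power(*cfg) <= budget
    ]
    # The baseline itself is always feasible, so the result is never slower than it.
    return min(feasible, key=lambda cfg: toy_time(*cfg, mem_bound))


if __name__ == "__main__":
    for label, mem_bound in [("compute-bound", 0.1), ("memory-bound", 0.8)]:
        cfg = best_config(mem_bound)
        print(label, cfg, round(toy_time(*cfg, mem_bound), 3), round(toy_power(*cfg), 2))
```

In this toy setting, a compute-bound workload tends to favor a faster processor paired with throttled memory, while a memory-bound workload tends to favor the opposite, which is the kind of application-dependent coordination the abstract describes. An exhaustive search is used only because the hypothetical configuration space is tiny.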

Tsinghua Science and Technology
Pages 468-483
Cite this article:
Wu F, Chen J, Dong Y, et al. A Holistic Energy-Efficient Approach for a Processor-Memory System. Tsinghua Science and Technology, 2019, 24(4): 468-483. https://doi.org/10.26599/TST.2018.9020104


Received: 17 May 2018
Accepted: 15 June 2018
Published: 07 March 2019
© The author(s) 2019