[1]
R. H. Dennard, F. H. Gaensslen, H. N. Yu, V. L. Rideout, E. Bassous, and A. R. LeBlanc, Design of ion-implanted MOSFET’s with very small physical dimensions, IEEE Journal of Solid-State Circuits, vol. 9, no. 5, pp. 256-268, 1974.
[2]
M. Bohr, A 30 year retrospective on Dennard’s MOSFET scaling paper, IEEE Solid-State Circuits Society Newsletter, vol. 12, no. 1, pp. 11-13, 2007.
[3]
R. Kumar, K. I. Farkas, N. P. Jouppi, P. Ranganathan, and D. M. Tullsen, Single-ISA heterogeneous multi-core architectures: The potential for processor power reduction, in Proc. 36th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO 36), San Diego, CA, USA, 2003, pp. 81-92.
[4]
R. Kumar, V. Zyuban, and D. M. Tullsen, Interconnections in multi-core architectures: Understanding mechanisms, overheads and scaling, ACM SIGARCH Computer Architecture News, vol. 33, no. 2, pp. 408-419, 2005.
[5]
R. Kumar, D. M. Tullsen, N. P. Jouppi, and P. Ranganathan, Heterogeneous chip multiprocessors, Computer, vol. 38, no. 11, pp. 32-38, 2005.
[6]
T. Heath, B. Diniz, E. V. Carrera, W. Meira, and R. Bianchini, Energy conservation in heterogeneous server clusters, in Proc. 10th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, Chicago, IL, USA, 2005, pp. 186-195.
[7]
Y. M. Li, K. Skadron, D. Brooks, and Z. G. Hu, Performance, energy, and thermal considerations for SMT and CMP architectures, in Proc. 11th Int. Symp. High-Performance Computer Architecture, San Francisco, CA, USA, 2005, pp. 71-82.
[8]
A. Lukefahr, S. Padmanabha, R. Das, F. M. Sleiman, R. Dreslinski, T. F. Wenisch, and S. Mahlke, Composite cores: Pushing heterogeneity into a core, in 2012 45th Annu. IEEE/ACM Int. Symp. Microarchitecture, Vancouver, Canada, 2012, pp. 317-328.
[9]
T. S. Muthukaruppan, M. Pricopi, V. Venkataramani, T. Mitra, and S. Vishin, Hierarchical power management for asymmetric multi-core in dark silicon era, in 2013 50th ACM/EDAC/IEEE Design Automation Conf. (DAC), Austin, TX, USA, 2013, pp. 1-9.
[10]
J. Meng, K. Kawakami, and A. K. Coskun, Optimizing energy efficiency of 3-D multicore systems with stacked DRAM under power and thermal constraints, in Proc. 49th Annu. Design Automation Conf., San Francisco, CA, USA, 2012, pp. 648-655.
[11]
T. Cao, S. M. Blackburn, T. J. Gao, and K. S. McKinley, The Yin and Yang of power and performance for asymmetric hardware and managed software, in 2012 39th Annu. Int. Symp. Computer Architecture (ISCA), Portland, OR, USA, 2012, pp. 225-236.
[12]
N. Gholkar, F. Mueller, and B. Rountree, Power tuning HPC jobs on power-constrained systems, in Proc. 2016 Int. Conf. Parallel Architectures and Compilation, Haifa, Israel, 2016, pp. 179-191.
[13]
T. Patki, D. K. Lowenthal, A. Sasidharan, M. Maiterth, B. L. Rountree, M. Schulz, and B. R. de Supinski, Practical resource management in power-constrained, high performance computing, in Proc. 24th Int. Symp. High-Performance Parallel and Distributed Computing, Portland, OR, USA, 2015, pp. 121-132.
[14]
C. Isci, A. Buyuktosunoglu, C. Y. Cher, P. Bose, and M. Martonosi, An analysis of efficient multi-core global power management policies: Maximizing performance for a given power budget, in 2006 39th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO’06), Orlando, FL, USA, 2006, pp. 347-358.
[15]
S. Pagani, J. J. Chen, and M. M. Li, Energy efficiency on multi-core architectures with multiple voltage islands, IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 6, pp. 1608-1621, 2015.
[16]
S. W. Williams, A. Waterman, and D. A. Patterson, Roofline: An insightful visual performance model for multicore architectures, Communications of the ACM, vol. 52, no. 4, pp. 65-76, 2009.
[17]
K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick, The Landscape of Parallel Computing Research: A View from Berkeley, Electrical Engineering and Computer Sciences, Tech. Rep. UCB/EECS-2006-183, University of California at Berkeley, Berkeley, CA, USA, 2006.
[18]
P. R. Luszczek, D. H. Bailey, J. J. Dongarra, J. Kepner, R. F. Lucas, R. Rabenseifner, and D. Takahashi, The HPC Challenge (HPCC) benchmark suite, in Proc. 2006 ACM/IEEE Conf. Supercomputing (SC’06), Tampa, FL, USA, 2006, p. 213.
[22]
B. Rountree, D. K. Lowenthal, B. R. de Supinski, M. Schulz, V. W. Freeh, and T. Bletsch, Adagio: Making DVS practical for complex HPC applications, in Proc. 23rd Int. Conf. Supercomputing, New York, NY, USA, 2009, pp. 460-469.
[23]
W. Wang, A. Porterfield, J. Cavazos, and S. Bhalachandra, Using per-loop CPU clock modulation for energy efficiency in OpenMP applications, presented at the 2015 44th Int. Conf. Parallel Processing, Beijing, China, 2015, pp. 629-638.
[24]
S. Bhalachandra, A. Porterfield, S. L. Olivier, and J. F. Prins, An adaptive core-specific runtime for energy efficiency, presented at the 2017 IEEE Int. Parallel and Distributed Processing Symp. (IPDPS), Orlando, FL, USA, 2017, pp. 947-956.
[25]
I. Stamelakos, S. Xydis, G. Palermo, and C. Silvano, Variation-aware voltage island formation for power efficient near-threshold manycore architectures, presented at the 2014 19th Asia and South Pacific Design Automation Conf. (ASP-DAC), Singapore, 2014, pp. 304-310.
[26]
U. R. Karpuzcu, A. Sinkar, N. S. Kim, and J. Torrellas, EnergySmart: Toward energy-efficient manycores for near-threshold computing, presented at the 2013 IEEE 19th Int. Symp. High Performance Computer Architecture (HPCA), Shenzhen, China, 2013, pp. 542-553.
[27]
R. Begum, D. Werner, M. Hempstead, G. Prasad, and G. Challen, Energy-performance trade-offs on energy-constrained devices with multi-component DVFS, presented at the 2015 IEEE Int. Symp. Workload Characterization, Atlanta, GA, USA, 2015, pp. 34-43.
[28]
Q. X. Liu, M. Moreto, J. Abella, F. J. Cazorla, and M. Valero, DReAM: An approach to estimate per-task DRAM energy in multicore systems, ACM Transactions on Design Automation of Electronic Systems, vol. 22, no. 1, p. 16, 2016.
[29]
A. Tiwari, M. Schulz, and L. Carrington, Predicting optimal power allocation for CPU and DRAM domains, in 2015 IEEE Int. Parallel and Distributed Processing Symp. Workshop, Hyderabad, India, 2015, pp. 951-959.
[30]
H. Z. Zhang and H. Hoffmann, Maximizing performance under a power cap: A comparison of hardware, software, and hybrid techniques, in Proc. 21st Int. Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS’16), Atlanta, GA, USA, 2016, pp. 545-559.
[31]
P. F. Zou, T. Allen, C. H. Davis, X. Z. Feng, and R. Ge, CLIP: Cluster-level intelligent power coordination for power-bounded systems, presented at the 2017 IEEE Int. Conf. Cluster Computing (CLUSTER), Honolulu, HI, USA, 2017, pp. 541-551.
[32]
T. Patki, D. K. Lowenthal, B. Rountree, M. Schulz, and B. R. de Supinski, Exploring hardware overprovisioning in power-constrained, high performance computing, in Proc. 27th Int. ACM Conf. Int. Conf. Supercomputing (ICS’13), Eugene, OR, USA, 2013, pp. 173-182.
[33]
D. Lo and C. Kozyrakis, Dynamic management of TurboMode in modern multi-core chips, presented at the 2014 IEEE 20th Int. Symp. High Performance Computer Architecture (HPCA), Orlando, FL, USA, 2014, pp. 603-613.
[34]
H. B. Jang, J. Lee, J. Kong, T. Suh, and S. W. Chung, Leveraging process variation for performance and energy: In the perspective of overclocking, IEEE Transactions on Computers, vol. 63, no. 5, pp. 1316-1322, 2014.