3.2 Algorithm
We propose a holistic energy-efficient algorithm based on processor and memory coordination, given as Algorithm 1. For a given application, this algorithm obtains nearly optimal processor frequency, memory frequency, and memory active ratio settings that boost performance without increasing power consumption.
The input parameters consist of the m-level memory bandwidth set, the processor and memory frequency scaling ranges, and the memory traffic ratio threshold. The bandwidth set spans from the default memory bandwidth in the baseline situation down to the minimal memory bandwidth, which is reached when the memory active ratio is scaled down as far as possible. Here, m is the number of memory channels. In our experiment, the number of memory channels equals the number of memory ranks. Each element of the set gives the memory bandwidth at the corresponding memory active ratio.
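To make the shape of the m-level bandwidth input concrete, the following Python sketch builds such a table. It assumes that bandwidth scales linearly with the fraction of active channels; that linear model, the function name `bandwidth_levels`, and the example numbers are illustrative assumptions and are not taken from Algorithm 1.

```python
def bandwidth_levels(default_bandwidth, m):
    """Return a list of (active_ratio, bandwidth) pairs for ratios 1/m ... m/m.

    Assumption (not from the paper): bandwidth is proportional to the
    number of active channels, so i active channels give i/m of the default.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    return [(i / m, default_bandwidth * i / m) for i in range(1, m + 1)]


# Hypothetical example: 4 channels, 25.6 GB/s default bandwidth.
levels = bandwidth_levels(default_bandwidth=25.6, m=4)
for ratio, bandwidth in levels:
    print(f"active ratio {ratio:.2f}: {bandwidth:.1f} GB/s")
```

The last entry corresponds to the default (baseline) bandwidth and the first entry to the minimal bandwidth, matching the two ends of the input described above.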
The outputs of this algorithm are the processor and memory frequency scaling values and the memory active ratio. The processor frequency is either overclocked or scaled down, while the memory frequency is either overclocked or kept unchanged. Based on the memory traffic ratio threshold, the algorithm arrives at a near-optimal memory active ratio, which determines the appropriate level of the m-level memory bandwidth.
The algorithm is divided into two parts. First, we obtain three parameters, including the memory traffic, by running the application with performance profiling and power measurements. The initial processor power and memory power are measured separately. Relying on profiling does not limit our approach, because most scientific computing applications in a supercomputing center are run many times. Even if profiling takes a large amount of time, that cost is recovered over the later repeated runs of the program.
Second, depending on how the application's memory traffic compares with the memory traffic ratio threshold, the algorithm follows one of two branches: Steps 2-9 or Steps 11-17. The former branch handles CPU-intensive applications. Step 3 obtains the optimal memory bandwidth, and Step 4 obtains the corresponding memory active ratio. The power saving comes from the memory side, because the memory active ratio is scaled down (Step 7). Step 8 then calculates the maximal processor overclocking frequency that still satisfies the power constraint, that is, no increase in power consumption. Finally, the algorithm outputs the resulting frequency and memory active ratio settings.
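The CPU-intensive branch can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: it assumes memory power is proportional to the memory active ratio and processor power grows with the cube of frequency. The function name `cpu_intensive_branch`, its parameters, and the example numbers are hypothetical.

```python
def cpu_intensive_branch(levels, required_bandwidth, cpu_freq, cpu_freq_max,
                         cpu_power, mem_power):
    """Return (new_cpu_freq, active_ratio) for a CPU-intensive application.

    levels: list of (active_ratio, bandwidth) pairs, one per bandwidth level.
    """
    # Analogue of Steps 3-4: pick the lowest bandwidth level that still
    # covers the application's demand; fall back to the highest level.
    feasible = [lv for lv in levels if lv[1] >= required_bandwidth]
    if feasible:
        ratio, _ = min(feasible, key=lambda lv: lv[1])
    else:
        ratio, _ = max(levels, key=lambda lv: lv[1])

    # Analogue of Step 7. Assumption: memory power scales with active ratio.
    saved_power = mem_power * (1.0 - ratio)

    # Analogue of Step 8. Assumption: processor power scales with freq ** 3,
    # so the saved memory power is handed to the processor as extra budget.
    budget = cpu_power + saved_power
    new_freq = cpu_freq * (budget / cpu_power) ** (1.0 / 3.0)

    # Clamping downward never exceeds the power budget.
    return min(new_freq, cpu_freq_max), ratio


# Hypothetical inputs: 4 bandwidth levels (GB/s), powers in watts, GHz clocks.
levels = [(0.25, 6.4), (0.5, 12.8), (0.75, 19.2), (1.0, 25.6)]
new_freq, ratio = cpu_intensive_branch(
    levels, required_bandwidth=10.0, cpu_freq=2.0, cpu_freq_max=3.0,
    cpu_power=80.0, mem_power=20.0)
print(f"active ratio {ratio}, processor frequency {new_freq:.3f} GHz")
```

Under these assumed models the total power after the adjustment equals the initial total, which is one way to read the constraint of no increase in power consumption.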
The other branch of the second part handles memory-intensive applications. In Step 12, we find a memory overclocking frequency within the memory frequency scaling range. Step 14 updates the memory power after memory overclocking is applied. The power increase comes from the memory side (Step 15). Step 16 then calculates the maximal processor DVFS frequency that still satisfies the power constraint. Finally, the algorithm outputs the resulting frequency and memory active ratio settings.
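The memory-intensive branch can be sketched in the same spirit. The sketch below is an illustration under assumptions rather than the authors' method: it picks the highest frequency in the memory scaling range, assumes memory power grows linearly with memory frequency, and assumes processor power grows with the cube of processor frequency. The function name `memory_intensive_branch`, its parameters, and the example numbers are hypothetical.

```python
def memory_intensive_branch(mem_freq, mem_freq_range, cpu_freq, cpu_freq_min,
                            cpu_power, mem_power):
    """Return (new_cpu_freq, new_mem_freq) for a memory-intensive application.

    mem_freq_range: (lowest, highest) allowed memory frequency.
    """
    # Analogue of Step 12. Assumption: overclock to the top of the range,
    # never below the current memory frequency.
    new_mem_freq = max(mem_freq, mem_freq_range[1])

    # Analogue of Step 14. Assumption: memory power is linear in frequency.
    new_mem_power = mem_power * new_mem_freq / mem_freq

    # Analogue of Step 15: the extra power drawn by the memory side.
    increase = new_mem_power - mem_power

    # Analogue of Step 16. Assumption: processor power scales with freq ** 3,
    # so the processor must give up exactly the power the memory gained.
    budget = cpu_power - increase
    if budget <= 0:
        raise ValueError("memory overclocking exceeds the total power budget")
    new_cpu_freq = cpu_freq * (budget / cpu_power) ** (1.0 / 3.0)
    if new_cpu_freq < cpu_freq_min:
        raise ValueError("required processor frequency is below the DVFS range")
    return new_cpu_freq, new_mem_freq


# Hypothetical inputs: memory clocks in MHz, processor clocks in GHz, watts.
new_cpu, new_mem = memory_intensive_branch(
    mem_freq=1333.0, mem_freq_range=(1333.0, 1600.0),
    cpu_freq=2.0, cpu_freq_min=1.2, cpu_power=80.0, mem_power=20.0)
print(f"memory {new_mem:.0f} MHz, processor {new_cpu:.3f} GHz")
```

Raising an error when the processor would have to drop below its minimum frequency keeps the sketch from silently violating the power constraint; a fuller version might instead try a lower memory frequency from the range.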
In the following section, we validate the effectiveness of our algorithm via experiments.