References
[1]
V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch, et al., RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization, in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, Davis, CA, USA, 2013, pp. 185-197.
[2]
O. Seongil, Y. H. Son, N. S. Kim, and J. H. Ahn, Row-buffer decoupling: A case for low-latency DRAM microarchitecture, in Proceedings of ACM/IEEE 41st International Symposium on Computer Architecture, Minneapolis, MN, USA, 2014, pp. 337-348.
[3]
D. Lee, Y. Kim, V. Seshadri, J. Liu, L. Subramanian, and O. Mutlu, Tiered-latency DRAM: A low latency and low cost DRAM architecture, in Proceedings of IEEE 19th International Symposium on High Performance Computer Architecture, Shenzhen, China, 2013, pp. 615-626.
[4]
J. Stuecheli, D. Kaseridis, H. C. Hunter, and L. K. John, Elastic refresh: Techniques to mitigate refresh penalties in high density memory, in Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, Atlanta, GA, USA, 2010, pp. 375-384.
[5]
P. Nair, C.-C. Chou, and M. K. Qureshi, A case for refresh pausing in DRAM memory systems, in Proceedings of IEEE 19th International Symposium on High Performance Computer Architecture, Shenzhen, China, 2013, pp. 627-638.
[6]
P. Huang, W. Liu, K. Tang, X. He, and K. Zhou, ROP: Alleviating refresh overheads via reviving the memory system in frozen cycles, in Proceedings of 45th International Conference on Parallel Processing, Philadelphia, PA, USA, 2016, pp. 169-178.
[7]
W. Liu, P. Huang, K. Tang, K. Zhou, and X. He, CAR: A compression-aware refresh approach to improve memory performance and energy efficiency, ACM SIGMETRICS Performance Evaluation Review, vol. 44, no. 1, pp. 373-374, 2016.
[8]
H. Ha, A. Pedram, S. Richardson, S. Kvatinsky, and M. Horowitz, Improving energy efficiency of DRAM by exploiting half page row access, in Proceedings of 49th Annual IEEE/ACM International Symposium on Microarchitecture, Taipei, China, 2016, pp. 1-12.
[9]
Y. Kim, M. Papamichael, O. Mutlu, and M. Harchol-Balter, Thread cluster memory scheduling: Exploiting differences in memory access behavior, in Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, Atlanta, GA, USA, 2010, pp. 65-76.
[10]
J. H. Ahn, N. P. Jouppi, C. Kozyrakis, J. Leverich, and R. S. Schreiber, Improving system energy efficiency with memory rank subsetting, ACM Transactions on Architecture and Code Optimization, vol. 9, no. 1, p. 4, 2012.
[11]
J. H. Ahn, J. Leverich, R. Schreiber, and N. P. Jouppi, Multicore DIMM: An energy efficient memory module with independently controlled DRAMs, IEEE Computer Architecture Letters, vol. 8, no. 1, pp. 5-8, 2008.
[12]
S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. D. Owens, Memory access scheduling, in Proceedings of ACM/IEEE 27th International Symposium on Computer Architecture, Vancouver, Canada, 2000, pp. 128-138.
[13]
D. Kaseridis, J. Stuecheli, and L. K. John, Minimalist open-page: A DRAM page-mode scheduling policy for the many-core era, in Proceedings of 44th Annual IEEE/ACM International Symposium on Microarchitecture, Porto Alegre, Brazil, 2011, pp. 24-35.
[14]
O. Mutlu, Memory scaling: A systems architecture perspective, in Proceedings of 5th IEEE International Memory Workshop, Monterey, CA, USA, 2013, pp. 21-25.
[16]
J. Shao and B. T. Davis, A burst scheduling access reordering mechanism, in Proceedings of IEEE 13th International Symposium on High Performance Computer Architecture, Phoenix, AZ, USA, 2007, pp. 285-294.
[17]
K. Sudan, N. Chatterjee, D. Nellans, M. Awasthi, R. Balasubramonian, and A. Davis, Micro-pages: Increasing DRAM efficiency with locality-aware data placement, in Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, Pittsburgh, PA, USA, 2010, pp. 219-230.
[18]
V. Seshadri, T. Mullins, A. Boroumand, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry, Gather-scatter DRAM: In-DRAM address translation to improve the spatial locality of non-unit strided accesses, in Proceedings of the 48th International Symposium on Microarchitecture, Waikiki, HI, USA, 2015, pp. 267-280.
[19]
P. Rosenfeld, E. Cooper-Balis, and B. Jacob, DRAMSim2: A cycle accurate memory system simulator, IEEE Computer Architecture Letters, vol. 10, no. 1, pp. 16-19, 2011.
[20]
D. Sanchez and C. Kozyrakis, ZSim: Fast and accurate microarchitectural simulation of thousand-core systems, ACM SIGARCH Computer Architecture News, vol. 41, no. 3, pp. 475-486, 2013.
[21]
C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood, Pin: Building customized program analysis tools with dynamic instrumentation, in Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, Chicago, IL, USA, 2005, pp. 190-200.
[22]
P. Shivakumar and N. P. Jouppi, CACTI 3.0: An integrated cache timing, power, and area model, Technical Report, Compaq Western Research Laboratory (WRL), 2001.
[24]
V. Young, P. J. Nair, and M. K. Qureshi, DICE: Compressing DRAM caches for bandwidth and capacity, in Proceedings of ACM/IEEE 44th Annual International Symposium on Computer Architecture, Toronto, Canada, 2017, pp. 627-638.
[25]
M. Bakhshalipour, M. Shakerinava, P. Lotfi-Kamran, and H. Sarbazi-Azad, Bingo spatial data prefetcher, in Proceedings of IEEE International Symposium on High Performance Computer Architecture, Washington, DC, USA, 2019, pp. 399-411.
[26]
K. K.-W. Chang, D. Lee, Z. Chishti, A. R. Alameldeen, C. Wilkerson, Y. Kim, and O. Mutlu, Improving DRAM performance by parallelizing refreshes with accesses, in Proceedings of IEEE 20th International Symposium on High Performance Computer Architecture, Orlando, FL, USA, 2014, pp. 356-367.
[27]
T. Zhang, M. Poremba, C. Xu, G. Sun, and Y. Xie, CREAM: A concurrent-refresh-aware DRAM memory architecture, in Proceedings of IEEE 20th International Symposium on High Performance Computer Architecture, Orlando, FL, USA, 2014, pp. 368-379.
[28]
Y. Kim, V. Seshadri, D. Lee, J. Liu, and O. Mutlu, A case for exploiting subarray-level parallelism in DRAM, ACM SIGARCH Computer Architecture News, vol. 40, no. 3, pp. 368-379, 2012.
[29]
M. K. Jeong, D. H. Yoon, D. Sunwoo, M. Sullivan, I. Lee, and M. Erez, Balancing DRAM locality and parallelism in shared memory CMP systems, in Proceedings of IEEE International Symposium on High Performance Computer Architecture, New Orleans, LA, USA, 2012, pp. 1-12.
[30]
W. Liu, P. Huang, T. Kun, T. Lu, K. Zhou, C. Li, and X. He, LAMS: A latency-aware memory scheduling policy for modern DRAM systems, in Proceedings of IEEE 35th International Performance Computing and Communications Conference, Las Vegas, NV, USA, 2016, pp. 1-8.
[31]
T. Zhang, K. Chen, C. Xu, G. Sun, T. Wang, and Y. Xie, Half-DRAM: A high-bandwidth and low-power DRAM architecture from the rethinking of fine-grained activation, in Proceedings of the 41st International Symposium on Computer Architecture, Minneapolis, MN, USA, 2014, pp. 349-360.
[32]
H. Yoon, J. Meza, R. Ausavarungnirun, R. A. Harding, and O. Mutlu, Row buffer locality aware caching policies for hybrid memories, in Proceedings of IEEE 30th International Conference on Computer Design, Montreal, Canada, 2012, pp. 337-344.