Regular Paper

Approximate Similarity-Aware Compression for Non-Volatile Main Memory

Wuhan National Laboratory for Optoelectronics, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

A preliminary version of this paper was published in the Proceedings of DAC 2020.

Abstract

Image bitmaps, i.e., data containing pixels for visual perception, are widely used in emerging applications for pixel operations, but they consume large amounts of memory space and energy. Compared with legacy DRAM (dynamic random access memory), non-volatile memories (NVMs) are well suited to bitmap storage because of their high density and intrinsic durability. However, writes to NVMs incur higher energy consumption and latency than reads. Existing precise and approximate compression schemes in NVM controllers deliver limited performance on bitmaps because of the irregular data patterns and variance in bitmaps. We observe pixel-level similarity when writing bitmaps, which arises from the analogous contents of adjacent pixels. Exploiting this pixel-level similarity, we propose SimCom, an approximate similarity-aware compression scheme in the NVM module controller that compresses data on the fly for each write access. The idea behind SimCom is to compress consecutive similar words into pairs of base words and their runs. The storage cost of small runs is further reduced by reusing the least significant bits of the base words. SimCom adaptively selects an appropriate compression mode for each bitmap format, thus achieving an efficient trade-off between quality and memory performance. We implement SimCom on GEM5/zsim with NVMain and evaluate it with real-world image and video workloads. Our results demonstrate the efficacy and efficiency of SimCom and its quality-performance trade-off.
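
The base-word-and-run idea described in the abstract can be illustrated with a minimal software sketch. This is not the SimCom hardware design. It assumes that words are non-negative integers, that two words count as "similar" when their absolute difference is at most a tolerance `tol`, and that a small run is stored as `run - 1` in the lowest `run_bits` bits of the base word. The function names, the similarity test, and both parameters are hypothetical choices made only for illustration.

```python
from typing import List, Tuple


def compress(words: List[int], tol: int) -> List[Tuple[int, int]]:
    """Collapse consecutive words within `tol` of the current base word
    into (base, run) pairs. Lossy: similar words are replaced by the base.
    The |w - base| <= tol similarity test is an assumption of this sketch."""
    pairs: List[Tuple[int, int]] = []
    for w in words:
        if pairs and abs(w - pairs[-1][0]) <= tol:
            base, run = pairs[-1]
            pairs[-1] = (base, run + 1)  # extend the current run
        else:
            pairs.append((w, 1))  # start a new run with w as the base
    return pairs


def decompress(pairs: List[Tuple[int, int]]) -> List[int]:
    """Expand (base, run) pairs back into an approximate word sequence."""
    out: List[int] = []
    for base, run in pairs:
        out.extend([base] * run)
    return out


def pack_small_run(base: int, run: int, run_bits: int = 2) -> int:
    """Store `run - 1` in the low `run_bits` bits of the base word
    (hypothetical layout). The overwritten base bits are lost."""
    if not 1 <= run <= (1 << run_bits):
        raise ValueError("run does not fit in run_bits")
    mask = (1 << run_bits) - 1
    return (base & ~mask) | (run - 1)


def unpack_small_run(packed: int, run_bits: int = 2) -> Tuple[int, int]:
    """Recover the (approximate base, run) pair from a packed word."""
    mask = (1 << run_bits) - 1
    return packed & ~mask, (packed & mask) + 1


words = [100, 101, 99, 100, 200, 201, 50]
pairs = compress(words, tol=2)
approx = decompress(pairs)
```

In this sketch the seven input words collapse into three pairs, and packing a run into the low bits of its base word removes the need for a separate run field at the cost of a small extra error in the base value. How SimCom defines similarity, sizes the run field, and selects a compression mode per bitmap format is described in the paper itself and is not reproduced here.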

Electronic Supplementary Material

JCST-2206-12565-Highlights.pdf (162.7 KB)

Journal of Computer Science and Technology
Pages 63-81
Cite this article:
Chen Z-Y, Hua Y, Zuo P-F, et al. Approximate Similarity-Aware Compression for Non-Volatile Main Memory. Journal of Computer Science and Technology, 2024, 39(1): 63-81. https://doi.org/10.1007/s11390-023-2565-7

Received: 14 June 2022
Accepted: 12 February 2023
Published: 25 January 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024