Liu X, Chen K, Liu M, et al. Multi-Clock Snapshot Isolation Concurrency Control for NVM Database. Tsinghua Science and Technology, 2022, 27(6): 925-938. https://doi.org/10.26599/TST.2021.9010036
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract
Multi-Clock Snapshot Isolation (MCSI) is a concurrency control mechanism that implements snapshot isolation on a single-layer Non-Volatile Memory (NVM) database. It stores a single copy of data by using multi-version storage to ensure durability and runtime access. With multi-clock transaction timestamp assignment, MCSI can efficiently generate snapshots with vector clocks and use per-thread transaction status arrays to identify uncommitted versions in NVM. For evaluation, we compared MCSI with the PostgreSQL-style concurrency control used in the single-layer NVM database N2DB. The maximum transaction throughput of MCSI is 101%–195% higher than that of N2DB for the YCSB workloads, and 25%–49% higher for the TPC-C workloads. Moreover, the transaction latency of MCSI remains relatively stable as the thread count increases. With 18 worker threads, the average transaction latency of MCSI is 65%–84% lower than that of N2DB for the YCSB workloads and 16%–43% lower for the TPC-C workloads.
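To make the ideas named in the abstract concrete, the following is a minimal sketch of how per-thread clocks, a vector-clock snapshot, and per-thread transaction status arrays could fit together. It is not the MCSI algorithm from the paper: every name here (`MultiClock`, `Version`, `Status`) is hypothetical, the status arrays are ordinary in-memory lists rather than NVM structures, no latching or persistence is modelled, and the sketch assumes each worker thread runs at most one transaction at a time.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    IN_PROGRESS = 0
    COMMITTED = 1
    ABORTED = 2


@dataclass(frozen=True)
class Version:
    value: object
    thread_id: int  # worker thread that wrote this version (hypothetical field)
    ts: int         # that thread's local timestamp for the writing transaction


class MultiClock:
    """Toy model: one logical clock and one status array per worker thread."""

    def __init__(self, n_threads):
        # status[t][ts - 1] is the state of transaction `ts` on thread `t`.
        self.status = [[] for _ in range(n_threads)]

    def begin(self, thread_id):
        # Timestamps come from the thread's own clock, so no global counter
        # is shared between threads. Assumes one open transaction per thread.
        self.status[thread_id].append(Status.IN_PROGRESS)
        return len(self.status[thread_id])

    def finish(self, thread_id, ts, committed):
        self.status[thread_id][ts - 1] = (
            Status.COMMITTED if committed else Status.ABORTED
        )

    def snapshot(self):
        # Vector clock: for each thread, the latest timestamp whose
        # transaction had already finished when the snapshot was taken.
        snap = []
        for txns in self.status:
            done = len(txns)
            if txns and txns[-1] is Status.IN_PROGRESS:
                done -= 1
            snap.append(done)
        return tuple(snap)

    def visible(self, version, snap):
        # A version is visible if its writer finished before the snapshot
        # and the status array records that writer as committed.
        if version.ts > snap[version.thread_id]:
            return False
        return self.status[version.thread_id][version.ts - 1] is Status.COMMITTED


mc = MultiClock(n_threads=2)

ts_a = mc.begin(0)
v_a = Version("a", 0, ts_a)
mc.finish(0, ts_a, committed=True)

ts_b = mc.begin(1)              # still running when the snapshot is taken
v_b = Version("b", 1, ts_b)
snap = mc.snapshot()
mc.finish(1, ts_b, committed=True)

ts_c = mc.begin(0)
v_c = Version("c", 0, ts_c)
mc.finish(0, ts_c, committed=False)  # aborted writer
later = mc.snapshot()
```

In this toy run, `v_a` is visible under `snap`, while `v_b` is not, even after its writer commits, because the snapshot's entry for thread 1 predates that transaction. Under `later`, `v_b` becomes visible and `v_c` stays hidden because its writer aborted. The design point the sketch tries to illustrate is that a reader only compares a version's timestamp with one entry of the snapshot vector and consults one status array, rather than contending on a single global clock.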
This work was supported by the National Key Research & Development Program of China (No. 2016YFB1000504) and the National Natural Science Foundation of China (Nos. 61877035, 61433008, 61373145, and 61572280).
Rights and permissions
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).