Journal Home > Volume 21 , Issue 2

Bitmap indexing has been widely used in various applications due to its speed in bitwise operations. However, it can consume large amounts of memory. To solve this problem, various bitmap coding algorithms have been proposed. In this paper, we present COMbining Binary And Ternary encoding (COMBAT), a new bitmap index coding algorithm. Typical algorithms derived from Word Aligned Hybrid (WAH) are COMPressed Adaptive indeX (COMPAX) and Compressed “n” Composable Integer Set (CONCISE), which can combine either two or three continuous words after WAH encoding. COMBAT combines both mechanisms and results in more compact bitmap indexes. Moreover, querying time of COMBAT can be faster than that of COMPAX and CONCISE, since bitmap indexes are smaller and it would take less time to load them into memory. To prove the advantages of COMBAT, we extend a theoretical analysis model proposed by our group, which is composed of the analysis of various possible bitmap indexes. Some experimental results based on real data are also provided, which show COMBAT’s storage and speed superiority. Our results demonstrate the advantages of COMBAT and codeword statistics are provided to solidify the proof.


menu
Abstract
Full text
Outline
About this article

COMBAT: A New Bitmap Index Coding Algorithm for Big Data

Show Author's information Yinjun WuZhen Chen( )Yuhao WenWenxun ZhengJunwei Cao
Department of Automation and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China.
iCenter of Tsinghua University, Beijing 100084, China.
Department of Computer Science, Duke University, NC 27708, USA.
Research Institute of Information Technology and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China.

Abstract

Bitmap indexing has been widely used in various applications due to its speed in bitwise operations. However, it can consume large amounts of memory. To solve this problem, various bitmap coding algorithms have been proposed. In this paper, we present COMbining Binary And Ternary encoding (COMBAT), a new bitmap index coding algorithm. Typical algorithms derived from Word Aligned Hybrid (WAH) are COMPressed Adaptive indeX (COMPAX) and Compressed “n” Composable Integer Set (CONCISE), which can combine either two or three continuous words after WAH encoding. COMBAT combines both mechanisms and results in more compact bitmap indexes. Moreover, querying time of COMBAT can be faster than that of COMPAX and CONCISE, since bitmap indexes are smaller and it would take less time to load them into memory. To prove the advantages of COMBAT, we extend a theoretical analysis model proposed by our group, which is composed of the analysis of various possible bitmap indexes. Some experimental results based on real data are also provided, which show COMBAT’s storage and speed superiority. Our results demonstrate the advantages of COMBAT and codeword statistics are provided to solidify the proof.

Keywords: big data, performance evaluation, bitmap index, COMBAT, CONCISE, COMPAX, index encoding

References(30)

[1]
Huang W., Chen Z., Dong W., Li H., Cao B., and Cao J., Mobile internet big data platform in China Unicom, Tsinghua Science and Technology, vol. 19, no. 1, pp. 95–101, 2014.
[2]
Barroso L. A., Clidaras J., and Hölzle U., The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Morgan and Claypool Publishers, 2013.
[3]
Chen Z., Huang W., and Cao J., Big Data Engineering for Internet Traffic. Beijing, China: Tsinghua University Press, 2014.
[4]
Cheng P., Bitmap index techniques and its research advancement, Science and Technologies Information, vol. 026, pp. 134–135, 2010.
[5]
Li J., Research in bitmap index in data warehouse, (in Chinese), PhD dissertation, Shandong University, China, 2007.
[6]
Huang Z., Lv W., and Huang J., Improved BLAST algorithm based on bitmap indexes and B+ tree, Computer Engineering and Applications, vol. 49, no. 11, pp. 118–120, 2013.
[7]
Yang B., Qi Y., Xue Y., and Li J., Bitmap data structure: Towards high-performance network algorithms designing, (in Chinese), Computer Engineering and Applications, vol. 45, no. 15, pp. 1–5, 2009.
[8]
Garcia-Molina H., Ullman J. D., and Widom J., Database System Implementation, Second Edition. Prentice Hall, 2009.
[9]
Chan C., Bitmap index, in Encyclopedia of Database Systems. Springer, 2009, pp. 244-248.
DOI
[10]
Wu M. and Buchmann A. P., Encoded bitmap indexing for data warehouses, in Proc. 14th Intl. Conf. Data Eng. (ICDE), 1998, pp. 220-230.
[11]
Corrales F., Chiu D., and Sawin J., Variable Length Compression for Bitmap Indexes, in DEXA11. Springer- Verlag, 2011, pp. 381–395.
DOI
[12]
Colantonio A. and Di Pietro R., Concise: Compressed n composable integer set, Information Processing Letters, vol. 110, no. 16, pp. 644–650, 2010.
[13]
Antoshenkov G., Byte-aligned bitmap compression, in Data Compression Conference, 1995. DCC’95. Proceedings, 1995.
[14]
Wu K., Otoo E. J., and Shoshani A., Compressing bitmap indexes for faster search operations, in Scientific and Statistical Database Management, 2002. Proceedings 14th International Conference on, 2002, pp. 99-108.
[15]
Wu K., Otoo E. J., and Shoshani A., Optimizing bitmap indexes with efficient compression, ACM Transactions on Database Systems (TODS), vol. 31, no. 1, pp. 1–38, 2006.
[16]
Guadalupe C., Gibas M., and Ferhatosmanoglu H., Update conscious bitmap indexes, in 19th IEEE International Conference on Scientific and Statistical Database Management SSBDM07, 2007, p. 15.
[17]
Stabno M., and Wrembel R., RLH: Bitmap compression technique based on run-length and Huffman encoding, Information Systems, vol. 34, no. 4, pp.400–414, 2009.
[18]
Deli‘ege F. and Pedersen T. B., Position list word aligned hybrid: Optimizing space and performance for compressed bitmaps, in Proceeding of the 13th International Conference on Extending Database Technology, 2010.
[19]
Lemire D., Kaser O., and Aouiche K., Sorting improves word-aligned bitmap indexes, Data & Knowledge Engineering, vol. 69, no. 1, pp. 3–28, 2010.
[20]
van Schaik S. J. and de Moor O., A memory efficient reachability data structure through bit vector compression, in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, 2011, pp. 913-924.
[21]
Fusco F., Stoecklin M. P., and Vlachos M., Net-fli: On-the-fly compression, archiving and indexing of streaming network traffic, Proceedings of the VLDB Endowment, vol. 3, nos. 1–2, pp. 1382–1393, 2010.
[22]
Andrzejewski W., and Wrembel R., GPU-WAH: Applying GPUs to compressing bitmap indexes with word aligned hybrid, in Database and Expert Systems Applications. Springer Berlin Heidelberg, 2010, pp. 315-329.
DOI
[23]
Fusco F., Vlachos M., Dimitropoulos X., and Deri L., Indexing million of packets per second using GPUs, in Proceedings of the 2013 Conference on Internet Measurement, 2013, pp. 327-332.
[24]
Andrzejewski W., and Wrembel R., GPU-PLWAH: GPU based implementation of the PLWAH algorithm for compressing bitmaps, Control & Cybernetics, vol. 40, no. 3, pp. 627–650, 2011.
[25]
Wen Y., Chen Z., Ma G., Cao J., Zheng W., Peng G., and Huang W. L., SECOMPAX: A bitmap index compression algorithm, in 23rd International Conference on Computer Communication and Networks (ICCCN), IEEE, 2014, pp. 1-7.
[26]
Chang J., Chen Z., Zheng W., Wen Y., Cao J., and Huang W. L., PLWAH+: A bitmap index compressing scheme based on PLWAH, in Proceedings of the Tenth ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ACM, 2014, pp. 257-258.
[27]
Schmidt A., Kimmig D., and Beine M., DFWAH: A proposal of a new compression scheme of medium-sparse bitmaps, in the Third International Conference on Advances in Databases, Knowledge, and Data Applications (DBKDA 2011), 2011, pp. 192-195.
[28]
Chambi S., Lemire D., Kaser O., and Godin R., Better bitmap performance with Roaring bitmaps, arXiv preprint arXiv:1402.6407, 2014.
DOI
[29]
Chen Z., Wen Y., Cao J., Zheng W., Chang J., Wu Y., Ma G., Hakmaoui M., and Peng G., A survey of bitmap index compression algorithms for big data, Tsinghua Science and Technology, vol. 20, no. 1, pp. 100–115, 2015.
[30]
Wu Y., Chen Z., Wen Y., Cao J., Zheng W., Ma G., A general analytical model for spatial and temporal performance of bitmap index compression algorithms in Big Data, in Computer Communication and Networks (ICCCN), 2015 24th International Conference on. IEEE, 2015, pp. 1-10.
Publication history
Copyright
Acknowledgements
Rights and permissions

Publication history

Received: 08 January 2016
Accepted: 27 January 2016
Published: 31 March 2016
Issue date: April 2016

Copyright

© The author(s) 2016

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. 61472200 and 61233016), the National Key Basic Research and Development (973) Program of China (No. 2013CB228206), the State Grid R&D Project Architecture of Information Communication System for Energy Internet (No. SGRIXTKJ[2015]253), and the National Training Program of Innovation and Entrepreneurship for Undergraduates (Nos. 201510003049 and 201510003B066).

Rights and permissions

Return