AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
PDF (2.1 MB)
Collect
Submit Manuscript AI Chat Paper
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Open Access

A Survey of Bitmap Index Compression Algorithms for Big Data

Zhen Chen( )Yuhao WenJunwei CaoWenxun ZhengJiahui ChangYinjun WuGe MaMourad HakmaouiGuodong Peng
Research Institute of Information Technology, Tsinghua University, Beijing 100084, China.
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China.
Department of Physics, Tsinghua University, Beijing 100084, China.
Department of Aerospace Engineering, Tsinghua University, Beijing 100084, China.
Department of Automation, Tsinghua University, Beijing 100084, China.
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.
Department of Mechanical Engineering, Tsinghua University, Beijing 100084, China.
Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China.
Show Author Information

Abstract

With the growing popularity of Internet applications and the widespread use of mobile Internet, Internet traffic has maintained rapid growth over the past two decades. Internet Traffic Archival Systems (ITAS) for packets or flow records have become more and more widely used in network monitoring, network troubleshooting, and user behavior and experience analysis. Among the three key technologies in ITAS, we focus on bitmap index compression algorithm and give a detailed survey in this paper. The current state-of-the-art bitmap index encoding schemes include: BBC, WAH, PLWAH, EWAH, PWAH, CONCISE, COMPAX, VLC, DF-WAH, and VAL-WAH. Based on differences in segmentation, chunking, merge compress, and Near Identical (NI) features, we provide a thorough categorization of the state-of-the-art bitmap index compression algorithms. We also propose some new bitmap index encoding algorithms, such as SECOMPAX, ICX, MASC, and PLWAH+, and present the state diagrams for their encoding algorithms. We then evaluate their CPU and GPU implementations with a real Internet trace from CAIDA. Finally, we summarize and discuss the future direction of bitmap index compression algorithms. Beyond the application in network security and network forensic, bitmap index compression with faster bitwise-logical operations and reduced search space is widely used in analysis in genome data, geographical information system, graph databases, image retrieval, Internet of things, etc. It is expected that bitmap index compression will thrive and be prosperous again in Big Data era since 1980s.

References

[1]
Google, Web History, http://www.google.com/psearch, 2008.
[4]
Huang W., Chen Z., Dong W., Li H., Cao B., and Cao J., Mobile internet big data platform in China Unicom, Tsinghua Science and Technology, vol. 19, no. 1, pp. 95-101, 2014.
[5]
Chen Z., Huang W., and Cao J., Big Data Engineering for Internet Traffic, (in Chinese). Beijing, China: Tsinghua University Press, 2014.
[6]
Lin R., Wu B., Yang F., Zhao Y., and Hou J., An efficient adaptive failure detection mechanism for cloud platform based on volterra series, Communications, China, vol. 11, no. 4, pp. 1-12, 2014.
[7]
Meng Y., Luan Z., and Qian D., Differentiating data collection for cloud environment monitoring, Communications, China, vol. 11, no. 4, pp. 13-24, 2014.
[8]
Pu L., Xu J., Yu B., and Zhang J., Smart cafe: A mobile local computing system based on indoor virtual cloud, Communications, China, vol. 11, no. 4, pp. 38-49, 2014.
[9]
Li M., Lukyanenko A., Tarkoma S., and Yla-Jaaski A., MPTCP incast in data center networks, Communications, China, vol. 11, no. 4, pp. 25-37, 2014.
[10]
Rizzo L., Device polling support for FreeBSD, http://info.iet.unipi.it/luigi/polling/, 2002.
[11]
Deri L., Improving passive packet capture: Beyond device polling, in Proceedings of SANE, 2004, pp. 85-93.
[12]
Rizzo L., Netmap: A novel framework for fast packet I/O, in Proceedings of the 2012 USENIX Conference on Annual Technical Conference, USENIX Association, 2012, p. 9.
[13]
Papadogiannakis A., Polychronakis M., and Markatos E. P., Scap: Stream-oriented network traffic capture and analysis for high-speed networks, in Proceedings of the 2013 Conference on Internet Measurement Conference, 2013, pp. 441-454.
[14]
Ma G., Guo Z., Li X., Chen Z., Cao J., Jiang Y., and Guo X., BreadZip: A combination of network traffic data and bitmap index encoding algorithm, in Systems, Man and Cybernetics (SMC), 2014 IEEE International Conference on IEEE, 2014, pp. 3235-3240.
[15]
Barroso L. A., Clidaras J., and Hlzle U., The datacenter as a computer: An introduction to the design of warehouse-scale machines, Synthesis Lectures on Computer Architecture, 2013, .
[16]
Dean J., Challenges in building large-scale information retrieval systems, in Proceedings of the Second International Conference on Web Search and Web Data Mining, WSDM 2009, Barcelona, Spain, 2009.
[17]
Chandrasekaran S., Cooper O., Deshpande A., Franklin M. J., Hellerstein J. M., Hong W., and Shah M. A., TelegraphCQ: Continuous dataflow processing, in Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data ACM, 2003, p. 668.
[18]
Cranor C., Johnson T., Spataschek O., and Shkapenyuk V., Gigascope: A stream database for network applications, in Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data ACM, 2003, pp. 647-651.
[19]
Li X., Bian F., Zhang H., Diot C., Govindan R., and Hong W. H., Advanced indexing techniques for widearea network monitoring, in Data Engineering Workshops, 2005. 21st International Conference on IEEE, 2005, pp. 1184-1184.
[20]
Desnoyers P. and Shenoy P. J., Hyperion: High volume stream archival for retrospective querying, in USENIX Annual Technical Conference, 2007, pp. 45-58.
[21]
Wu K., FastBit: An efficient indexing technology for accelerating data-intensive science, Journal of Physics: Conference Series, vol. 16. no. 1, pp. 556-560, 2005.
[22]
Maier G., Sommer R., Dreger H., Feldmann A., Paxson V., and Schneider F., Enriching network security analysis with time travel, ACM SIGCOMM Computer Communication Review, vol. 38, no. 4, pp. 183-194, 2008.
[23]
Giura P. and Memon N., NetStore: An efficient storage infrastructure for network forensics and monitoring, in Recent Advances in Intrusion Detection. Springer Berlin, 2010, pp. 277-296.
[24]
Fusco F., Vlachos M., and Dimitropoulos X., RasterZip: Compressing network monitoring data with support for partial decompression, in Proceedings of the 2012 ACM Conference on Internet Measurement Conference, 2012, pp. 51-64.
[25]
Fusco F., Dimitropoulos X., Vlachos M., and Deri L., PcapIndex: An index for network packet traces with legacy compatibility, ACM SIGCOMM Computer Communication Review, vol. 42, no. 1, pp. 47-53, 2012.
[26]
Li J., Ding S., Xu M., Han F. Y., Guan X., and Chen Z., TIFA: Enabling real-time querying and storage of massive stream data, in Networking and Distributed Computing (ICNDC), 2011 Second International Conference on, 2011, pp. 61-64.
[27]
Chen Z., Ruan L., Cao J., Yu Y., and Jiang X., TIFAflow: Enhancing traffic archiving system with flow granularity for forensic analysis in network security, Tsinghua Science and Technology, vol. 18, no. 4, pp. 406-417, 2013.
[28]
Chen Z., Shi X., Ruan L., Xie F., and Li J., High speed traffic archiving system for flow granularity storage and querying, presented at ICCCN 2012 Workshop on PMECT, 2012.
[29]
Spiegler I. and Maayan R., Storage and retrieval considerations of binary data bases, Information Processing and Management, vol. 21, no. 3, pp. 233-254, 1985.
[30]
Cheng P., Bitmap index techniques and its research advancement, Science and Technologies Information, vol. 27, no. 26, pp. 134-135, 2010.
[31]
Huang Z., Lv W., and Huang J., Improved BLAST algorithm based on bitmap indexes and B+ tree, Computer Engineering and Applications, vol. 49, no. 11, pp. 118-120, 2013.
[32]
Yang B., Qi Y., Xue Y., and Li J., Bitmap data structure: Towards high-performance network algorithms designing, Computer Engineering and Applications, vol. 45, no. 15, pp. 1-5, 2009.
[33]
Garcia-Molina H., Ullman J. D., and Widom J., Database System Implementation, Second Edition. Prentice Hall, 2009.
[34]
O'Neil P. E., Model 204 architecture and performance, in High Performance Transaction Systems. Springer Berlin Heidelberg, 1989, pp. 39-59.
[35]
O'Neil P. and Quass D., Improved query performance with variant indexes, ACM Sigmod Record, vol. 26, no. 2, pp. 38-49, 1997.
[36]
Antoshekov G., Byte-aligned bitmap compression, in Proceedings of the Data Compression Conference (DCC'95), 1995, p. 476.
[37]
Wu K., Otoo E. J., and Shoshani Arie, Compressing bitmap indexes for faster search operations, in Scientific and Statistical Database Management, 2002. Proceedings. 14th International Conference on, 2002, pp. 99-108.
[38]
Wu K., Otoo E. J., and Shoshani A., Optimizing bitmap indices with efficient compression, ACM Transactions on Database Systems (TODS), vol. 31, no. 1, pp. 1-38, 2006.
[39]
Deli'ege F. and Pedersen T. B., Position list word aligned hybrid: Optimizing space and performance for compressed bitmaps, in Proceeding of the 13th International Conference on Extending Database Technology, 2010.
[40]
Lemire D., Kaser O., and Aouiche K., Sorting improves word-aligned bitmap indexes, Data & Knowledge Engineering, vol. 69, no. 1, pp. 3-28, 2010.
[41]
Colantonio A. and Pietro R. Di, Concise: Compressed ncomposable integer set, Information Processing Letters, vol. 110, no. 16, pp. 644-650, 2010.
[42]
Schaik S. J. van and Moor O. de, A memory efficient reachability data structure through bit vector compression, in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, 2011, pp. 913-924.
[43]
Fusco F., Stoecklin M. P., and Vlachos M., Net-fli: Onthe-fly compression, archiving and indexing of streaming network traffic, Proceedings of the VLDB Endowment, vol. 3, nos. 1&2, pp. 1382-1393, 2010.
[44]
Canahuate G., Gibas M., and Ferhatosmanoglu H., Update conscious bitmap indices, in 19th IEEE International Conference on Scientific and Statistical Database Management SSBDM'07, 2007.
[45]
Corrales F., Chiu D., and Sawin J., Variable length compression for bitmap indices, in Database and Expert Systems Applications, Hameurlain A., Liddle S. W., Schewe K.-D., and Zhou X., eds. Springer Berlin Heidelberg, 2011.
[46]
Guzun G., Canahuate G., Chiu D., and Sawin J., A tunable compression framework for bitmap indices, in IEEE International Conference on Data Engineering (ICDE 2014), 2014, pp. 484-495.
[47]
Chambi S., Lemire D., Kaser O., and Godin R., Better bitmap performance with Roaring bitmaps, arXiv preprint arXiv:1402.6407, 2014.
[48]
Stabno M. and Wrembel R., RLH: Bitmap compression technique based on run-length and Huffman encoding, Information Systems, vol. 34, no. 4, pp. 400-414, 2009.
[49]
Schmidt A., Kimmig D., and Beine M., A proposal of a new compression scheme of medium-sparse bitmaps, International Journal on Advances in Software, vol. 4, nos. 3&4, pp. 401-411, 2011.
[50]
Chang J., Chen Z., Zheng W., Wen Y., Cao J., and Huang W. L., PLWAH+: A bitmap index compressing scheme based on PLWAH, in Proceedings of the Tenth ACM/IEEE Symposium on Architectures for Networking and Communications Systems, 2014, pp. 257-258.
[51]
Nuutila E., An efficient transitive closure algorithm for cyclic digraphs, Information Processing Letters, vol. 52, pp. 207-213, 1994.
[52]
Wen Y., Chen Z., Ma G., Cao J., Zheng W., Peng G., and Huang W. L., SECOMPAX: A bitmap index compression algorithm, in Computer Communication and Networks (ICCCN), 2014 23rd International Conference on, IEEE, 2014, pp. 1-7.
[53]
Oberhumer M. F., The Lempel-Ziv-Oberhumer Packer, http://www.lzop.org/, 2010.
[54]
Reiss F., Stockinger K., Wu K., Shoshani A., and Hellerstein J. M., Enabling real-time querying of live and historical stream data, in Scientific and Statistical Database Management, 2007. SSBDM'07. 19th International Conference on, IEEE, 2007, p. 28.
[55]
Andrzejewski W. and Wrembel R., GPU-WAH: Applying GPUs to compressing bitmap indexes with word aligned hybrid, in Database and Expert Systems Applications, Bringas P. G., Hameurlain A., and Quirchmayr G., eds. Springer Berlin Heidelberg, 2010, pp. 315–329.
[56]
Andrzejewski W. and Wrembel R., GPU-PLWAH: GPU-based implementation of the PLWAH algorithm for compressing bitmaps, Control & Cybernetics, vol. 40, no. 3, pp. 627-650, 2011.
[57]
Fusco F., Vlachos M., Dimitropoulos X., and Deri L., Indexing million of packets per second using GPUs, in Proceedings of the 2013 Conference on Internet Measurement Conference, 2013, pp. 327-332.
[58]
Lakshminarasimhan S., Boyuka D. A., Pendse S. V., Zou X., Jenkins J., Vishwanath V., and Samatova N. F., Scalable in situ scientific data encoding for analytical query processing, in Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013, pp. 1-12.
[59]
Sinha R. R., Mitra S., and Winslett M., Bitmap indexes for large scientific data sets: A case study, in Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International, 2006, p. 10.
[60]
Sinha R. R., and Winslett M., Multi-resolution bitmap indexes for scientific data, ACM Transactions on Database Systems (TODS), vol. 32, no. 3, article no. 16, 2007.
[61]
Zhang Y., Abu-Khzam F. N., Baldwin N. E., Chesler E. J., Langston M. A., and Samatova N. F., Genome-scale computational approaches to memory-intensive applications in systems biology, in Proceedings of the ACM/IEEE SC 2005 Conference on Supercomputing'2005, 2005, p. 12.
[62]
Romosan A., Shoshani A., Wu K., Markowitz V., and Mavrommatis K., Accelerating gene context analysis using bitmaps, in Proceedings of the 25th International Conference on Scientific and Statistical Database Management, 2013, p. 26.
[63]
Hu Y., Sundara S., Chorma T., and Srinivasan J., Supporting RFID-based item tracking applications in oracle DBMS using a bitmap datatype, in Proceedings of the 31st International Conference on Very Large Data Bases, 2005, pp. 1140-1151.
[64]
Lee K. H. and Moon B., Bitmap indexes for relational XML twig query processing, in Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009, pp. 465-474.
[65]
Siqueira T. L. Lopes, Ciferri R. R., Times V. C., and Ciferri C. D. de Aguiar, A spatial bitmap-based index for geographical data warehouses, in Proceedings of the 2009 ACM Symposium on Applied Computing, 2009, pp. 1336-1342.
[66]
Zhang J. and You S., Dynamic tiled map services: Supporting query-based visualization of large-scale raster geospatial data, in Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research & Application, 2010, p. 19.
[67]
Martínez-Bazan N., Águila-Lorente M. Á., Muntés-Mulero V., Dominguez-Sal D., Gómez-Villamor S., and Larriba-Pey J. L., Efficient graph management based on bitmap indices, in Proceedings of the 16th International Database Engineering & Applications Sysmposium, 2012, pp. 110-119.
[68]
Cha G. H., Bitmap indexing method for complex similarity queries with relevance feedback, in Proceedings of the 1st ACM International Workshop on Multimedia Databases, 2003, pp. 55-62.
[69]
Fontoura M., Gurevich M., Josifovski V., and Vassilvitskii S., Efficiently encoding term co-occurrences in inverted indexes, in Proceedings of the 20th ACM International Conference on Information and Knowledge Management, 2011, pp. 307-316.
[70]
Dash D., Rao J., Megiddo N., Ailamaki A., and Lohman G., Dynamic faceted search for discovery-driven analysis, in Proceedings of the 17th ACM Conference on Information and Knowledge Management, 2008, pp. 3-12.
Tsinghua Science and Technology
Pages 100-115
Cite this article:
Chen Z, Wen Y, Cao J, et al. A Survey of Bitmap Index Compression Algorithms for Big Data. Tsinghua Science and Technology, 2015, 20(1): 100-115. https://doi.org/10.1109/TST.2015.7040519

532

Views

16

Downloads

33

Crossref

N/A

Web of Science

37

Scopus

3

CSCD

Altmetrics

Received: 05 December 2014
Accepted: 28 December 2014
Published: 12 February 2015
© The authors 2015
Return