AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
PDF (2.5 MB)
Collect
Submit Manuscript AI Chat Paper
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Open Access

Extensible Bloom Filters: Adaptive Strategies for Scalability and Efficiency in Network and Distributed Systems to Handle Increased Data

School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
College of Computer Science and Electronics Engineering, Hunan University, Changsha 410082, China, and also with the Ministry of Education Key Laboratory of Fusion Computing of Supercomputing and Artificial Intelligence, Changsha 410082, China

Show Author Information

Abstract

Bloom Filters (BFs) are compact and probabilistic data structures designed for efficient set membership queries. They offer high query and storage efficiency, making them particularly useful in network and distributed systems. However, the scalability of BFs in accommodating “big data” is limited by increased false positive rates, inflexible hash functions, and inefficient matching with dynamic datasets. To address these limitations, we introduce the Extensible Bloom Filter (EBF), which incorporates a flexible expansion mechanism and an adaptive hash function generation scheme. The EBF design features a set of BF vectors that expand according to the rate of incoming data, with each vector sized to suit the characteristics of the data. Adaptive hash functions, derived from common base matrices, streamline the process by leveraging strong inter-hash relationships. This reduces overhead and simplifies queries across multiple BF vector sizes. Performance evaluations have shown that the EBF consistently achieves a low false positive rate and minimal query time, even amid dynamic data arrivals and large data sets. With its extensibility and adaptability, the EBF provides a robust solution for applications requiring dynamic set representations with stringent accuracy requirements. It enhances the capabilities of network and distributed systems, making them more efficient in handling complex data scenarios.

References

[1]

B. H. Bloom, Space/time trade-offs in hash coding with allowable errors, Commun. ACM, vol. 13, no. 7, pp. 422–426, 1970.

[2]
S. Pei, K. Xie, X. Wang, G. Xie, K. Li, W. Li, Y. Li, and J. Wen, BhBF: A bloom filter using Bh sequences for multi-set membership query, ACM Trans. Knowl. Discov. Data, vol. 16, no. 5, p. 89, 2022.
[3]
R. Xie, M. Li, Z. Miao, R. Gu, H. Huang, H. Dai, and G. Chen, Hash adaptive bloom filter, in Proc. 2021 IEEE 37 th Int. Conf. Data Engineering (ICDE ), Chania, Greece, 2021, pp. 636–647.
[4]

K. Xie, S. Pei, X. Wang, W. Shi, G. Xie, K. Li, Y. Li, and J. Wen, A stateful bloom filter for per-flow state monitoring, IEEE Trans. Netw. Sci. Eng., vol. 8, no. 2, pp. 1399–1413, 2021.

[5]

G. Cheng, L. Luo, J. Xia, D. Guo, and Y. Sun, When deduplication meets migration: An efficient and adaptive strategy in distributed storage systems, IEEE Trans. Parallel Distrib. Syst., vol. 34, no. 10, pp. 2749–2766, 2023.

[6]

M. Mitzenmacher and A. Broder, Network applications of bloom filters: A survey, Internet Mathematics, vol. 1, no. 4, pp. 485–509, 2004.

[7]

S. Tarkoma, C. E. Rothenberg, and E. Lagerspetz, Theory and practice of bloom filters for distributed systems, IEEE Commun. Surv. Tut., vol. 14, no. 1, pp. 131–155, 2012.

[8]

L. Fan, P. Cao, J. Almeida, and A. Z. Broder, Summary cache: A scalable wide-area web cache sharing protocol, IEEE/ACM Trans. Netw., vol. 8, no. 3, pp. 281–293, 2000.

[9]

H. Alexander, I. Khalil, C. Cameron, Z. Tari, and A. Zomaya, Cooperative web caching using dynamic interest-tagged filtered bloom filters, IEEE Trans. Parallel Distrib. Syst., vol. 26, no. 11, pp. 2956–2969, 2015.

[10]
L. Li, K. Xie, S. Pei, J. Wen, W. Liang, and G. Xie, CS-Sketch: Compressive sensing enhanced sketch for full traffic measurement, IEEE Trans. Netw. Sci. Eng., vol. 11, no. 3, pp. 2338–2352, 2024.
[11]

J. Xia, G. Cheng, L. Luo, D. Guo, P. Lv, and B. Sun, The doctrine of mean: Realizing deduplication storage at unreliable edge, IEEE Trans. Parallel Distrib. Syst., vol. 34, no. 10, pp. 2811–2826, 2023.

[12]

L. Luo, D. Guo, R. T. B. Ma, O. Rottenstreich, and X. Luo, Optimizing bloom filter: Challenges, solutions, and comparisons, IEEE Commun. Surv. Tut., vol. 21, no. 2, pp. 1912–1949, 2019.

[13]
S. Z. Kiss, É. Hosszu, J. Tapolcai, L. Rónyai, and O. Rottenstreich, Bloom filter with a false positive free zone, in Proc. IEEE INFOCOM 2018 - IEEE Conf. Computer Communications, Honolulu, HI, USA, 2018, pp. 1412–1420.
[14]
S. Xiong, Y. Yao, Q. Cao, and T. He, kBF: A bloom filter for key-value storage with an application on approximate state machines, in Proc. IEEE INFOCOM 2014 - IEEE Conf. Computer Communications, Toronto, Canada, 2014, pp. 1150–1158.
[15]

A. Bhattacharya, C. Gudesa, A. Bagchi, and S. Bedathur, New wine in an old bottle: Data-aware hash functions for bloom filters, Proc. VLDB Endow., vol. 15, no. 9, pp. 1924–1936, 2022.

[16]
F. Deng and D. Rafiei, Approximately detecting duplicates for streaming data using stable bloom filters, in Proc. 2006 ACM SIGMOD Int. Conf. Management of Data, Chicago, IL, USA, 2006, pp. 25–36.
[17]
D. Guo, J. Wu, H. Chen, and X. Luo, Theory and network applications of dynamic bloom filters, in Proc. 25 th IEEE Int. Conf. Computer Communications, Barcelona, Spain, 2006, pp. 1–12.
[18]

S. Lee, H. Byun, and H. Lim, Set reconciliation using ternary and invertible bloom filters, IEEE Trans. Knowl. Data Eng., vol. 35, no. 11, pp. 11885–11898, 2023.

[19]
M. Li, R. Xie, D. Chen, H. Dai, R. Gu, H. Huang, W. Dou, and G. Chen, A Pareto optimal Bloom filter family with hash adaptivity, VLDB J., vol. 32, no. 3, pp. 525–548, 2023.
[20]
J. Wei, H. Jiang, K. Zhou, and D. Feng, MAD2: A scalable high-throughput exact deduplication approach for network backup services, in Proc. 2010 IEEE 26 th Symp. Mass Storage Systems and Technologies (MSST ), Incline Village, NV, USA, 2010, pp. 1–14.
[21]

P. S. Almeida, C. Baquero, N. Preguiça, and D. Hutchison, Scalable bloom filters, Inf. Process. Lett., vol. 101, no. 6, pp. 255–261, 2007.

[22]
K. Xie, Y. Min, D. Zhang, J. Wen, and G. Xie, A scalable bloom filter for membership queries, in Proc. IEEE Global Telecommunications Conf., Washington, DC, USA, 2007, pp. 543–547.
[23]

M. Mitzenmacher, Compressed Bloom filters, IEEE/ACM Trans. Netw., vol. 10, no. 5, pp. 604–612, 2002.

[24]

A. Kumar, J. Xu, and J. Wang, Space-code bloom filter for efficient per-flow traffic measurement, IEEE J. Select. Areas Commun., vol. 24, no. 12, pp. 2327–2339, 2006.

[25]
S. Cohen and Y. Matias, Spectral bloom filters, in Proc. 2003 ACM SIGMOD Int. Conf. Management of Data, San Diego, CA, USA, 2003, pp. 241–252.
[26]
S. Dutta, S. Bhattacherjee, and A. Narang, Towards “intelligent compression” in streams: A biased reservoir sampling based bloom filter approach, in Proc. 15 th Int. Conf. Extending Database Technology, Berlin, Germany, 2012, pp. 228–238.
[27]

S. Dutta, A. Narang, and S. K. Bera, Streaming quotient filter: A near optimal approximate duplicate detection approach for data streams, Proc. VLDB Endow., vol. 6, no. 8, pp. 589–600, 2013.

[28]

M. Yoon, Aging bloom filter with two active buffers for dynamic sets, IEEE Trans. Knowl. Data Eng., vol. 22, no. 1, pp. 134–138, 2010.

[29]
E. Assaf, R. B. Basat, G. Einziger, and R. Friedman, Pay for a sliding bloom filter and get counting, distinct elements, and entropy for free, in Proc. IEEE INFOCOM 2018- IEEE Conf. Computer Communications, Honolulu, HI, USA, 2018, pp. 2204–2212.
[30]
M. K. Yoon, J. Son, and S. H. Shin, Bloom tree: A search tree based on bloom filters for multiple-set membership testing, in Proc. IEEE INFOCOM 2014 - IEEE Conf. Computer Communications, Toronto, ON, Canada, 2014, pp. 1429–1437.
[31]

Y. Wu, J. He, S. Yan, J. Wu, T. Yang, O. Ruas, G. Zhang, and B. Cui, Elastic bloom filter: Deletable and expandable filter using elastic fingerprints, IEEE Trans. Comput., vol. 71, no. 4, pp. 984–991, 2022.

[32]

N. Dayan, I. Bercea, P. Reviriego, and R. Pagh, InfiniFilter: Expanding filters to infinity and beyond, Proc. ACM Manag. Data, vol. 1, no. 2, pp. 140, 2023.

[33]

J. L. Carter and M. N. Wegman, Universal classes of hash functions, J. Comput. Syst. Sci., vol. 18, no. 2, pp. 143–154, 1979.

[34]

M. V. Ramakrishna, E. Fu, and E. Bahcekapili, Efficient hardware hashing functions for high performance computers, IEEE Trans. Comput., vol. 46, no. 12, pp. 1378–1381, 1997.

[35]
M. V. Ramakrishna and G. A. Portice, Perfect hashing functions for hardware applications, in Proc. 7 th Int. Conf. Data Engineering, Kobe, Japan, 1991, pp. 464–470.
[36]
MAWI working group traffic archive, http://mawi.nezu.wide.ad.jp/mawi/, 2024.
[37]
CAIDA, http://www.caida.org/data/, 2024.
[38]
UMass, https://traces.cs.umass.edu/docs/traces/network/,2024.
Tsinghua Science and Technology
Pages 1846-1864
Cite this article:
Wen J, Pei S, Yan C, et al. Extensible Bloom Filters: Adaptive Strategies for Scalability and Efficiency in Network and Distributed Systems to Handle Increased Data. Tsinghua Science and Technology, 2025, 30(4): 1846-1864. https://doi.org/10.26599/TST.2024.9010160

57

Views

2

Downloads

0

Crossref

0

Web of Science

0

Scopus

0

CSCD

Altmetrics

Received: 15 March 2024
Revised: 23 July 2024
Accepted: 29 August 2024
Published: 11 September 2024
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return