Approximate duplicate detection in data streams aims to determine whether an item is present within a small subsequence of the data stream. It is a fundamental query problem needed in several network applications, such as web crawling and Radio Frequency Identification (RFID) tag management. Most existing algorithms are not space-efficient because they overlook the distributional information of query frequency and membership likelihood. In this paper, we propose the CEll Bloom Filter (CEBF), a space-efficient data structure that adopts a block-wise updating strategy, to solve this problem. Two typical distributions are considered: (1) uniform query frequency of items, and (2) uniform membership likelihood of items. For an arbitrary sliding window size n and an arbitrary average false positive rate
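To make the problem setting concrete, the following is a minimal Python sketch of sliding-window duplicate detection using a generic block-wise scheme built from ordinary Bloom filters. It is not the CEBF algorithm described in the paper, whose cell layout and update rules are not given in this abstract. The class names `BloomFilter` and `BlockwiseWindowFilter`, the parameters `num_blocks`, `bits_per_block`, and `num_hashes`, and the choice of SHA-256 double hashing are all assumptions made for illustration.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Plain Bloom filter over string items (illustrative, not CEBF)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive num_hashes bit positions from one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class BlockwiseWindowFilter:
    """Approximate membership over roughly the last window_size items.

    The window is split into num_blocks blocks, each backed by its own
    Bloom filter. When the current block fills up, a fresh block is
    started and the oldest one is dropped in a single step.
    """

    def __init__(self, window_size, num_blocks, bits_per_block, num_hashes):
        self.block_size = max(1, window_size // num_blocks)
        self.num_blocks = num_blocks
        self.bits_per_block = bits_per_block
        self.num_hashes = num_hashes
        self.blocks = deque([self._new_block()])
        self.count_in_current = 0

    def _new_block(self):
        return BloomFilter(self.bits_per_block, self.num_hashes)

    def add(self, item):
        if self.count_in_current == self.block_size:
            self.blocks.append(self._new_block())
            self.count_in_current = 0
            # Keep num_blocks full blocks plus the partial one, so the
            # most recent window_size items always remain covered.
            if len(self.blocks) > self.num_blocks + 1:
                self.blocks.popleft()
        self.blocks[-1].add(item)
        self.count_in_current += 1

    def seen_recently(self, item):
        # May return a false positive, and may still report items that
        # left the window less than one block ago.
        return any(item in block for block in self.blocks)


window = BlockwiseWindowFilter(
    window_size=4, num_blocks=2, bits_per_block=4096, num_hashes=3
)
for url in ["a", "b", "c", "d", "e", "f", "g"]:
    window.add(url)
print(window.seen_recently("d"), window.seen_recently("a"))
```

Dropping a whole block at a time keeps eviction cheap, at the cost of a window boundary that is only accurate to within one block. Note that this sketch treats all items identically; it does not exploit the query-frequency or membership-likelihood distributions that the abstract identifies as the source of CEBF's space savings.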
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).