Journal Home > Volume 27 , Issue 2

Spatial databases store objects with their locations and certain types of attached items. A variety of modern applications have been developed by leveraging the utilization of locations and items in spatial objects, such as searching points of interest, hot topics, or users’ attitude in specified spatial regions. In many scenarios, the high and low-frequency items in a spatial region are worth noticing, considering they represent the majority’s interest or eccentric users’ opinion. However, existing works have yet to identify such items in an interactive manner, despite the significance of the endeavor in decision-making systems. This study recognizes a novel type of analytical query, called top/bottom- k fraction query, to discover such items in spatial databases. To achieve fast query response, we propose a multilayered data summary that is spread out across the main memory and external memory. A memory-based estimation method for top/bottom- k fraction queries is proposed. To maximize the use of the main memory space, we design a data summary tuning method to dynamically allocate memory space among different spatial partitions. The proposed approach is evaluated with real-life datasets and synthetic datasets in terms of estimation accuracy. Evaluation results demonstrate the effectiveness of the proposed data summary and corresponding estimation and tuning algorithms.


menu
Abstract
Full text
Outline
About this article

Efficient Top/Bottom-k Fraction Estimation in Spatial Databases Using Bounded Main Memory

Show Author's information Jinbao WangZhuojun DuanXixian Han( )Donghua Yang( )
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Computer Science Department, James Madison University, Harrisonburg, VA 22807, USA

Abstract

Spatial databases store objects with their locations and certain types of attached items. A variety of modern applications have been developed by leveraging the utilization of locations and items in spatial objects, such as searching points of interest, hot topics, or users’ attitude in specified spatial regions. In many scenarios, the high and low-frequency items in a spatial region are worth noticing, considering they represent the majority’s interest or eccentric users’ opinion. However, existing works have yet to identify such items in an interactive manner, despite the significance of the endeavor in decision-making systems. This study recognizes a novel type of analytical query, called top/bottom- k fraction query, to discover such items in spatial databases. To achieve fast query response, we propose a multilayered data summary that is spread out across the main memory and external memory. A memory-based estimation method for top/bottom- k fraction queries is proposed. To maximize the use of the main memory space, we design a data summary tuning method to dynamically allocate memory space among different spatial partitions. The proposed approach is evaluated with real-life datasets and synthetic datasets in terms of estimation accuracy. Evaluation results demonstrate the effectiveness of the proposed data summary and corresponding estimation and tuning algorithms.

Keywords: exploratory analytic, top-k items, bottom-k items, spatial database

References(37)

[1]
Y. Y. Chen, T. Suel, and A. Markowetz, Efficient query processing in geographic web search engines, in Proc. ACM SIGMOD Int. Conf. Management of Data, Chicago, IL, USA, 2006, pp. 277-288.
DOI
[2]
Z. P. Cai and Z. B. He, Trading private range counting over big IoT data, in Proc. of 2019 IEEE 39th Int. Conf. Distributed Computing Systems (ICDCS), Dallas, TX, USA, 2019, pp. 144-153.
DOI
[3]
Z. P. Cai and Q. Chen, Latency-and-coverage aware data aggregation scheduling for multihop battery-free wireless networks, IEEE Transactions on Wireless Communications, vol. 20, no. 3, pp. 1770-1784, 2021.
[4]
P. Ahmed, M. Hasan, A. Kashyap, V. Hristidis, and V. J. Tsotras, Efficient computation of top-k frequent terms over spatio-temporal ranges, in Proc. 2017 ACM Int. Conf. Management of Data, Chicago, IL, USA, 2017, pp. 1227-1241.
DOI
[5]
N. Beckmann, H. P. Kriegel, R. Schneider, and B. Seeger, The R*-tree: An efficient and robust access method for points and rectangles, in Proc. 1990 ACM SIGMOD Int. Conf. Management of Data, New York, NY, USA, 1990, pp. 322-331.
DOI
[6]
Z. P. Cai, Z. B. He, X. Guan, and Y. S. Li, Collective data-sanitization for preventing sensitive information inference attacks in social networks, IEEE Transactions on Dependable and Secure Computing, vol. 15, no. 4, pp. 577-590, 2018.
[7]
X. Zheng, Z. P. Cai, J. G. Yu, C. K. Wang, and Y. S. Li, Follow but no track: Privacy preserved profile publishing in cyber-physical social systems, IEEE Internet of Things Journal, vol. 4, no. 6, pp. 1868-1878, 2017.
[8]
S. Agarwal, B. Mozafari, A. Panda, H. Milner, S. Madden, and I. Stoica, BlinkDB: Queries with bounded errors and bounded response times on very large data, in Proc. 8th ACM European Conf. Computer Systems, Prague, Czech Republic, 2013, pp. 29-42.
DOI
[9]
R. A. Finkel and J. L. Bentley, Quad trees a data structure for retrieval on composite keys, Acta Informatica, vol. 4, no. 1, pp. 1-9, 1974.
[10]
Open street map, http://www.openstreetmap.org/, 2004.
[11]
S. Farazi and D. Rafiei, Top-K frequent term queries on streaming data, in Proc. of 2019 IEEE 35th Int. Conf. Data Engineering (ICDE), Macao, China, 2019, pp. 1582-1585.
DOI
[12]
D. X. Zhang, B. C. Ooi, and A. K. H. Tung, Locating mapped resources in Web 2.0, in Proc. of 2010 IEEE 26th Int. Conf. Data Engineering (ICDE 2010), Long Beach, CA, USA, 2010, pp. 521-532.
DOI
[13]
Y. H. Zhou, X. Xie, C. Wang, Y. C. Gong, and W. Y. Ma, Hybrid index structures for location-based web search, in Proc. 14th ACM Int. Conf. Information and Knowledge Management, Bremen, Germany, 2005, pp. 155-162.
DOI
[14]
J. Wang, D. Yang, Y. Wei, H. Gao, J. Li, and Y. Yuan, Answering spatial approximate keyword queries in disks, in Proc. of APWeb, Guangzhou, China, 2015, pp. 424-436
DOI
[15]
A. Guttman, R-trees: A dynamic index structure for spatial searching, ACM SIGMOD Record, vol. 14, no. 2, pp. 47-57, 1984.
[16]
J. M. Hellerstein, P. J. Haas, and H. J. Wang, Online aggregation, ACM SIGMOD Record, vol. 26, no. 2, pp. 171-182, 1997.
[17]
Y. Park, B. Mozafari, J. Sorenson, and J. H. Wang, VerdictDB: Universalizing approximate query processing, in Proc. 2018 Int. Conf. Management of Data, Houston, TX, USA, 2018, pp. 1461-1476.
DOI
[18]
D. Ting, Approximate distinct counts for billions of datasets, in Proc. 2019 Int. Conf. Management of Data, Amsterdam, the Netherlands, 2019, pp. 69-86.
DOI
[19]
Y. Chronis, T. Do, G. Graefe, and K. Peters, External merge sort for Top-K queries: Eager input filtering guided by histograms, in Proc. 2020 ACM SIGMOD Int. Conf. Management of Data, Portland, OR, USA, 2020, pp. 2423-2437.
DOI
[20]
S. Y. Cheng, Z. P. Cai, and J. Z. Li, Curve query processing in wireless sensor networks, IEEE Transactions on Vehicular Technology, vol. 64, no. 11, pp. 5198-5209, 2015.
[21]
F. F. Li, B. Yao, M. W. Tang, and M. Hadjieleftheriou, Spatial approximate string search, IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 6, pp. 1394-1409, 2013.
[22]
J. B. Wang, Z. P. Cai, and J. G. Yu, Achieving personalized k-anonymity-based content privacy for autonomous vehicles in CPS, IEEE Transactions on Industrial Informatics, vol. 16, no. 6, pp. 4242-4251, 2020.
[23]
Z. P. Cai and X. Zheng, A private and efficient mechanism for data uploading in smart cyber-physical systems, IEEE Transactions on Network Science and Engineering, vol. 7, no. 2, pp. 766-775, 2020.
[24]
J. L. Bentley, Multidimensional binary search trees used for associative searching, Commun. ACM, vol. 18, no. 9, pp. 509-517, 1975.
[25]
S. Alsubaiee, A. Behm, and C. Li, Supporting location-based approximate-keyword queries, in Proc. 18th SIGSPATIAL Int. Conf. Advances in Geographic Information Systems, San Jose, CA, USA, 2010. pp. 61-70.
DOI
[26]
Z. B. He, Z. P. Cai, S. Y. Cheng, and X. M. Wang, Approximate aggregation for tracking quantiles and range countings in wireless sensor networks, Theoretical Computer Science, vol. 607, pp. 381-390, 2015.
[27]
J. Li, A. M. V. V. Sai, X. Z. Cheng, W. Cheng, Z. Tian, and Y. S. Li, Sampling-based approximate skyline query in sensor equipped IoT networks, Tsinghua Science and Technology, vol. 26, no. 2, pp. 219-229, 2021.
[28]
J. Chen, X. C. Wang, and C. K. Wang, Understanding item consumption orders for right-order next-item recommendation, Knowledge and Information Systems, vol. 57, no. 1, pp. 55-78, 2018.
[29]
J. Chen, C. K. Wang, J. M. Wang, and P. S. Yu, Recommendation for repeat consumption from user implicit feedback, IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 11, pp. 3083-3097, 2016.
[30]
X. Zheng and Z. P. Cai, Privacy-preserved data sharing towards multiple parties in industrial IoTs, IEEE Journal on Selected Areas in Communications, vol. 38, no. 5, pp. 968-979, 2020.
[31]
Z. P. Cai and T. Shi, Distributed query processing in the edge assisted IoT data monitoring system, IEEE Internet of Things Journal, .
[32]
S. Y. Cheng, Z. P. Cai, J. Z. Li, and H. Gao, Extracting kernel dataset from big sensory data in wireless sensor networks, IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 4, pp. 813-827, 2017.
[33]
J. X. Gu, J. L. Wang, L. Zhang, Z. W. Yu, X. Z. Xin, and Y. H. Liu, Spotlight: Hot target discovery and localization with crowdsourced photos, Tsinghua Science and Technology, vol. 25, no. 1, pp. 68-80, 2020.
[34]
S. Kumar and M. Singh, Big data analytics for healthcare industry: Impact, applications, and tools, Big Data Mining and Analytics, vol. 2, no. 1, pp. 48-57, 2019.
[35]
X. Zheng, Z. P. Cai, and Y. S. Li, Data linkage in smart internet of things systems: A consideration from a privacy perspective, IEEE Communications Magazine, vol. 56, no. 9, pp. 55-61, 2018.
[36]
J. Wu and N. Wang, Approximating special social influence maximization problems, Tsinghua Science and Technology, vol. 25, no. 6, pp. 703-711, 2020.
[37]
M. S. Mahmud, J. Z. Huang, S. Salloum, T. Z. Emara, and K. Sadatdiynov, A survey of data partitioning and sampling methods to support big data analysis, Big Data Mining and Analytics, vol. 3, no. 2, pp. 85-101, 2020.
Publication history
Copyright
Acknowledgements
Rights and permissions

Publication history

Received: 25 December 2020
Revised: 21 February 2021
Accepted: 12 March 2021
Published: 29 September 2021
Issue date: April 2022

Copyright

© The author(s) 2022

Acknowledgements

This work was partly supported by the National Natural Science Foundation of China (Nos. 61602129, 61872106, and 61772157).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return