Trajectory Big Data Processing Based on Frequent Activity

Amina Belhassena; Hongzhi Wang

doi:10.26599/TST.2018.9010087

Tsinghua Science and Technology 2019, 24(3): 317-332 https://doi.org/10.26599/TST.2018.9010087

Open Access | Issue | Published: 24 January 2019

Trajectory Big Data Processing Based on Frequent Activity

Show Author's Information Hide Author's Information Amina Belhassena(

), Hongzhi Wang

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.

Keywords:

distributed R-tree, trajectory, frequent activity, query

Cite this article:

Belhassena A, Wang H. Trajectory Big Data Processing Based on Frequent Activity. Tsinghua Science and Technology, 2019, 24(3): 317-332. https://doi.org/10.26599/TST.2018.9010087

Download citation

EndNote(RIS)

BibTeX

470

Views

Downloads

Citations

Crossref

N/A

WoS

Scopus

CSCD

Abstract Full text About this article

Abstract

With the rapid development and wide use of Global Positioning System in technology tools, such as smart phones and touch pads, many people share their personal experience through their trajectories while visiting places of interest. Therefore, trajectory query processing has emerged in recent years to help users find their best trajectories. However, with the huge amount of trajectory points and text descriptions, such as the activities practiced by users at these points, organizing these data in the index becomes tedious. Therefore, the parallel method becomes indispensable. In this paper, we have investigated the problem of distributed trajectory query processing based on the distance and frequent activities. The query is specified by start and final points in the trajectory, the distance threshold, and a set of frequent activities involved in the point of interest of the trajectory. As a result, the query returns the shortest trajectory including the most frequent activities with high support and high confidence. To simplify the query processing, we have implemented the Distributed Mining Trajectory R-Tree index (DMTR-Tree). For this method, we initially managed the large trajectory dataset in distributed R-Tree indexes. Then, for each index, we applied the frequent itemset Apriori algorithm for each point to select the frequent activity set. For the faster computation of the above algorithms, we utilized the cluster computing framework of Apache Spark with MapReduce as the programing model. The experimental results show that the DMTR-Tree index and the query-processing algorithm are efficient and can achieve the scalability.

Full text

Abstract

Full text

Outline

About this article

Trajectory Big Data Processing Based on Frequent Activity

Show Author's information Hide Author's Information Amina Belhassena(

), Hongzhi Wang

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China.

Abstract

Keywords: distributed R-tree, trajectory, frequent activity, query

References(37)

[1]

Aung H. H., Guo L., and Tan K. L., Mining sub-trajectory cliques to find frequent routes, in Proc. 13th Int. Symp. Advances in Spatial and Temporal Databases, Munich, Germany, 2013, pp. 92-109.

DOI

[2]

Zheng K., Shang S., Yuan N. J., and Yang Y., Towards efficient search for activity trajectories, in Proc. 29th Int. Conf. Data Engineering, Brisbane, Australia, 2013, pp. 230-241.

DOI

[3]

Zhang C., Han J. W., Shou L. D., Lu J. J., and La Porta T., Splitter: Mining fine-grained sequential patterns in semantic trajectories, Proc. VLDB Endow., vol. 7, no. 9, pp. 769-780, 2014.

DOI Google Scholar

[4]

Chen W., Zhao L., Xu J. J., Liu G. F., Zheng K., and Zhou X. F., Trip oriented search on activity trajectory, J. Comput. Sci. Technol., vol. 30, no. 4, pp. 745-761, 2015.

DOI Google Scholar

[5]

Agrawal R. and Srikant R., Fast algorithms for mining association rules in large databases, in Proc. 20th Int. Conf. Very Large Data Bases, Santiago de Chile, Chile, 1994, pp. 487-499.

[6]

Bayardo R. J. Jr., Efficiently mining long patterns from databases, ACM SIGMOD Rec., vol. 27, no. 2, pp. 85-93, 1998.

DOI Google Scholar

[7]

Zaki M. J., Parthasarathy S., Ogihara M., and Li W., Parallel algorithms for discovery of association rules, Data Min. Knowl. Disc., vol. 1, no. 4, pp. 343-373, 1997.

DOI Google Scholar

[8]

Qiu H. J., Gu R., Yuan C. F., and Huang Y. H., YAFIM: A parallel frequent itemset mining algorithm with spark, in Proc. 28th Int. Parallel & Distributed Processing Symp. Workshops, Phoenix, AZ, USA, 2014.

DOI

[9]

Guttman A., R-trees: A dynamic index structure for spatial searching, in Proc. 1984 ACM SIGMOD Int. Conf. Management of Data, Boston, MA, USA, 1984, pp. 47-57.

DOI

[10]

Dean J. and Ghemawat S., MapReduce: Simplified data processing on large clusters, in Proc. 6th Conf. Symp. Operating Systems Design & Implementation, San Francisco, CA, USA, 2004.

[11]

Eldawy A., Alarabi L., and Mokbel M. F., Spatial partitioning techniques in Spatial Hadoop, in Proc. 41st Int. Conf. Very Large Data Bases, Kohala Coast, HI, USA, 2015, pp. 1602-1605.

DOI

[12]

Yang H. C., Dasdan A., Hsiao R. L., and Parker D. S., Map-reduce-merge: Simplified relational data processing on large clusters, in Proc. ACM SIGMOD Int. Conf. Management of Data, Beijing, China, 2007, pp. 1029-1040.

DOI

[13]

Wang H. Z. and Belhassena A., Parallel trajectory search based on distributed index, Inf. Sci., vols. 388&389, pp. 62-83, 2017.

DOI Google Scholar

[14]

Agrawal R. and Srikant R., Mining sequential patterns, in Proc. 11th Int. Conf. Data Engineering, Taipei, China, 1995.

[15]

Morzy M., Prediction of moving object location based on frequent trajectories, in Proc. 21st Int. Conf. Computer and Information Sciences, Istanbul, Turkey, 2006, pp. 583-592.

DOI

[16]

Morzy M., Mining frequent trajectories of moving objects for location prediction, in Proc. 5th Int. Conf. Machine Learning and Data Mining in Pattern Recognition, Leipzig, Germany, 2007, pp. 18-20.

[17]

Masciari E., Shi G., and Zaniolo C., Sequential pattern mining from trajectory data, in Proc. 17th Int. Database Engineering Applications Symp., Barcelona, Spain, 2013, pp. 162-167.

DOI

[18]

Monreale A., Pinelli F., Trasarti R., and Giannotti F., WhereNext: A location predictor on trajectory pattern mining, in Proc. 15th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Paris, France, 2009.

DOI

[19]

Li N., Zeng L., He Q., and Shi Z. Z., Parallel implementation of apriori algorithm based on MapReduce, in Proc. 13th ACIS Int. Conf. Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, Kyoto, Japan, 2012, pp. 236-241.

DOI

[20]

Lin M. Y., Lee P. Y., and Hsueh S. C., Apriori-based frequent itemset mining algorithms on MapReduce, in Proc. 16th Int. Conf. Ubiquitous Information Management and Communication, Kuala Lumpur, Malaysia, 2012.

DOI

[21]

Guo J. and Ren Y. G., Research on improved a priori algorithm based on coding and MapReduce, in Proc. 10th Web Information System and Application Conf., Yangzhou, China, 2013, pp. 294-299.

DOI

[22]

Zaki M. J., Parthasarathy S., Ogihara M., and Li W., New algorithms for fast discovery of association rules, in Proc. 3rd Int. Conf. Knowledge Discovery and Data Mining, Newport Beach, CA, USA, 1997.

DOI

[23]

Li N., Zang L., He Q., and Shi Z. Z., Parallel implementation of apriori algorithm based on MapReduce, Int. J. Netw. Distrib. Comput., vol. 1, no. 2, pp. 89-96, 2013.

DOI Google Scholar

[24]

Tong W., Rudin C., Wagner D., and Sevieri R., Learning to detect patterns of crime, in Proc. European Conf. Machine Learning and Knowledge Discovery in Databases, Prague, Czech Republic, 2013, pp. 515-530.

DOI

[25]

Zaharia M., Chowdhury M., Franklin M. J., Shenker S., and Stoica I., Spark: Cluster computing with working sets, in Proc. 2nd USENIX Conf. Hot Topics in Cloud Computing, Boston, MA, USA, 2010.

[26]

Rathee S., Kaul M., and Kashyap A., R-Apriori: An efficient apriori based algorithm on spark, in Proc. 8th Workshop on Ph.D. Workshop in Information and Knowledge Management, Melbourne, Australia, 2015, pp. 27-34.

DOI

[27]

Beckmann N., Kriegel H. P., Schneider R., and Seeger B., The R∗-tree: An efficient and robust access method for points and rectangles, in Proc. 1990 ACM SIGMOD Int. Conf. Management of Data, Atlantic City, NJ, USA, 1990, pp. 322-331.

DOI

[28]

Hariharan R., Hore B., Li C., and Mehrotra S., Processing spatial-keyword (SK) queries in geographic information retrieval (GIR) systems, in Proc. 19th Int. Conf. Scientific and Statistical Database Management, Banff, Canada, 2007.

DOI

[29]

Zhang D. X., Chee Y. M., Mondal A., Tung A. K. H., and Kitsuregawa M., Keyword search in spatial databases: Towards searching by document, in Proc. 25th Int. Conf. Data Engineering, Shanghai, China, 2009, pp. 688-699.

DOI

[30]

Chen W., Zhao L., Xu J. J., Zheng K., and Zhou X. F., Ranking based activity trajectory search, in Proc. 15th Int. Conf. Web Information Systems Engineering, Thessaloniki, Greece, 2014, pp. 170-185.

DOI

[31]

Du Mouza C., Litwin W., and Rigaux P., SD-Rtree: A scalable distributed R-tree, in Proc. 23rd Int. Conf. Data Engineering, Istanbul, Turkey, 2007, pp. 296-305.

DOI

[32]

Eldawy A. and Mokbel M. F., A demonstration of Spatial Hadoop: An efficient MapReduce framework for spatial data, Proc. VLDB Endow, vol. 6, no. 12, pp. 1230-1233, 2013.

DOI Google Scholar

[33]

Wang L., Chen B., and Liu Y. H., Distributed storage and index of vector spatial data based on HBase, in Proc. 21st Int. Conf. Geoinformatics, Kaifeng, China, 2013, pp. 1-5.

DOI

[34]

Yu J., Wu J. X., and Sarwat M., GeoSpark: A cluster computing framework for processing large-scale spatial data, in Proc. 23rd SIGSPATIAL Int. Conf. Advances in Geographic Information Systems, Seattle, WA, USA, 2015.

DOI

[35]

Lee J. G., Han J. W., and Whang K. Y., Trajectory clustering: A partition-and-group framework, in Proc. 2007 ACM SIGMOD Int. Conf. Management of Data, Beijing, China, 2007, pp. 593-604.

DOI

[36]

Han J. W., Pei J., and Yin Y. W., Mining frequent patterns without candidate generation, in Proc. 2000 ACM SIGMOD Int. Conf. Management of Data, Dallas, TX, USA, 2000, pp. 1-12.

DOI

[37]

Ester M., Kriegel H. P., Sander J., and Xu X. W., A density-based algorithm for discovering clusters in large spatial databases with noise, in Proc. 2nd Int. Conf. Knowledge Discovery and Data Mining, Portland, OR, USA, 1996, pp. 226-231.

About this article

Publication history

Acknowledgements

Rights and permissions

Publication history

Received: 22 November 2017

Accepted: 22 February 2018

Published: 24 January 2019

Issue date: June 2019

Copyright

Acknowledgements

This paper was partially supported by the National Natural Science Foundation of China (Nos. U1509216 and 61472099), the National Sci-Tech Support Plan (No. 2015BAH10F01), the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Provience (No. LC2016026), and MOECMicrosoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology.