Volume 1, Issue 1




Event Detection and Identification of Influential Spreaders in Social Media Data Streams

Leilei Shi, Yan Wu, Lu Liu, Xiang Sun, Liang Jiang
School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang 212013, China.
Department of Computing and Mathematics, University of Derby, UK.

Abstract

Microblogging, a popular social media service platform, has become a new information channel through which users receive and exchange the most up-to-date information on current events. Consequently, it is a crucial platform for detecting newly emerging events and for identifying influential spreaders who have the potential to actively disseminate knowledge about events through microblogs. However, traditional event detection models require human intervention to set the number of topics to be explored, which significantly reduces the efficiency and accuracy of event detection. In addition, most existing methods focus only on event detection and are unable to identify either influential spreaders or key event-related posts, which makes it challenging to track momentous events in a timely manner. To address these problems, we propose a Hypertext-Induced Topic Search (HITS) based Topic-Decision method (TD-HITS) and a Latent Dirichlet Allocation (LDA) based Three-Step model (TS-LDA). TD-HITS automatically detects the number of topics and identifies the associated key posts among a large number of posts. TS-LDA identifies influential spreaders of hot event topics based on both post and user information. Experimental results on a Twitter dataset demonstrate the effectiveness of the proposed methods for both detecting events and identifying influential spreaders.
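
For readers unfamiliar with HITS, the following is a minimal sketch of the standard hub/authority power iteration on which TD-HITS builds. It is not the authors' TD-HITS method: the tiny user-to-post graph, the node names (`u1`, `p1`, and so on), and the function name `hits` are hypothetical and chosen only for illustration.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Plain HITS power iteration over directed (source, target) edges.

    Returns two dicts: hub scores and authority scores, each L2-normalized.
    """
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}

    for _ in range(iterations):
        # Authority update: sum the hub scores of all nodes linking in.
        auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]

        # Hub update: sum the authority scores of all nodes linked to.
        hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]

        # Normalize both vectors so the scores stay bounded.
        auth_norm = sqrt(sum(v * v for v in auth.values())) or 1.0
        hub_norm = sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {node: v / auth_norm for node, v in auth.items()}
        hub = {node: v / hub_norm for node, v in hub.items()}

    return hub, auth


# Hypothetical toy graph: an edge (user, post) means the user shared the post.
edges = [("u1", "p1"), ("u2", "p1"), ("u3", "p1"), ("u3", "p2")]
hub_scores, auth_scores = hits(edges)

top_post = max(auth_scores, key=auth_scores.get)  # post with the highest authority
top_user = max(hub_scores, key=hub_scores.get)    # user with the highest hub score
print(top_post, top_user)
```

In this toy graph, the post shared by the most users receives the highest authority score, and the user who shares the most (and the most authoritative) posts receives the highest hub score. How TD-HITS uses such scores to decide the number of topics and select key posts is described in the full text, not here.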

Keywords:

event detection, microblogging, Hypertext-Induced Topic Search (HITS), Latent Dirichlet Allocation (LDA), identification of influential spreaders
Received: 09 September 2017 Accepted: 29 November 2017 Published: 25 January 2018 Issue date: March 2018

Copyright

© The author(s) 2018

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61502209 and 61502207), the Natural Science Foundation of Jiangsu Province of China (No. BK20130528), and the Visiting Research Fellow Program of Tongji University (No. 8105142504).

Rights and permissions

Reprints and permission requests may be sought directly from the editorial office.
