Volume 3, Issue 2

Comparative Study of Statistical Features to Detect the Target Event During Disaster

Madichetty Sreenivasulu, M. Sridevi
Department of CSE, National Institute of Technology, Tiruchirappalli, Tamilnadu 620015, India.

Abstract

Microblogs such as Facebook and Twitter have attracted much attention from users and organizations. Twitter is now especially popular because of its real-time nature. People often interact with real-time events such as earthquakes and floods through Twitter. During a disaster, the number of posts or tweets on Twitter increases drastically, and detecting a target event at that time is a challenging task. In this paper, a framework is proposed for observing tweets and detecting the target event. To detect the target event, a classifier is devised based on different combinations of statistical features: the position of the keyword in a tweet, the length of the tweet, the frequency of hashtags, and the frequency of user mentions and URLs. The results show that the combination of hashtag frequency and keyword position provides better classification results than the other feature combinations. Hence, using these two features, namely the frequency of hashtags and the position of the earthquake keyword, reduces the event detection time. These two features also help in detecting sub-events, which are used for filtering the tweets related to the disaster. Additionally, different classifiers, namely Artificial Neural Networks (ANN), decision tree, and K-Nearest Neighbor (KNN), are compared using these two features. However, a Support Vector Machine (SVM) with a linear kernel, using the combination of the position of the earthquake keyword and the frequency of hashtags, outperforms state-of-the-art methods. Therefore, the SVM (linear kernel) with the proposed features is applied for detecting the earthquake during a disaster. The proposed algorithm is tested on the 2015 Nepal earthquake and landslide datasets.
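As an illustration of the statistical features named in the abstract, the following minimal Python sketch computes them for a single tweet. The function name `extract_features`, the whitespace tokenization, the 1-based keyword position (0 when the keyword is absent), and the regular expressions are all assumptions made for this example; the abstract does not specify how the authors define or compute each feature.

```python
import re


def extract_features(tweet: str, keyword: str = "earthquake") -> dict:
    """Compute five simple statistical features for one tweet.

    All definitions here are illustrative assumptions, not the authors' own.
    """
    tokens = tweet.split()
    # Normalize tokens so that "#Earthquake," still matches the keyword.
    normalized = [t.lower().strip("#.,!?:;\"'") for t in tokens]
    # Assumed definition: 1-based index of the first matching token, 0 if absent.
    position = next(
        (i + 1 for i, t in enumerate(normalized) if t == keyword), 0
    )
    return {
        "keyword_position": position,
        "tweet_length": len(tokens),  # assumed: length in whitespace tokens
        "hashtag_count": len(re.findall(r"#\w+", tweet)),
        "mention_count": len(re.findall(r"@\w+", tweet)),
        "url_count": len(re.findall(r"https?://\S+", tweet)),
    }


# Hypothetical example tweet, not taken from the Nepal datasets.
features = extract_features(
    "Strong #earthquake felt in Kathmandu @user1 http://example.com #Nepal"
)
no_keyword = extract_features("Heavy rain in the valley today")
```

Under these assumptions, the two-feature vector `[hashtag_count, keyword_position]` could then be passed to any linear-kernel SVM implementation, for example scikit-learn's `SVC(kernel="linear")`. That choice of library is also an assumption of this sketch.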

Keywords:

disaster, Twitter, Support Vector Machine (SVM), statistical features
Received: 05 November 2019 Accepted: 21 November 2019 Published: 27 February 2020 Issue date: June 2020

Copyright

© The author(s) 2020

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Reprints and permission requests may be sought directly from the editorial office.
