News Topic Detection Based on Capsule Semantic Graph

Shuang Yang; Yan Tang

doi:10.26599/BDMA.2021.9020023

Open Access

College of Computer and Information Science, Southwest University, Chongqing 400000, China

Abstract

Most news topic detection methods are word-based; they tend to ignore the relationships among words and suffer from semantic sparsity, which results in low topic detection accuracy. In addition, the current mainstream probabilistic methods and graph analysis methods for topic detection have high time complexity. To address these problems, we present a news topic detection model based on a capsule semantic graph (CSG). Keywords that co-occur in the same text are modeled as a keyword graph, which is divided into multiple subgraphs through community detection. Each subgraph contains a group of closely related keywords and is used as a vertex of the CSG. The semantic relationship among the vertices is obtained by calculating the similarity of the average word vector of each vertex. At the same time, the news texts are clustered using an incremental clustering method, in which each text is represented through the CSG and the similarity among texts is calculated by a graph kernel. The relationship between vertices and edges is also considered when calculating the similarity. Experimental results on three standard datasets show that CSG obtains higher precision, recall, and F1 values than several recent methods. Experimental results on large-scale news datasets show that the time complexity of CSG is lower than that of probabilistic methods and other graph analysis methods.
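The pipeline described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several parts are simplifying assumptions made here for illustration: connected components stand in for the community detection step, a cosine similarity over per-capsule keyword counts stands in for the graph kernel and the word-vector similarity, and the function names, the toy documents, and the `threshold` value are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def build_keyword_graph(docs, min_cooccur=1):
    """Link two keywords when they co-occur in at least `min_cooccur` texts."""
    weights = defaultdict(int)
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), weight in weights.items():
        if weight >= min_cooccur:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_capsules(graph):
    """ASSUMPTION: connected components replace real community detection."""
    seen, capsules = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        capsules.append(component)
    return capsules


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def incremental_cluster(vectors, threshold):
    """Single pass: join the most similar cluster, or open a new one."""
    centroids, members = [], []
    for idx, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            centroids.append(list(vec))
            members.append([idx])
        else:
            members[best].append(idx)
            n = len(members[best])
            # Running mean keeps the centroid up to date without a second pass.
            centroids[best] = [(c * (n - 1) + x) / n
                               for c, x in zip(centroids[best], vec)]
    return members


# Toy input: each text is already reduced to its keywords (hypothetical data).
docs = [
    ["earthquake", "rescue", "magnitude"],
    ["earthquake", "magnitude", "aftershock"],
    ["election", "vote", "candidate"],
    ["vote", "candidate", "poll"],
]
capsules = find_capsules(build_keyword_graph(docs))
# ASSUMPTION: a text is represented by how many of its keywords fall in each capsule.
doc_vectors = [[len(set(d) & capsule) for capsule in capsules] for d in docs]
clusters = incremental_cluster(doc_vectors, threshold=0.5)
print(capsules)
print(clusters)
```

Because each text is compared only against the current cluster centroids, the clustering step makes a single pass over the texts, which is the general property that makes incremental clustering attractive for large news collections.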

Keywords

news topic detection; capsule semantic graph; graph kernel

Big Data Mining and Analytics

Volume 5, Issue 2, June 2022

Pages 98-109

DOI: 10.26599/BDMA.2021.9020023

Cite this article:

Yang S, Tang Y. News Topic Detection Based on Capsule Semantic Graph. Big Data Mining and Analytics, 2022, 5(2): 98-109. https://doi.org/10.26599/BDMA.2021.9020023

Received: 05 November 2021

Accepted: 18 November 2021

Published: 25 January 2022

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).