School of Computer Science and Telecommunication Engineering and Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Jiangsu University, Zhenjiang 212013, China
School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK
Show Author Information
Hide Author Information
Abstract
Online social networks are increasingly connecting people around the world. Influence maximization is a key area of research in online social networks, which identifies influential users during information dissemination. Most of the existing influence maximization methods only consider the transmission of a single channel, but real-world networks mostly include multiple channels of information transmission with competitive relationships. The problem of influence maximization in an environment involves selecting the seed node set for certain competitive information, so that it can avoid the influence of other information, and ultimately affect the largest set of nodes in the network. In this paper, the influence calculation of nodes is achieved according to the local community discovery algorithm, which is based on community dispersion and the characteristics of dynamic community structure. Furthermore, considering two various competitive information dissemination cases as an example, a solution is designed for self-interested information based on the assumption that the seed node set of competitive information is known, and a novel influence maximization algorithm of node avoidance based on user interest is proposed. Experiments conducted based on real-world Twitter dataset demonstrates the efficiency of our proposed algorithm in terms of accuracy and time against notable influence maximization algorithms.
No abstract is available for this article. Click the button above to view the PDF directly.
References
[1]
L. L.Shi, Y.Wu, L.Liu, X.Sun, and L.Jiang, Event detection and identification of influential spreaders in social media data streams, Big Data Mining and Analytics, vol. 1, no. 1, pp. 34-46, 2018.
L. L.Shi, L.Liu, Y.Wu, L.Jiang, M.Kazim, H.Ali, and J.Panneerselvam, Human-centric cyber social computing model for hot-event detection and propagation, IEEE Trans. Comput. Soc. Syst., vol. 6, no. 5, pp. 1042-1050, 2019.
H.Lu, S. X.Liu, H.Wei, and J. J.Tu, Multi-kernel fuzzy clustering based on auto-encoder for fMRI functional network, Expert Syst. Appl., vol. 159, p. 113513, 2020.
A.Monney, Y. Z.Zhan, Z.Jiang, and B. B.Benuwa, A multi-kernel method of measuring adaptive similarity for spectral clustering, Expert Syst. Appl., vol. 159, p. 113570, 2020.
S. N.Tang, S. Q.Yuan, and Y.Zhu, Data preprocessing techniques in convolutional neural network based on fault diagnosis towards rotating machinery, IEEE Access, vol. 8, pp. 149487-149496, 2020.
L. L.Shi, L.Liu, Y.Wu, L.Jiang, J.Panneerselvam, and R.Crole, A social sensing model for event detection and user influence discovering in social media data streams, IEEE Trans. Comput. Soc. Syst., vol. 7, no. 1, pp. 141-150, 2020.
G.Palla, I.Derényi, I.Farkas, and T.Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature, vol. 435, no. 7043, pp. 814-818, 2005.
S. Y.Shih, M.Lee, and C. C.Chen, An effective friend recommendation method using learning to rank and social influence, in Proc. PACIS 2015, Singapore, 2015, pp. 242-250.
P.Kim and S.Kim, Detecting overlapping and hierarchical communities in complex network using interaction-based edge clustering, Phys. A Stat. Mech. Appl., vol. 417, pp. 46-56, 2015.
J.Ge, L. L.Shi, Y.Wu, and J.Liu, Human-driven dynamic community influence maximization in social media data streams, IEEE Access, vol. 8, pp. 162238-162251, 2020.
P.Domingos and M.Richardson, Mining the network value of customers, in Proc. 7th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2001, pp. 57-66.
D.Kempe, J.Kleinberg, and É.Tardos, Maximizing the spread of influence through a social network, in Proc. 9th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Washington, DC, USA, 2003, pp. 137-146.
C.Wang, W.Chen, and Y. J.Wang, Scalable influence maximization for independent cascade model in large-scale social networks, Data Min. Knowl. Discov., vol. 25, no. 3, pp. 545-576, 2012.
J. T.Tian, Y. T.Wang, X. J.Feng, A new hybrid algorithm for influence maximization in social networks, (in Chinese), Chin. J. Comput., vol. 10, no. 3, pp. 1956-1965, 2011.
A.Beutel, B. A.Prakash, R.Rosenfeld, and C.Faloutsos, Interacting viruses in networks: Can both survive? in Proc. 18th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Beijing, China, 2012, pp. 426-434.
J. M.Chen, L. L.Shi, L.Liu, A. O.Ayorinde, R. B.Zhu, and J.Panneerselvam, User interest communities influence maximization in a competitive environment, in Proc. 2020 16th Int. Conf. on Mobility, Sensing and Networking, Tokyo, Japan, 2020, pp. 614-621.
S.Bharathi, D.Kempe, and M.Salek, Competitive influence maximization in social networks, in Proc. 3rd Int. Workshop on Web and Internet Economics, Berlin, Germany, 2007, pp. 306-311.
A.Bozorgi, S.Samet, J.Kwisthout, and T.Wareham, Community-based influence maximization in social networks under a competitive linear threshold model, Knowl. Based Syst., vol. 134, pp. 149-158, 2017.
Y. C.Wei and C. K.Cheng, Ratio cut partitioning for hierarchical designs, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., vol. 10, pp. 911-921, 1991.
L.Jiang, L. L.Shi, L.Liu, J. J.Yao, B.Yuan, and Y. J.Zheng, An efficient evolutionary user interest community discovery model in dynamic social networks for internet of people, IEEE Internet Things J., vol. 6, no. 6, pp. 9226-9236, 2019.
S. C.Liu, F. X.Zhu, and L.Gan, A label-propagation-probability-based algorithm for overlapping community detection, (in Chinese), Chin. J. Comput., vol. 39, no. 4, pp. 717-729, 2016.
Y.Zhu, D.Li, and Z.Zhang, Minimum cost seed set for competitive social influence, in Proc. of IEEE International Conference on Computer Communications, San Francisco, CA, USA, .
B.Ribeiro and C.Faloutsos, Modeling website popularity competition in the attention-activity marketplace, in Proc. 8th ACM Int. Conf. on Web Search and Data Mining, Shanghai, China, 2015, pp. 389-398.
A.Beutel, B. A.Prakash, R.Rosenfeld, and C.Faloutsos, Interacting viruses in networks: Can both survive? in Proc. 18th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Beijing, China, 2012, pp. 426-434.
C. W.Tsai, Y. C.Yang, and M. C.Chiang, A genetic NewGreedy algorithm for influence maximization in social network, in Proc. 2015 IEEE Int. Conf. on Systems, Man, and Cybernetics, Hong Kong, China, 2016, pp. 2549-2554.
L. L.Shi, L.Liu, Y.Wu, L.Jiang, and J.Hardy, Event detection and user interest discovering in social media data streams, IEEE Access, vol. 5, pp. 20953-20964, 2017.
N.Trivedi and A.Singh, Efficient influence maximization in social-networks under independent cascade model, Proced. Comput. Sci., vol. 173, pp. 315-324, 2020.
Tong J, Shi L, Liu L, et al. A Novel Influence Maximization Algorithm for a Competitive Environment Based on Social Media Data Analytics. Big Data Mining and Analytics, 2022, 5(2): 130-139. https://doi.org/10.26599/BDMA.2021.9020024
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
10.26599/BDMA.2021.9020024.F001
Nodes and edges in social network.
10.26599/BDMA.2021.9020024.F002
Schematic diagram of the iterative process of the relationship evaluation algorithm.
3 Local Community Discovery Algorithm Based on Community Dispersion
As we can see from
Fig. 1
, Social networks often include a large number of low-quality and invalid data and nodes. Accurately mapping all these nodes in the network generates a lot of computation overheads and further reduces the efficiency. Therefore, we propose to use the Hyperlink-Induced Topic Search (HITS) based relationship evaluation algorithm to filter out high authorities and hubs.
(1) Data preprocessing based on HITS
The algorithm based on HITS iteratively calculates the centrality of each user and the authority of each post, to obtain high authorities and hubs.
Figure 2
shows the link relationship between users and posts and the iteration process. The process is described as follows:
10.26599/BDMA.2021.9020024.F002
Schematic diagram of the iterative process of the relationship evaluation algorithm.
(1) Initialize users’ centrality and post authority;
(2) The centrality of the users is the sum of the authority of all the linked posts, and the authority of the post is the sum of the centrality of all the linked users. Calculate users’ centrality and post authority in the follwing:
(3) The users’ centrality and post authority are normalized,
(4) Evaluate whether the users’ centrality degree and the post authority degree converge, the convergence will output the result; otherwise continue with Eqs. (
2
)-(
4
),
where and denote the thresholds after normalization of the posts’ authority and users’ hub, respectively.
Finally, the users’ centrality and post authority are sorted in a descending fashion (high to low) according to the numerical results, and the parameters and are defined to represent the critical value of screening, which is used to extract users’ interest labels, and other information are filtered out.
(2) Initial local community discovery
Community Dispersion (CD) is a key parameter. The higher the value of CD, the closer the community will be.
where represents the number of edges in the community, and represents the number of edges in the union of the community ( and the neighboring nodes set (. Our local community discovery algorithm is described in Algorithm 1.
10.26599/BDMA.2021.9020024.F003
Independent cascading model propagation process.
4.2 Probability of influence between nodes
A very important step in the process of influence maximization is to measure the influence increment of each node. Correctly measuring the influence increment of each node is crucial to achieve the maximum influence. Highly influential nodes are usually deemed important in the network. The most important statistical measures used to denote the importance of nodes are degree centrality, intermediate centrality, compact centrality, clustering centrality, etc. After computing these statistics for each node, sorting all such nodes based on the measurement statistics is a simple strategy to measure the node importance. However, using only the aforementioned statistics during the node selection process whilst solving the influence maximization problem might not yield the best outcome. This is due to the node overlapping problem, whereby two different nodes with overlapping statistics can have varied impact on the node set. Herein, selecting nodes only based on simple metrics cannot deliver the optimal solution whilst resolving the problem of maximum impact. Thus, efficiently removing the influence of these overlaps during the selection of nodes is a key problem.
Usually, many activation paths exist between nodes, thus computing each of these paths can be a tedious process. Considering the theory of six degrees of segmentation, the connection between nodes usually does not exceed a certain value. Here, it is assumed that node can activate node only through the path with the highest activation probability between nodes. Finally, the activation probabilities of each node to all other nodes in the network are combined to approximate the influence range of the nodes.
Latent Dirichlet Allocation (LDA)[
29
] model can be used to obtain the topic-based influence probability distribution among nodes[
17
]. Considering the transmission of information as an example, the calculation of the probability of are as follows:
where represents the probability that information is the topic , and represents the probability that a node influences another node under the topic.
Therefore, in the network , calculating the influence of node under information is as follows:
where represents the hub of node , and represents the probability that information is topic .
The formula for calculating the influence increment of node is defined as:
In the process of maximization of influence, the selection of the initial node is related to the direction and scope of influence of the subsequent information communication, so it is very important to find the initial communication node with the highest influence. In some existing influence maximization algorithm[
26
,
27
,
28
,
30
], the initial transmission nodes are derived from the entire social network by incrementally mining the largest node. Although the influence of these nodes is high in the entire network society, their corresponding influence for a particular topic may not be necessarily higher, and the influence of each node to calculate incremental time efficiency can be low.
In the first stage, we only statically selected nodes with large influence as influence nodes, without considering the characteristics of information transmission in social networks. Therefore, the nodes acquired in the first stage are considered as the initial transmission nodes, and the information transmission in the social network is simulated through the model. Then, the greedy strategy is adopted to iteratively mine the nodes with the largest increment of the topic influence as the remaining influence nodes. Among them, the largest increment of topic influence refers to the largest difference between the scope of influence of the current collection and the scope of influence before the addition of U. The specific algorithm flow is shown in Algorithm 2.
10.26599/BDMA.2021.9020024.F004
Node influence range under different parameters.
10.26599/BDMA.2021.9020024.F005
Running time under different parameters.
10.26599/BDMA.2021.9020024.F006
Comparison of different algorithms on influence maximization.
10.26599/BDMA.2021.9020024.F007
Comparison of the running times of different algorithms.