A Novel Influence Maximization Algorithm for a Competitive Environment Based on Social Media Data Analytics

Jie Tong; Leilei Shi; Lu Liu; John Panneerselvam; Zixuan Han

doi:10.26599/BDMA.2021.9020024

| Sign up

PDF (5.2 MB)

Cite

EndNote(RIS) BibTeX

Collect

Submit Manuscript

Show Outline

Figures (7)

Fig. 1

Fig. 2

Fig. 3

Fig. 4

Fig. 5

Fig. 6

Fig. 7

Tables (1)

Table 1

Open Access

A Novel Influence Maximization Algorithm for a Competitive Environment Based on Social Media Data Analytics

Jie Tong, Leilei Shi(), Lu Liu(), John Panneerselvam, Zixuan Han

School of Computer Science and Telecommunication Engineering and Jiangsu Key Laboratory of Security Technology for Industrial Cyberspace, Jiangsu University, Zhenjiang 212013, China

School of Computing and Mathematical Sciences, University of Leicester, Leicester LE1 7RH, UK

Show Author Information

Abstract

Online social networks are increasingly connecting people around the world. Influence maximization is a key area of research in online social networks, which identifies influential users during information dissemination. Most of the existing influence maximization methods only consider the transmission of a single channel, but real-world networks mostly include multiple channels of information transmission with competitive relationships. The problem of influence maximization in an environment involves selecting the seed node set for certain competitive information, so that it can avoid the influence of other information, and ultimately affect the largest set of nodes in the network. In this paper, the influence calculation of nodes is achieved according to the local community discovery algorithm, which is based on community dispersion and the characteristics of dynamic community structure. Furthermore, considering two various competitive information dissemination cases as an example, a solution is designed for self-interested information based on the assumption that the seed node set of competitive information is known, and a novel influence maximization algorithm of node avoidance based on user interest is proposed. Experiments conducted based on real-world Twitter dataset demonstrates the efficiency of our proposed algorithm in terms of accuracy and time against notable influence maximization algorithms.

Keywords

influence maximization competitive environment dynamic network

References

[1]

L. L.

Shi

, Y.

, L.

Liu

, X.

Sun

, and L.

Jiang

, Event detection and identification of influential spreaders in social media data streams, Big Data Mining and Analytics, vol. 1, no. 1, pp. 34-46, 2018.

Crossref Google Scholar

[2]

L. L.

Shi

, L.

Liu

, Y.

, L.

Jiang

, M.

Kazim

, H.

Ali

, and J.

Panneerselvam

, Human-centric cyber social computing model for hot-event detection and propagation, IEEE Trans. Comput. Soc. Syst., vol. 6, no. 5, pp. 1042-1050, 2019.

Crossref Google Scholar

[3]

, S. X.

Liu

, H.

Wei

, and J. J.

, Multi-kernel fuzzy clustering based on auto-encoder for fMRI functional network, Expert Syst. Appl., vol. 159, p. 113513, 2020.

Crossref Google Scholar

[4]

Monney

, Y. Z.

Zhan

, Z.

Jiang

, and B. B.

Benuwa

, A multi-kernel method of measuring adaptive similarity for spectral clustering, Expert Syst. Appl., vol. 159, p. 113570, 2020.

Crossref Google Scholar

[5]

S. N.

Tang

, S. Q.

Yuan

, and Y.

Zhu

, Data preprocessing techniques in convolutional neural network based on fault diagnosis towards rotating machinery, IEEE Access, vol. 8, pp. 149487-149496, 2020.

Crossref Google Scholar

[6]

L. L.

Shi

, L.

Liu

, Y.

, L.

Jiang

, J.

Panneerselvam

, and R.

Crole

, A social sensing model for event detection and user influence discovering in social media data streams, IEEE Trans. Comput. Soc. Syst., vol. 7, no. 1, pp. 141-150, 2020.

Crossref Google Scholar

[7]

Palla

, I.

Derényi

, I.

Farkas

, and T.

Vicsek

, Uncovering the overlapping community structure of complex networks in nature and society, Nature, vol. 435, no. 7043, pp. 814-818, 2005.

Crossref Google Scholar

[8]

Gregory

, Finding overlapping communities in networks by label propagation, New J. Phys., vol. 12, p. 103018, 2010.

Crossref Google Scholar

[9]

S. Y.

Shih

, M.

Lee

, and C. C.

Chen

, An effective friend recommendation method using learning to rank and social influence, in Proc. PACIS 2015, Singapore, 2015, pp. 242-250.

Google Scholar

[10]

Kim

and S.

Kim

, Detecting overlapping and hierarchical communities in complex network using interaction-based edge clustering, Phys. A Stat. Mech. Appl., vol. 417, pp. 46-56, 2015.

Crossref Google Scholar

[11]

, L. L.

Shi

, Y.

, and J.

Liu

, Human-driven dynamic community influence maximization in social media data streams, IEEE Access, vol. 8, pp. 162238-162251, 2020.

Crossref Google Scholar

[12]

Domingos

and M.

Richardson

, Mining the network value of customers, in Proc. 7^th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2001, pp. 57-66.

Crossref Google Scholar

[13]

Kempe

, J.

Kleinberg

, and É.

Tardos

, Maximizing the spread of influence through a social network, in Proc. 9^th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Washington, DC, USA, 2003, pp. 137-146.

Crossref Google Scholar

[14]

Wang

, W.

Chen

, and Y. J.

Wang

, Scalable influence maximization for independent cascade model in large-scale social networks, Data Min. Knowl. Discov., vol. 25, no. 3, pp. 545-576, 2012.

Crossref Google Scholar

[15]

J. T.

Tian

, Y. T.

Wang

, X. J.

Feng

, A new hybrid algorithm for influence maximization in social networks, (in Chinese), Chin. J. Comput., vol. 10, no. 3, pp. 1956-1965, 2011.

Crossref Google Scholar

[16]

Beutel

, B. A.

Prakash

, R.

Rosenfeld

, and C.

Faloutsos

, Interacting viruses in networks: Can both survive? in Proc. 18^th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Beijing, China, 2012, pp. 426-434.

Crossref Google Scholar

[17]

J. M.

Chen

, L. L.

Shi

, L.

Liu

, A. O.

Ayorinde

, R. B.

Zhu

, and J.

Panneerselvam

, User interest communities influence maximization in a competitive environment, in Proc. 2020 16^th Int. Conf. on Mobility, Sensing and Networking, Tokyo, Japan, 2020, pp. 614-621.

Crossref Google Scholar

[18]

Bharathi

, D.

Kempe

, and M.

Salek

, Competitive influence maximization in social networks, in Proc. 3^rd Int. Workshop on Web and Internet Economics, Berlin, Germany, 2007, pp. 306-311.

Crossref Google Scholar

[19]

Bozorgi

, S.

Samet

, J.

Kwisthout

, and T.

Wareham

, Community-based influence maximization in social networks under a competitive linear threshold model, Knowl. Based Syst., vol. 134, pp. 149-158, 2017.

Crossref Google Scholar

[20]

B. W.

Kernighan

and S.

Lin

, An efficient heuristic procedure for partitioning graphs, Bell Syst. Tech. J., vol. 49, no. 2, pp. 291-307, 2014.

Crossref Google Scholar

[21]

Y. C.

Wei

and C. K.

Cheng

, Ratio cut partitioning for hierarchical designs, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., vol. 10, pp. 911-921, 1991.

Crossref Google Scholar

[22]

Jiang

, L. L.

Shi

, L.

Liu

, J. J.

Yao

, B.

Yuan

, and Y. J.

Zheng

, An efficient evolutionary user interest community discovery model in dynamic social networks for internet of people, IEEE Internet Things J., vol. 6, no. 6, pp. 9226-9236, 2019.

Crossref Google Scholar

[23]

S. C.

Liu

, F. X.

Zhu

, and L.

Gan

, A label-propagation-probability-based algorithm for overlapping community detection, (in Chinese), Chin. J. Comput., vol. 39, no. 4, pp. 717-729, 2016.

Google Scholar

[24]

Zhu

, D.

, and Z.

Zhang

, Minimum cost seed set for competitive social influence, in Proc. of IEEE International Conference on Computer Communications, San Francisco, CA, USA, .

Crossref Google Scholar

[25]

Ribeiro

and C.

Faloutsos

, Modeling website popularity competition in the attention-activity marketplace, in Proc. 8^th ACM Int. Conf. on Web Search and Data Mining, Shanghai, China, 2015, pp. 389-398.

Crossref Google Scholar

[26]

Beutel

, B. A.

Prakash

, R.

Rosenfeld

, and C.

Faloutsos

, Interacting viruses in networks: Can both survive? in Proc. 18^th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Beijing, China, 2012, pp. 426-434.

Crossref Google Scholar

[27]

C. W.

Tsai

, Y. C.

Yang

, and M. C.

Chiang

, A genetic NewGreedy algorithm for influence maximization in social network, in Proc. 2015 IEEE Int. Conf. on Systems, Man, and Cybernetics, Hong Kong, China, 2016, pp. 2549-2554.

Crossref Google Scholar

[28]

L. L.

Shi

, L.

Liu

, Y.

, L.

Jiang

, and J.

Hardy

, Event detection and user interest discovering in social media data streams, IEEE Access, vol. 5, pp. 20953-20964, 2017.

Crossref Google Scholar

[29]

D. M.

Blei

, A. Y.

, and M. I.

Jordan

, Latent dirichlet allocation, J. Mach. Learn. Res., vol. 3, pp. 993-1022, 2003.

Google Scholar

[30]

Trivedi

and A.

Singh

, Efficient influence maximization in social-networks under independent cascade model, Proced. Comput. Sci., vol. 173, pp. 315-324, 2020.

Crossref Google Scholar

[31]

A. N.

Langville

and C. D. J. T. M. I.

Meyer

, Google’s pagerank and beyond, The Science of Search Engine Rankings, vol. 30, no. 1, pp. 68-69, 2011.

Crossref Google Scholar

Big Data Mining and Analytics

Volume 5 Issue 2,
June 2022

Pages 130-139

DOI: 10.26599/BDMA.2021.9020024

Cite this article:

Tong J, Shi L, Liu L, et al. A Novel Influence Maximization Algorithm for a Competitive Environment Based on Social Media Data Analytics. Big Data Mining and Analytics, 2022, 5(2): 130-139. https://doi.org/10.26599/BDMA.2021.9020024

10.26599/BDMA.2021.9020024.F001 Fig. 1Nodes and edges in social network.

10.26599/BDMA.2021.9020024.F002 Fig. 2Schematic diagram of the iterative process of the relationship evaluation algorithm.

3 Local Community Discovery Algorithm Based on Community Dispersion

As we can see from Fig. 1 , Social networks often include a large number of low-quality and invalid data and nodes. Accurately mapping all these nodes in the network generates a lot of computation overheads and further reduces the efficiency. Therefore, we propose to use the Hyperlink-Induced Topic Search (HITS) based relationship evaluation algorithm to filter out high authorities and hubs.

(1) Data preprocessing based on HITS

The algorithm based on HITS iteratively calculates the centrality of each user and the authority of each post, to obtain high authorities and hubs. Figure 2 shows the link relationship between users and posts and the iteration process. The process is described as follows:

10.26599/BDMA.2021.9020024.F002 Fig. 2Schematic diagram of the iterative process of the relationship evaluation algorithm.

(1) Initialize users’ centrality and post authority;

(2) The centrality of the users is the sum of the authority of all the linked posts, and the authority of the post is the sum of the centrality of all the linked users. Calculate users’ centrality and post authority in the follwing:

(1) $hub = \sum authority$

(2) $authority = \sum hub$

(3) The users’ centrality and post authority are normalized,

(3) $hub = \frac{hub}{\sum hub}$

(4) $authority = \frac{authority}{\sum authority}$

(4) Evaluate whether the users’ centrality degree and the post authority degree converge, the convergence will output the result; otherwise continue with Eqs. ( 2 )-( 4 ),

(5) $θ a = \frac{\sum_{i = 0}^{n} authority}{n}$

(6) $θ_{h} = \frac{\sum_{i = 0}^{n} hub}{n}$

where $θ_{a}$ and $θ_{h}$ denote the thresholds after normalization of the posts’ authority and users’ hub, respectively.

Finally, the users’ centrality and post authority are sorted in a descending fashion (high to low) according to the numerical results, and the parameters $θ_{a}$ and $θ_{h}$ are defined to represent the critical value of screening, which is used to extract users’ interest labels, and other information are filtered out.

(2) Initial local community discovery

Community Dispersion (CD) is a key parameter. The higher the value of CD, the closer the community will be.

(7) $CD = \frac{e_{i}}{e_{o}}$

where $e_{i}$ represents the number of edges in the community, and $e_{o}$ represents the number of edges in the union of the community ( $C)$ and the neighboring nodes set ( $N)$ . Our local community discovery algorithm is described in Algorithm 1.

10.26599/BDMA.2021.9020024.F003 Fig. 3Independent cascading model propagation process.

4.2 Probability of influence between nodes

A very important step in the process of influence maximization is to measure the influence increment of each node. Correctly measuring the influence increment of each node is crucial to achieve the maximum influence. Highly influential nodes are usually deemed important in the network. The most important statistical measures used to denote the importance of nodes are degree centrality, intermediate centrality, compact centrality, clustering centrality, etc. After computing these statistics for each node, sorting all such nodes based on the measurement statistics is a simple strategy to measure the node importance. However, using only the aforementioned statistics during the node selection process whilst solving the influence maximization problem might not yield the best outcome. This is due to the node overlapping problem, whereby two different nodes with overlapping statistics can have varied impact on the node set. Herein, selecting nodes only based on simple metrics cannot deliver the optimal solution whilst resolving the problem of maximum impact. Thus, efficiently removing the influence of these overlaps during the selection of nodes is a key problem.

Usually, many activation paths exist between nodes, thus computing each of these paths can be a tedious process. Considering the theory of six degrees of segmentation, the connection between nodes usually does not exceed a certain value. Here, it is assumed that node $u$ can activate node $v$ only through the path with the highest activation probability between nodes. Finally, the activation probabilities of each node to all other nodes in the network are combined to approximate the influence range of the nodes.

Latent Dirichlet Allocation (LDA)^{[

29
]} model can be used to obtain the topic-based influence probability distribution among nodes^{[

17
]}. Considering the transmission of information $B$ as an example, the calculation of the probability of $p_{u v}^{B}$ are as follows:

(9) $p_{u v}^{B} = \sum_{j = 1}^{k} p (λ_{B} | z_{j}) p (u, v | z_{j})$

where $p (λ_{B} | z_{j})$ represents the probability that information $B$ is the topic $z_{j}$ , and $p (u, v | z_{j})$ represents the probability that a node influences another node under the topic.

Therefore, in the network $G (V, E)$ , calculating the influence of node $v$ under information $B$ is as follows:

(10) ${Inf}^{B} (v) = \sum_{i = 1}^{k} hub (v) \cdot p (λ_{B} | z_{i})$

where $hub (v)$ represents the hub of node $v$ , and $p (λ_{B} | z_{i})$ represents the probability that information $B$ is topic $z_{i}$ .

The formula for calculating the influence increment of node $V$ is defined as:

(11) $Δ {Inf}^{B} (v) = δ (S \cup {v}) - δ (S)$

In the process of maximization of influence, the selection of the initial node is related to the direction and scope of influence of the subsequent information communication, so it is very important to find the initial communication node with the highest influence. In some existing influence maximization algorithm^{[

26
,

27
,

28
,

30
]}, the initial transmission nodes are derived from the entire social network by incrementally mining the largest node. Although the influence of these nodes is high in the entire network society, their corresponding influence for a particular topic may not be necessarily higher, and the influence of each node to calculate incremental time efficiency can be low.

In the first stage, we only statically selected $⌈ β \times k ⌉$ nodes with large influence as influence nodes, without considering the characteristics of information transmission in social networks. Therefore, the nodes acquired in the first stage are considered as the initial transmission nodes, and the information transmission in the social network is simulated through the model. Then, the greedy strategy is adopted to iteratively mine the $k - ⌈ β \times k ⌉$ nodes with the largest increment of the topic influence as the remaining influence nodes. Among them, the largest increment of topic influence refers to the largest difference between the scope of influence of the current collection and the scope of influence before the addition of U. The specific algorithm flow is shown in Algorithm 2.

10.26599/BDMA.2021.9020024.F004 Fig. 4Node influence range under different parameters.

10.26599/BDMA.2021.9020024.F005 Fig. 5Running time under different parameters.

10.26599/BDMA.2021.9020024.F006 Fig. 6Comparison of different algorithms on influence maximization.

10.26599/BDMA.2021.9020024.F007 Fig. 7Comparison of the running times of different algorithms.

手机微信扫描二维码，点击右上角···按钮
分享到微信朋友圈

Algorithm	Description
CELF	An algorithm that eliminates nodes in a social network and causes diminishing marginal benefits
Random	An algorithm for removing edges in a random way from a network that is not successfully activated
MixGreedy	The greedy algorithm of CELF algorithm that is performed after the first round of NewGreedy algorithm
PageRank	An algorithm used by Google to identify the rank/importance of web pages
Degree	A heuristic algorithm for selecting $k$ nodes with maximum degree
IMNA	An algorithm for selecting seed nodes by node avoidance in a competitive environment