Scholar - SciOpen

Identifying accounts across different online social networks that belong to the same user has attracted extensive attention. However, existing techniques rely on given user seeds and ignore the dynamic changes of online social networks, and therefore fail to generate high-quality identification results. To solve this problem, we propose an incremental user identification method based on a user-guider similarity index (called CURIOUS), which identifies users efficiently and captures the changes of user features over time. Specifically, we first construct a novel user-guider similarity index (called USI) to speed up the matching between users. Second, we propose a two-phase user identification strategy consisting of USI-based bidirectional user matching and seed-based user matching, which is effective even for incomplete networks. Finally, we propose incremental maintenance for both USI and the identification results, which dynamically captures the current state of the social networks. We conduct experimental studies based on three real-world social networks. The experiments demonstrate the effectiveness and efficiency of our proposed method in comparison with traditional methods: our method improves precision, recall, and rank score by an average of 0.19, 0.16, and 0.09 respectively, and reduces the time cost by an average of 81%.
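The abstract does not spell out how bidirectional matching works, so the following is only a minimal sketch of one common reading: two accounts are paired when each is the other's highest-scoring candidate. The function name `mutual_best_matches`, the `threshold` parameter, the account names, and the similarity scores are all hypothetical, and the sketch takes precomputed scores rather than modelling USI itself.

```python
def mutual_best_matches(sim, threshold=0.5):
    """Pair accounts that are each other's best match.

    sim[a][b] is an assumed, precomputed similarity between account a in
    network A and account b in network B (higher means more similar).
    """
    # Best candidate in B for every account in A.
    best_for_a = {a: max(row, key=row.get) for a, row in sim.items() if row}

    # Best candidate in A for every account in B.
    best_for_b = {}
    for a, row in sim.items():
        for b, score in row.items():
            if b not in best_for_b or score > sim[best_for_b[b]][b]:
                best_for_b[b] = a

    # Keep a pair only if the choice is mutual and the score is high enough.
    return sorted(
        (a, b)
        for a, b in best_for_a.items()
        if best_for_b.get(b) == a and sim[a][b] >= threshold
    )


# Hypothetical similarity scores between three accounts in A and two in B.
scores = {
    "alice_tw": {"alice_fb": 0.9, "bob_fb": 0.2},
    "bob_tw": {"alice_fb": 0.6, "bob_fb": 0.7},
    "carol_tw": {"alice_fb": 0.3, "bob_fb": 0.4},
}
matches = mutual_best_matches(scores)
print(matches)
```

Here `carol_tw` stays unmatched because its best candidate, `bob_fb`, prefers `bob_tw`. Requiring agreement in both directions trades some recall for fewer false pairings.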

Regular Paper Issue

Finding Communities by Decomposing and Embedding Heterogeneous Information Network

Yue Kou, De-Rong Shen, Dong Li, Tie-Zheng Nie, Ge Yu

Journal of Computer Science and Technology 2020, 35(2): 320-337

Published: 27 March 2020

Community discovery is an important task in social network analysis. However, most existing methods for community discovery rely on the topological structure alone and ignore the rich information available in the content data. To address this issue, we present a community discovery method based on heterogeneous information network decomposition and embedding. Unlike traditional methods, our method takes into account topology, node content, and edge content, which together supply abundant evidence for community discovery. First, an embedding-based similarity evaluation method is proposed, which decomposes the heterogeneous information network into several subnetworks and extracts their latent deep representations to evaluate the similarities between nodes. Second, a bottom-up community discovery algorithm is proposed. Through leader node selection, initial community generation, and community expansion, communities can be found more efficiently. Third, incremental maintenance strategies for handling network changes are proposed. We conduct experimental studies based on three real-world social networks. Experiments demonstrate the effectiveness and efficiency of our proposed method. Compared with traditional methods, our method improves normalized mutual information (NMI) and modularity by an average of 12% and 37% respectively.
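As a reference point for the modularity figure quoted above, here is a minimal sketch of the standard Newman modularity for an undirected, unweighted graph without self-loops: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. The abstract does not say whether the paper uses exactly this definition, so that is an assumption. The function name `modularity` and the toy graph are illustrative only.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ].

    edges: list of undirected (u, v) pairs, no self-loops, no duplicates.
    community_of: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(toy_edges, two_groups)
print(round(q, 4))
```

For this toy graph each triangle holds 3 of the 7 edges and half of the edge endpoints, so Q = 2 × (3/7 − 1/4) = 5/14. Placing every node in one community gives Q = 0, which is why higher values indicate a more pronounced community structure.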

Open Access Issue

HPPQ: A Parallel Package Queries Processing Approach for Large-Scale Data

Meihui Shi, Derong Shen, Tiezheng Nie, Yue Kou, Ge Yu

Big Data Mining and Analytics 2018, 1(2): 146-159

Published: 12 April 2018

Many researchers have worked on effective techniques for package queries, and many strong approaches have been proposed. Unfortunately, most existing methods target small volumes of data, and the rapid growth in data volume makes it difficult for traditional package query methods to keep up. To solve this problem, a novel optimization method for package queries (HPPQ) is proposed in this paper. First, the data is preprocessed into regions: preprocessing segments the dataset into multiple subsets, and the centroid of each subset is used for package queries, which effectively reduces the volume of candidate results. Second, an efficient heuristic algorithm (namely IPOL-HS) is proposed based on the preprocessing results. It improves the quality of the candidate results in the iterative stage and the convergence rate of the heuristic algorithm. Finally, a strategy called HPR is proposed, which relies on a greedy algorithm and parallel processing to speed up query processing. The experimental results show that our method significantly reduces time consumption compared with existing methods.
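The preprocessing idea can be illustrated with a minimal sketch: split the rows into subsets, replace each subset by its centroid, and run the package selection over the centroids instead of the raw rows. Everything concrete here is an assumption made for illustration and not taken from the paper: the `cost` and `value` attributes, partitioning by sorting and chunking, and a simple value-per-cost greedy selection under a budget. It is not IPOL-HS or HPR.

```python
from statistics import mean


def partition(rows, size):
    """Sort rows by cost and cut them into consecutive groups of `size`."""
    ordered = sorted(rows, key=lambda r: r["cost"])
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def centroid(group):
    """Represent a group by the mean of each attribute."""
    return {
        "cost": mean(r["cost"] for r in group),
        "value": mean(r["value"] for r in group),
    }


def greedy_package(candidates, budget):
    """Pick candidates by value-per-cost until the total cost would exceed budget."""
    chosen, spent = [], 0.0
    ranked = sorted(candidates, key=lambda c: c["value"] / c["cost"], reverse=True)
    for c in ranked:
        if spent + c["cost"] <= budget:
            chosen.append(c)
            spent += c["cost"]
    return chosen


# Hypothetical rows: six items with positive costs.
rows = [
    {"cost": 1, "value": 4}, {"cost": 2, "value": 6}, {"cost": 3, "value": 6},
    {"cost": 4, "value": 10}, {"cost": 5, "value": 5}, {"cost": 6, "value": 7},
]
centroids = [centroid(g) for g in partition(rows, size=2)]
package = greedy_package(centroids, budget=5)
print(len(rows), "rows reduced to", len(centroids), "candidates")
print([c["cost"] for c in package])
```

The search runs over three centroids and not six rows. A real system would then refine the chosen regions back into concrete rows, a step this sketch leaves out.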
