Open Access Issue
Lightweight Super-Resolution Model for Complete Model Copyright Protection
Tsinghua Science and Technology 2024, 29 (4): 1194-1205
Published: 09 February 2024

Deep learning based techniques are broadly used in various applications, which exhibit superior performance compared to traditional methods. One of the mainstream topics in computer vision is the image super-resolution task. In recent deep learning neural networks, the number of parameters in each convolution layer has increased along with more layers and feature maps, resulting in better image super-resolution performance. In today’s era, numerous service providers offer super-resolution services to users, providing them with remarkable convenience. However, the availability of open-source super-resolution services exposes service providers to the risk of copyright infringement, as the complete model could be vulnerable to leakage. Therefore, safeguarding the copyright of the complete model is a non-trivial concern. To tackle this issue, this paper presents a lightweight model as a substitute for the original complete model in image super-resolution. This research has identified smaller networks that can deliver impressive performance, while protecting the original model’s copyright. Finally, comprehensive experiments are conducted on multiple datasets to demonstrate the superiority of the proposed approach in generating super-resolution images even using lightweight neural networks.

Open Access Issue
Few-Shot Graph Classification with Structural-Enhanced Contrastive Learning for Graph Data Copyright Protection
Tsinghua Science and Technology 2024, 29 (2): 605-616
Published: 22 September 2023

Open-source licenses can promote the development of machine learning by allowing others to access, modify, and redistribute the training dataset. However, not all open-source licenses may be appropriate for data sharing, as some may not provide adequate protections for sensitive or personal information such as social network data. Additionally, some data may be subject to legal or regulatory restrictions that limit its sharing, regardless of the licensing model used. Hence, obtaining large amounts of labeled data can be difficult, time-consuming, or expensive in many real-world scenarios. Few-shot graph classification, as one application of meta-learning in supervised graph learning, aims to classify unseen graph types by only using a small amount of labeled data. However, the current graph neural network methods do not make full use of graph structures on molecular graphs and social network datasets. Since structural features are known to correlate with molecular properties in chemistry, structure information tends to be ignored when sufficient property information is provided. Nevertheless, the common binary classification task of chemical compounds is unsuitable in the few-shot setting, which requires novel labels. Hence, this paper focuses on the graph classification tasks of a social network, whose complex topology has an uncertain relationship with its nodes’ attributes. With two multi-class graph datasets with large node-attribute dimensions constructed to facilitate the research, we propose a novel learning framework that integrates both meta-learning and contrastive learning to enhance the utilization of graph topological information. Extensive experiments demonstrate the competitive performance of our framework relative to other state-of-the-art methods.

Open Access Issue
Call for Papers—Special Issue on Edge AI Empowered Giant Model Training
Big Data Mining and Analytics 2023, 6 (4): 526
Published: 29 August 2023

Open Access Issue
Security and Privacy in Metaverse: A Comprehensive Survey
Big Data Mining and Analytics 2023, 6 (2): 234-247
Published: 26 January 2023

Metaverse describes a new shape of cyberspace and has become a hot-trending word since 2021. There are many explanations of what the Metaverse is, and many attempts to provide a formal standard or definition of it. However, these definitions could hardly reach universal acceptance. Rather than providing a formal definition of the Metaverse, we list four must-have characteristics of the Metaverse: socialization, immersive interaction, real world-building, and expandability. These characteristics not only carve the Metaverse into a novel and fantastic digital world, but also make it susceptible to all kinds of security/privacy risks, such as personal information leakage, eavesdropping, unauthorized access, phishing, data injection, broken authentication, insecure design, and more. This paper first introduces the four characteristics; then the current progress and typical applications of the Metaverse are surveyed and categorized into four economic sectors. Based on the four characteristics and the findings of the current progress, the security and privacy issues in the Metaverse are investigated. We then identify and discuss more potential critical security and privacy issues that can be caused by combining the four characteristics. Lastly, the paper also raises some other concerns regarding society and humanity.

Open Access Issue
Core Decomposition and Maintenance in Bipartite Graphs
Tsinghua Science and Technology 2023, 28 (2): 292-309
Published: 29 September 2022

The prevalence of graph data has brought a lot of attention to cohesive and dense subgraph mining. In contrast with the large number of indexes proposed to help mine dense subgraphs in general graphs, only very few indexes are proposed for the same in bipartite graphs. In this work, we present the index called α(β)-core number on vertices, which reflects the maximal cohesive and dense subgraph a vertex can be in, to help enumerate the (α,β)-cores, a commonly used dense structure in bipartite graphs. To address the problem of extremely high time and space cost for enumerating the (α,β)-cores, we first present a linear time and space algorithm for computing the α(β)-core numbers of vertices. We further propose core maintenance algorithms, to update the core numbers of vertices when a graph changes by avoiding recalculations. Experimental results on different real-world and synthetic datasets demonstrate the effectiveness and efficiency of our algorithms.
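As a rough illustration of the structure this abstract refers to, the sketch below computes a single (α,β)-core by peeling. It assumes the commonly used definition (the maximal subgraph in which every upper-side vertex has degree at least α and every lower-side vertex has degree at least β). The function name `alpha_beta_core` and the edge-list input are hypothetical choices for this example; it is not the paper's index construction or maintenance algorithm.

```python
from collections import deque


def alpha_beta_core(edges, alpha, beta):
    """Return (upper, lower) vertex sets of the (alpha, beta)-core.

    `edges` is an iterable of (u, v) pairs with u on the upper side and
    v on the lower side. Illustrative peeling sketch only.
    """
    up, lo = {}, {}
    for u, v in edges:
        up.setdefault(u, set()).add(v)
        lo.setdefault(v, set()).add(u)

    # Seed the queue with every vertex that already violates its bound.
    queue = deque(("U", u) for u in up if len(up[u]) < alpha)
    queue.extend(("L", v) for v in lo if len(lo[v]) < beta)
    marked = set(queue)

    while queue:
        side, x = queue.popleft()
        if side == "U":
            # Remove upper vertex x and update its lower neighbours.
            for v in up.pop(x):
                lo[v].discard(x)
                if len(lo[v]) < beta and ("L", v) not in marked:
                    marked.add(("L", v))
                    queue.append(("L", v))
        else:
            # Remove lower vertex x and update its upper neighbours.
            for u in lo.pop(x):
                up[u].discard(x)
                if len(up[u]) < alpha and ("U", u) not in marked:
                    marked.add(("U", u))
                    queue.append(("U", u))
    return set(up), set(lo)


demo_edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v1")]
core_22 = alpha_beta_core(demo_edges, 2, 2)
```

Each vertex and edge is touched a constant number of times, so one call runs in time linear in the graph size; enumerating cores for every (α,β) pair this way would repeat that work, which is the cost an index is meant to avoid.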

Open Access Issue
Public-private-core maintenance in public-private-graphs
Intelligent and Converged Networks 2021, 2 (4): 306-319
Published: 30 December 2021

A public-private-graph (pp-graph) is developed to model social networks with hidden relationships, and it consists of one public graph in which edges are visible to all users, and multiple private graphs in which edges are only visible to their endpoint users. In contrast with conventional graphs, where the edges are visible to all users, a pp-graph lacks accurate indexes for evaluating the importance of a vertex. In this paper, we first propose a novel concept, the public-private-core (pp-core) number based on the k-core number, which integrally considers both the public graph and private graphs of vertices, to measure how critical a user is. We then give an efficient algorithm for the pp-core number computation, which takes only linear time and space. Considering that the graphs may evolve continuously over time, we also present effective algorithms for pp-core maintenance after the graph changes, avoiding redundant re-computation of pp-core numbers. Extensive experiments conducted on real-world social networks show that our algorithms achieve good efficiency and stability. Compared to recalculating the pp-core numbers of all vertices, our maintenance algorithms can reduce the computation time by about 6–8 orders of magnitude.
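The abstract does not define the pp-core number, so the sketch below shows only the classic k-core number it is said to build on: the core number of a vertex is the largest k such that the vertex belongs to a subgraph in which every vertex has degree at least k. The function name `core_numbers` and the adjacency-dictionary input are assumptions for this example, and the simple minimum-degree selection used here is quadratic, not the linear-time method the abstract mentions.

```python
def core_numbers(adj):
    """Return {vertex: k-core number} for an undirected graph.

    `adj` maps each vertex to an iterable of its neighbours.
    Illustrative peeling sketch; not the pp-core algorithm.
    """
    deg = {v: len(set(ns)) for v, ns in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the vertex of minimum remaining degree.
        v = min(remaining, key=deg.__getitem__)
        k = max(k, deg[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                deg[w] -= 1
    return core


# Triangle a-b-c with a pendant vertex d attached to a.
demo_graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
demo_cores = core_numbers(demo_graph)
```

Recomputing this for every vertex after each edge change is the baseline that maintenance algorithms aim to avoid.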

Open Access Issue
Link-Privacy Preserving Graph Embedding Data Publication with Adversarial Learning
Tsinghua Science and Technology 2022, 27 (2): 244-256
Published: 29 September 2021

The inefficient utilization of ubiquitous graph data with combinatorial structures necessitates graph embedding methods, which aim to learn a continuous vector space for the graph that is amenable to traditional machine learning algorithms favoring vector representations. Graph embedding methods build an important bridge between social network analysis and data analytics, as social networks naturally generate an unprecedented volume of graph data continuously. Publishing social network data not only brings benefits for public health, disaster response, commercial promotion, and many other applications, but also gives birth to threats that jeopardize each individual’s privacy and security. Unfortunately, most existing works in publishing social graph embedding data only focus on preserving social graph structure, with less attention paid to the privacy issues inherited from social networks. To be specific, attackers can infer the presence of a sensitive relationship between two individuals by training a predictive model with the exposed social network embedding. In this paper, we propose a novel link-privacy preserved graph embedding framework using adversarial learning, which can reduce an adversary’s prediction accuracy on sensitive links while preserving sufficient non-sensitive information, such as graph topology and node attributes, in the graph embedding. Extensive experiments are conducted to evaluate the proposed framework using ground truth social network datasets.

Open Access Issue
Collaborative City Digital Twin for the COVID-19 Pandemic: A Federated Learning Solution
Tsinghua Science and Technology 2021, 26 (5): 759-771
Published: 20 April 2021

The novel coronavirus, COVID-19, has caused a crisis that affects all segments of the population. As the knowledge and understanding of COVID-19 evolve, an appropriate response plan for this pandemic is considered one of the most effective methods for controlling the spread of the virus. Recent studies indicate that a city Digital Twin (DT) is beneficial for tackling this health crisis, because it can construct a virtual replica to simulate factors, such as climate conditions, response policies, and people’s trajectories, to help plan efficient and inclusive decisions. However, a city DT system relies on long-term and high-quality data collection to make appropriate decisions, limiting its advantages when facing urgent crises, such as the COVID-19 pandemic. Federated Learning (FL), in which all clients can learn a shared model while retaining all training data locally, emerges as a promising solution for accumulating the insights from multiple data sources efficiently. Furthermore, the enhanced privacy protection settings remove the privacy barriers that lie in this collaboration. In this work, we propose a framework that fuses city DT with FL to achieve a novel collaborative paradigm that allows multiple city DTs to share the local strategy and status quickly. In particular, an FL central server manages the local updates of multiple collaborators (city DTs), providing a global model that is trained in multiple iterations at different city DT systems until the model gains the correlations between various response plans and infection trends. This approach means a collaborative city DT paradigm fused with FL techniques can obtain knowledge and patterns from multiple DTs and eventually establish a "global view" of city crisis management. Meanwhile, it also helps improve each city’s DT by consolidating other DTs’ data without violating privacy rules. In this paper, we use the COVID-19 pandemic as the use case of the proposed framework. The experimental results on a real dataset with various response plans validate our proposed solution and demonstrate its superior performance.
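The server-side step described in this abstract, where a central server combines local updates into a global model, can be sketched with sample-weighted federated averaging. The abstract does not state which aggregation rule the paper uses, so averaging weighted by local dataset size is an assumption here, and `federated_average`, `client_weights`, and `client_sizes` are hypothetical names for this example.

```python
def federated_average(client_weights, client_sizes):
    """Combine client parameter vectors into one global vector.

    client_weights: list of equal-length lists of floats, one per client
                    (for example, one per city DT).
    client_sizes:   number of local training samples per client.
    Assumed aggregation rule: average weighted by local sample count.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two clients; the second holds three times as much local data.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only parameter vectors and sample counts reach the server in this sketch; the raw training data stays with each client, which is the property the abstract relies on.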

Open Access Issue
Survey on Data Analysis in Social Media: A Practical Application Aspect
Big Data Mining and Analytics 2020, 3 (4): 259-279
Published: 16 November 2020

Social media has more than three billion users sharing events, comments, and feelings throughout the world. It serves as a critical information source with large volumes, high velocity, and a wide variety of data. The previous studies on information spreading, relationship analyzing, and individual modeling, etc., have been heavily conducted to explore the tremendous social and commercial values of social media data. This survey studies the previous literature and the existing applications from a practical perspective. We outline a commonly used pipeline in building social media-based applications and focus on discussing available analysis techniques, such as topic analysis, time series analysis, sentiment analysis, and network analysis. After that, we present the impacts of such applications in three different areas, including disaster management, healthcare, and business. Finally, we list existing challenges and suggest promising future research directions in terms of data privacy, 5G wireless network, and multilingual support.

Open Access Issue
Fast Skyline Community Search in Multi-Valued Networks
Big Data Mining and Analytics 2020, 3 (3): 171-180
Published: 16 July 2020

Community search has been extensively studied in large networks, such as Protein-Protein Interaction (PPI) networks, citation graphs, and collaboration networks. However, in widely existing multi-valued networks, where each node has d (d ≥ 1) numerical attributes, almost all existing algorithms either completely ignore the node attributes or consider only one attribute. To solve this problem, the concept of skyline community was recently presented, based on the concepts of k-core and skyline. The skyline community is defined as a maximal k-core that satisfies some influence constraints, which is very useful in depicting the communities that are not dominated by other communities in multi-valued networks. However, the algorithms proposed for skyline community search can only work in the special case where the nodes have different values on each attribute, and the computation complexity grows exponentially as the number of attributes increases. In this work, we turn our attention to the general scenario where multiple nodes may have the same attribute value. Specifically, we first present an algorithm, called MICS, which can find all skyline communities in a multi-valued network. To improve computation efficiency, we then propose a dimension reduction based algorithm, called P-MICS, using the maximum entropy method. Our algorithm can significantly reduce the skyline community searching time, while still being able to find almost all cohesive skyline communities. Extensive experiments on real-world datasets demonstrate the efficiency and effectiveness of our algorithms.
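The notion of "not dominated" used in this abstract can be illustrated on plain attribute vectors. The sketch below computes the skyline of a list of d-dimensional points, assuming larger values are better on every attribute; it is not MICS or P-MICS and ignores the k-core constraint entirely. The names `dominates` and `skyline` are hypothetical choices for this example.

```python
def dominates(p, q):
    """True if p is at least as large as q on every attribute
    and strictly larger on at least one (larger is assumed better)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def skyline(points):
    """Return the points not dominated by any other point, in input order.

    Points with identical attribute vectors do not dominate each other,
    so ties are all kept.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


demo_points = [(1, 5), (3, 3), (2, 2), (5, 1)]
demo_skyline = skyline(demo_points)
```

The pairwise comparison is quadratic in the number of points and linear in d, which is fine for a small illustration but not a substitute for the search algorithms the abstract describes.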
