Scholar - SciOpen

Open Access Research Article Issue

Digital Twin-Enabled Edge Federated Learning for Data Streams

Zhenzhen Xie, Junjie Pang, Yan Huang, Olusesi Balogun, Dongxiao Yu, Zhipeng Cai

Tsinghua Science and Technology 2026, 31(4): 2040-2054

Published: 03 February 2026

Abstract

With significant advances in the Internet of Things (IoT), Streaming Federated Learning (SFL) has emerged as a novel distributed learning approach that can handle time-varying streaming data from multiple sources. The standard SFL protocol is a collaborative training framework that enables many clients, each bound to a different online data source, to participate in a continuous training task. However, existing works ignore the cold-start problem and the obstacle of insufficient training data. Moreover, due to client heterogeneity and the forgetting problem, the global model suffers performance degradation over time-series streaming data. In this work, we propose digital twin-enabled SFL, a novel federated learning system in which digital twins augment training data on demand. Instead of adopting an asynchronous federated learning protocol or a buffering technique that waits for clients to accumulate enough data, we introduce Generative Adversarial Network (GAN)-based digital twins that construct a virtual replica of each federated learning client and generate a synthetic dataset from the real data stream. We conduct experiments on real-world datasets to evaluate the proposed SFL framework. The results under multiple data stream scenarios and various client behaviors demonstrate that our work outperforms the state-of-the-art baseline.
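The core idea, topping up a data-starved client with synthetic samples instead of waiting for its stream to fill, can be illustrated with a toy round. This is a minimal sketch under loud assumptions: samples are plain floats, the "digital twin" is a hypothetical Gaussian stand-in (`gaussian_twin`) rather than the GAN the paper uses, each local "model" is just a batch mean, and `top_up`, `streaming_round`, and `min_size` are illustrative names not taken from the paper.

```python
import random
from typing import Callable, List, Sequence

# Toy scalar samples; a real client would hold (features, label) pairs.
Sample = float


def gaussian_twin(real: Sequence[Sample]) -> Sample:
    """Hypothetical stand-in for a GAN-based digital twin: draws one synthetic
    sample from a Gaussian fitted to the client's current stream window."""
    if not real:
        return random.gauss(0.0, 1.0)  # cold start: no real data yet
    mu = sum(real) / len(real)
    var = sum((x - mu) ** 2 for x in real) / len(real)
    return random.gauss(mu, var ** 0.5 or 1e-3)  # avoid a zero-width draw


def top_up(real: Sequence[Sample], min_size: int,
           twin: Callable[[Sequence[Sample]], Sample]) -> List[Sample]:
    """Keep every real sample and append synthetic ones until the batch holds
    at least min_size items, instead of waiting for more stream data."""
    batch = list(real)
    while len(batch) < min_size:
        batch.append(twin(real))
    return batch


def streaming_round(windows: List[List[Sample]], min_size: int) -> float:
    """One toy synchronous round: each client is topped up, fits a local
    'model' (here just the batch mean), and the server averages the local
    models weighted by how many samples each client used."""
    weighted, total = 0.0, 0
    for window in windows:
        batch = top_up(window, min_size, gaussian_twin)
        local_model = sum(batch) / len(batch)
        weighted += local_model * len(batch)
        total += len(batch)
    if total == 0:
        raise ValueError("no samples in this round")
    return weighted / total


random.seed(0)
# Three clients: a sparse stream, a cold-start client, and a well-fed one.
print(streaming_round([[1.0, 1.2, 0.8], [], [5.0] * 10], min_size=5))
```

Real clients are never discarded in this sketch: synthetic samples only fill the gap up to `min_size`, so a client that already has enough data is left unchanged.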

Open Access Issue

Lightweight Super-Resolution Model for Complete Model Copyright Protection

Bingyi Xie, Honghui Xu, YongJoon Joe, Daehee Seo, Zhipeng Cai

Tsinghua Science and Technology 2024, 29(4): 1194-1205

Published: 09 February 2024

Abstract

Deep learning-based techniques are broadly used in various applications, where they exhibit superior performance compared to traditional methods. Image super-resolution is one of the mainstream tasks in computer vision. In recent deep neural networks, the number of parameters in each convolution layer has increased, along with the number of layers and feature maps, resulting in better image super-resolution performance. Today, numerous service providers offer super-resolution services to users, providing remarkable convenience. However, the availability of open-source super-resolution services exposes service providers to the risk of copyright infringement, as the complete model could be leaked. Safeguarding the copyright of the complete model is therefore a non-trivial concern. To tackle this issue, this paper presents a lightweight model as a substitute for the original complete model in image super-resolution. This research identifies smaller networks that deliver impressive performance while protecting the original model’s copyright. Finally, comprehensive experiments on multiple datasets demonstrate the superiority of the proposed approach in generating super-resolution images even with lightweight neural networks.
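To make the size argument concrete, a standard 2-D convolution with a $k \times k$ kernel has $k^2 C_{in} C_{out}$ weights plus $C_{out}$ biases, so parameters grow linearly with depth and quadratically with feature width. The sketch below counts parameters for two hypothetical plain convolution stacks; the layer counts, widths, and the function names `conv2d_params` and `plain_body_params` are illustrative assumptions, not the architectures evaluated in the paper.

```python
def conv2d_params(in_ch: int, out_ch: int, kernel: int, bias: bool = True) -> int:
    """Parameter count of a standard 2-D convolution:
    kernel*kernel*in_ch*out_ch weights plus out_ch biases."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)


def plain_body_params(num_layers: int, features: int, kernel: int = 3) -> int:
    """Hypothetical SR body made of num_layers convolutions, each mapping
    `features` channels to `features` channels (no skip or upsampling layers)."""
    return num_layers * conv2d_params(features, features, kernel)


wide_deep = plain_body_params(num_layers=32, features=256)   # 18,882,560
lightweight = plain_body_params(num_layers=8, features=64)  # 295,424
print(wide_deep, lightweight, round(wide_deep / lightweight, 1))
```

Quartering the width alone cuts each layer's weights by roughly 16x, which is why narrower networks are the natural lever for a lightweight substitute model.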

Open Access Issue

Few-Shot Graph Classification with Structural-Enhanced Contrastive Learning for Graph Data Copyright Protection

Kainan Zhang, DongMyung Shin, Daehee Seo, Zhipeng Cai

Tsinghua Science and Technology 2024, 29(2): 605-616

Published: 22 September 2023

Abstract

Open-source licenses can promote the development of machine learning by allowing others to access, modify, and redistribute training datasets. However, not all open-source licenses are appropriate for data sharing, as some may not provide adequate protection for sensitive or personal information such as social network data. Additionally, some data may be subject to legal or regulatory restrictions that limit its sharing, regardless of the licensing model used. Hence, obtaining large amounts of labeled data can be difficult, time-consuming, or expensive in many real-world scenarios. Few-shot graph classification, an application of meta-learning to supervised graph learning, aims to classify unseen graph types using only a small amount of labeled data. However, current graph neural network methods do not fully exploit graph structure on molecular graphs and social network datasets. Since structural features are known to correlate with molecular properties in chemistry, structure information tends to be ignored when sufficient property information is provided. Nevertheless, the common binary classification task on chemical compounds is unsuitable for the few-shot setting, which requires novel labels. This paper therefore focuses on graph classification tasks over social networks, whose complex topology has an uncertain relationship with their nodes’ attributes. We construct two multi-class graph datasets with high-dimensional node attributes to facilitate the research, and propose a novel learning framework that integrates meta-learning and contrastive learning to make better use of graph topological information. Extensive experiments demonstrate the competitive performance of our framework relative to other state-of-the-art methods.
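Few-shot meta-learning is commonly trained on episodes: pick $N$ classes, then $K$ labeled "support" graphs and some "query" graphs per class. The sketch below shows only that generic N-way K-shot episode sampling over graph indices; the graph encoder and the contrastive objective are omitted, and `sample_episode` with its parameters is a hypothetical helper, not the paper's code.

```python
import random
from collections import defaultdict
from collections.abc import Hashable, Sequence


def sample_episode(labels: Sequence[Hashable], n_way: int, k_shot: int,
                   q_query: int, rng: random.Random):
    """Sample one N-way K-shot episode over a labelled graph collection.

    labels[i] is the class of graph i. Returns (support, query, classes),
    where support and query hold disjoint lists of graph indices."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # Only classes with enough graphs for both support and query qualify.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + q_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + q_query graphs")
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + q_query)
        support.extend(picked[:k_shot])
        query.extend(picked[k_shot:])
    return support, query, classes


labels = ["a"] * 5 + ["b"] * 5 + ["c"] * 5 + ["d"] * 2
support, query, classes = sample_episode(labels, n_way=3, k_shot=2, q_query=3,
                                         rng=random.Random(0))
print(classes, support, query)
```

Class `"d"` can never be drawn here because it has fewer graphs than `k_shot + q_query`; filtering such classes up front keeps every episode well-formed.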

Open Access Issue

Call for Papers—Special Issue on Edge AI Empowered Giant Model Training

Dongxiao Yu, Xu Chen, Zhipeng Cai

Big Data Mining and Analytics 2023, 6(4): 526

Published: 29 August 2023

Open Access Issue

Security and Privacy in Metaverse: A Comprehensive Survey

Yan Huang, Yi (Joy) Li, Zhipeng Cai

Big Data Mining and Analytics 2023, 6(2): 234-247

Published: 26 January 2023

Abstract

The Metaverse describes a new shape of cyberspace and has become a trending term since 2021. There are many explanations of what the Metaverse is, as well as attempts to provide a formal standard or definition of it. However, none of these definitions has reached universal acceptance. Rather than providing a formal definition of the Metaverse, we list four must-have characteristics of the Metaverse: socialization, immersive interaction, real world-building, and expandability. These characteristics not only shape the Metaverse into a novel and fantastic digital world but also expose it to a wide range of security and privacy risks, such as personal information leakage, eavesdropping, unauthorized access, phishing, data injection, broken authentication, insecure design, and more. This paper first introduces the four characteristics; it then surveys the current progress and typical applications of the Metaverse and categorizes them into four economic sectors. Based on the four characteristics and the findings on current progress, the security and privacy issues in the Metaverse are investigated. We then identify and discuss further potentially critical security and privacy issues that can arise from combining the four characteristics. Lastly, the paper raises other concerns regarding society and humanity.

Open Access Issue

Core Decomposition and Maintenance in Bipartite Graphs

Dongxiao Yu, Lifang Zhang, Qi Luo, Xiuzhen Cheng, Zhipeng Cai

Tsinghua Science and Technology 2023, 28(2): 292-309

Published: 29 September 2022

Abstract

The prevalence of graph data has drawn considerable attention to cohesive and dense subgraph mining. In contrast with the large number of indexes proposed to help mine dense subgraphs in general graphs, very few have been proposed for bipartite graphs. In this work, we present an index called the $\alpha(\beta)$-core number of vertices, which reflects the maximal cohesive and dense subgraph a vertex can belong to, to help enumerate the $(\alpha, \beta)$-cores, a commonly used dense structure in bipartite graphs. To address the extremely high time and space cost of enumerating the $(\alpha, \beta)$-cores, we first present a linear time and space algorithm for computing the $\alpha(\beta)$-core numbers of vertices. We further propose core maintenance algorithms that update the core numbers of vertices when the graph changes, avoiding recalculation. Experimental results on different real-world and synthetic datasets demonstrate the effectiveness and efficiency of our algorithms.
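For intuition about the structure being indexed, a single $(\alpha, \beta)$-core can be found by peeling: repeatedly delete upper-layer vertices with degree below $\alpha$ and lower-layer vertices with degree below $\beta$ until none remain. This is a minimal sketch of that definition-level computation for one $(\alpha, \beta)$ pair, assuming the common convention that $\alpha$ constrains the upper layer and $\beta$ the lower layer; it is not the paper's index-based linear-time algorithm or its maintenance procedure, and `alpha_beta_core` is a hypothetical name.

```python
from collections import defaultdict, deque


def alpha_beta_core(edges, alpha, beta):
    """Return (upper, lower) vertex sets of the (alpha, beta)-core.

    edges: iterable of (u, v) with u in the upper layer and v in the lower
    layer. Assumed convention: upper vertices need degree >= alpha and
    lower vertices need degree >= beta inside the core."""
    adj_u, adj_v = defaultdict(set), defaultdict(set)
    for u, v in edges:
        adj_u[u].add(v)
        adj_v[v].add(u)
    deg_u = {u: len(n) for u, n in adj_u.items()}
    deg_v = {v: len(n) for v, n in adj_v.items()}
    removed_u, removed_v, queue = set(), set(), deque()
    # Seed the queue with every vertex that already violates its threshold.
    for u, d in deg_u.items():
        if d < alpha:
            removed_u.add(u)
            queue.append(("U", u))
    for v, d in deg_v.items():
        if d < beta:
            removed_v.add(v)
            queue.append(("V", v))
    # Peel: removing a vertex lowers its surviving neighbours' degrees.
    while queue:
        side, x = queue.popleft()
        if side == "U":
            for v in adj_u[x]:
                if v not in removed_v:
                    deg_v[v] -= 1
                    if deg_v[v] < beta:
                        removed_v.add(v)
                        queue.append(("V", v))
        else:
            for u in adj_v[x]:
                if u not in removed_u:
                    deg_u[u] -= 1
                    if deg_u[u] < alpha:
                        removed_u.add(u)
                        queue.append(("U", u))
    return set(deg_u) - removed_u, set(deg_v) - removed_v


edges = [("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v1")]
print(alpha_beta_core(edges, 2, 2))  # upper layer: u1, u2; lower layer: v1, v2
```

Running this from scratch for every $(\alpha, \beta)$ pair is exactly the repeated cost that a per-vertex core-number index is meant to avoid.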

Open Access Issue

Public-private-core maintenance in public-private-graphs

Dongxiao Yu, Xilian Zhang, Qi Luo, Lifang Zhang, Zhenzhen Xie, Zhipeng Cai

Intelligent and Converged Networks 2021, 2(4): 306-319

Published: 30 December 2021

Abstract

A public-private-graph (pp-graph) models social networks with hidden relationships. It consists of one public graph, whose edges are visible to all users, and multiple private graphs, whose edges are visible only to their endpoint users. Unlike conventional graphs, in which all edges are visible to all users, pp-graphs lack accurate indexes for evaluating the importance of a vertex. In this paper, we first propose a novel concept, the public-private-core (pp-core) number, based on the k-core number, which jointly considers the public graph and the private graphs of vertices to measure how critical a user is. We then give an efficient algorithm for pp-core number computation that takes only linear time and space. Since graphs are always evolving over time, we also present effective algorithms for pp-core maintenance after graph changes, avoiding redundant recomputation of pp-core numbers. Extensive experiments on real-world social networks show that our algorithms achieve good efficiency and stability. Compared to recalculating the pp-core numbers of all vertices, our maintenance algorithms can reduce the computation time by about 6–8 orders of magnitude.
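Since the pp-core number builds on the k-core number, it helps to recall how plain core numbers are computed: repeatedly remove a minimum-degree vertex, and a vertex's core number is the largest minimum degree seen up to its removal. The sketch below does this on a single ordinary graph with a binary heap ($O(m \log n)$), not the linear-time bucket version; how the public and private graphs are combined into a pp-core is not specified in the abstract and is not modeled here. `core_numbers` is a hypothetical name, and vertex labels are assumed mutually comparable because heap ties are broken by label.

```python
import heapq
from collections import defaultdict


def core_numbers(edges):
    """k-core number of every vertex via min-degree peeling.

    Self-loops are ignored. Vertex labels must be mutually comparable
    (e.g. all strings) because heap ties are broken by label."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    deg = {x: len(n) for x, n in adj.items()}
    heap = [(d, x) for x, d in deg.items()]
    heapq.heapify(heap)
    core, k = {}, 0
    while heap:
        d, x = heapq.heappop(heap)
        if x in core or d != deg[x]:
            continue  # stale entry: vertex already peeled or degree changed
        k = max(k, d)  # core number never drops below earlier peel levels
        core[x] = k
        for y in adj[x]:
            if y not in core:
                deg[y] -= 1
                heapq.heappush(heap, (deg[y], y))
    return core


# A triangle a-b-c with a pendant vertex d attached to a.
print(core_numbers([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]))
```

The lazy-deletion heap keeps the code short; maintenance algorithms like those in the paper exist precisely so that an edge insertion or deletion does not force this whole peel to be rerun.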

Open Access Issue

Link-Privacy Preserving Graph Embedding Data Publication with Adversarial Learning

Kainan Zhang, Zhi Tian, Zhipeng Cai, Daehee Seo

Tsinghua Science and Technology 2022, 27(2): 244-256

Published: 29 September 2021

Abstract

The inefficient utilization of ubiquitous graph data with combinatorial structures necessitates graph embedding methods, which learn a continuous vector space for a graph so that it can be used by traditional machine learning algorithms that favor vector representations. Graph embedding methods build an important bridge between social network analysis and data analytics, as social networks continuously generate an unprecedented volume of graph data. Publishing social network data not only benefits public health, disaster response, commercial promotion, and many other applications, but also creates threats that jeopardize each individual’s privacy and security. Unfortunately, most existing works on publishing social graph embedding data focus only on preserving social graph structure, paying less attention to the privacy issues inherited from social networks. Specifically, attackers can infer the presence of a sensitive relationship between two individuals by training a predictive model on the exposed social network embedding. In this paper, we propose a novel link-privacy preserving graph embedding framework using adversarial learning, which reduces the adversary’s prediction accuracy on sensitive links while preserving sufficient non-sensitive information, such as graph topology and node attributes, in the graph embedding. Extensive experiments are conducted to evaluate the proposed framework using ground-truth social network datasets.

Open Access Issue

Collaborative City Digital Twin for the COVID-19 Pandemic: A Federated Learning Solution

Junjie Pang, Yan Huang, Zhenzhen Xie, Jianbo Li, Zhipeng Cai

Tsinghua Science and Technology 2021, 26(5): 759-771

Published: 20 April 2021

Abstract

The novel coronavirus, COVID-19, has caused a crisis that affects all segments of the population. As the knowledge and understanding of COVID-19 evolve, an appropriate response plan for this pandemic is considered one of the most effective methods for controlling the spread of the virus. Recent studies indicate that a city Digital Twin (DT) is beneficial for tackling this health crisis, because it can construct a virtual replica to simulate factors such as climate conditions, response policies, and people’s trajectories, helping to plan efficient and inclusive decisions. However, a city DT system relies on long-term, high-quality data collection to make appropriate decisions, which limits its advantages when facing urgent crises such as the COVID-19 pandemic. Federated Learning (FL), in which all clients learn a shared model while keeping all training data local, emerges as a promising solution for efficiently accumulating insights from multiple data sources. Furthermore, its enhanced privacy protection removes the privacy barriers that such collaboration would otherwise face. In this work, we propose a framework that fuses city DTs with FL to achieve a novel collaborative paradigm that allows multiple city DTs to share their local strategies and status quickly. In particular, an FL central server manages the local updates of multiple collaborators (city DTs), providing a global model that is trained over multiple iterations at different city DT systems until the model captures the correlations between various response plans and infection trends. A collaborative city DT paradigm fused with FL techniques can thus obtain knowledge and patterns from multiple DTs and eventually establish a "global view" of city crisis management. Meanwhile, it also helps improve each city’s DT by consolidating knowledge learned from other DTs’ data without violating privacy rules. In this paper, we use the COVID-19 pandemic as the use case of the proposed framework. The experimental results on a real dataset with various response plans validate our proposed solution and demonstrate its superior performance.
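The server-side step described above, combining local updates from several city DTs into one global model, is often implemented as sample-weighted averaging in the style of FedAvg. The abstract does not name the aggregation rule, so this is a minimal sketch assuming FedAvg-style weighting, with each DT's model flattened into a plain list of floats; `fedavg` and the example sizes are hypothetical.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each by its local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        if len(w) != dim:
            raise ValueError("all clients must share the model shape")
        for i, wi in enumerate(w):
            global_w[i] += wi * n / total  # contribution proportional to data size
    return global_w


# Two hypothetical city DTs: the second trained on three times as much data.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Only parameter vectors and sample counts reach the server here, which is the property that lets DTs collaborate without pooling their raw records.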

Open Access Issue

Survey on Data Analysis in Social Media: A Practical Application Aspect

Qixuan Hou, Meng Han, Zhipeng Cai

Big Data Mining and Analytics 2020, 3(4): 259-279

Published: 16 November 2020

Abstract

Social media has more than three billion users worldwide sharing events, comments, and feelings. It serves as a critical information source with large volume, high velocity, and a wide variety of data. Previous studies on information spreading, relationship analysis, individual modeling, and other topics have extensively explored the tremendous social and commercial value of social media data. This survey reviews the previous literature and existing applications from a practical perspective. We outline a commonly used pipeline for building social media-based applications and focus on available analysis techniques, such as topic analysis, time series analysis, sentiment analysis, and network analysis. We then present the impact of such applications in three areas: disaster management, healthcare, and business. Finally, we list existing challenges and suggest promising future research directions in terms of data privacy, 5G wireless networks, and multilingual support.