Scholar - SciOpen

Open Access Research Article Issue

Exploring Hierarchical Tuple-Based Contextual Correlations for Human-Object Interaction Detection

Xin Hu, Ke Qin, Tao He, Guangchun Luo

Tsinghua Science and Technology 2026, 31(6): 2722-2737

Published: 25 June 2026

Abstract

Human-Object Interaction (HOI) detection is a challenging task in computer vision, particularly in complex scenes involving multiple humans and interactions. In this paper, we propose the Hierarchical Tuple-based Contextual Correlations Learning (HTCCL) model, which aims to enhance HOI detection by systematically capturing multi-level contextual relationships. Our approach decomposes an interaction into three hierarchical levels: entity, action, and event. We introduce a heterogeneous graph network with a multi-branch Transformer architecture, where human and object entities are treated as distinct nodes, facilitating fine-grained relational reasoning. Furthermore, we leverage the Contrastive Language-Image Pre-training (CLIP) model to embed interaction cues into queries, which are subsequently refined through local and global contextual aggregation modules. The proposed model effectively integrates contextual information across various levels, improving its ability to detect complex interactions within diverse scenes. Our extensive evaluations on standard benchmarks demonstrate the superiority of HTCCL in achieving state-of-the-art performance in HOI detection, particularly in scenarios with high relational complexity.

Open Access Research Article Issue

Tackling Long-Tail Video Recognition via Hierarchical Memory Banks

Shutian Zhou, Ruolan Fu, Zhixuan Zhou, Ke Qin, Guangchun Luo

Tsinghua Science and Technology 2026, 31(2): 892-903

Published: 21 October 2025

Abstract

In the real world, long-tailed data distributions are common and natural. This paper focuses on the long-tailed problem in video recognition, which consists of two aspects. First, the inter-video long-tailed distribution affects video samples. The tail video classes have fewer samples and lack within-class diversity, leading to weakened recognition accuracy. Second, the intra-video long-tailed distribution involves background frames that degrade the video representation by dominating the majority of frames unrelated to the video theme. To address these challenges, this paper proposes the long-short memory bank. This approach involves building two feature banks for each video class: the long-term bank and the short-term bank. The long-term bank uses a dictionary to store current and past video-level representations, enhancing the competitiveness of tail classes and mitigating the impact of insufficient samples on video classifier training. The short-term bank stores the most discriminative frame-level information, weakening background information and improving the robustness of video representation. During training, the current batch features are combined with the memory bank features to promote intra-class compactness and inter-class discrepancy. Experimental results on the VideoLT dataset demonstrate that our proposed Long-short Memory Bank improves recognition accuracy for tail video classes by 6.4%, without sacrificing overall recognition accuracy.
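The two per-class banks described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `LongShortMemoryBank`, the capacities, and the use of a per-frame score to decide which frames are "most discriminative" are all assumptions made for illustration, and features are plain Python lists rather than tensors.

```python
from collections import defaultdict, deque


class LongShortMemoryBank:
    """Hypothetical per-class memory with a long-term and a short-term bank."""

    def __init__(self, long_capacity=4, short_capacity=2):
        # Long-term bank: class label -> recent video-level features (FIFO).
        self.long_bank = defaultdict(lambda: deque(maxlen=long_capacity))
        # Short-term bank: class label -> top-scoring (score, frame feature) pairs.
        self.short_bank = defaultdict(list)
        self.short_capacity = short_capacity

    def update(self, label, video_feature, frame_features, frame_scores):
        # Store the current video-level representation next to past ones.
        self.long_bank[label].append(video_feature)
        # Keep only the highest-scoring frames (assumed proxy for "discriminative").
        scored = self.short_bank[label] + list(zip(frame_scores, frame_features))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        self.short_bank[label] = scored[: self.short_capacity]

    def long_features(self, label):
        return list(self.long_bank[label])

    def short_features(self, label):
        return [feature for _, feature in self.short_bank[label]]


bank = LongShortMemoryBank(long_capacity=2, short_capacity=2)
bank.update("tail", [1.0], [[0.1], [0.2], [0.3]], [0.2, 0.9, 0.5])
bank.update("tail", [2.0], [[0.4]], [0.7])
```

In a training loop, the features returned by `long_features` and `short_features` would be combined with the current batch features; how that combination and the associated loss are defined is left to the paper.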

Open Access Issue

Enhancing Temporal Knowledge Graph for Future Event Prediction with Long-Term Dense Graph

Bin Chen, Jin Wu, Xin Liu, Fan Zhou, Guangchun Luo

Tsinghua Science and Technology 2026, 31(1): 621-638

Published: 25 August 2025

Abstract

Temporal knowledge graph (TKG) reasoning has emerged as a pivotal approach in event prediction. An important yet challenging task in TKG reasoning is to predict future events by extrapolating from historical events and their correlations. Existing methods either overlook the modeling of long-term dependencies between entities or are ineffective in aggregating long-term information with recent facts. Motivated by dual process theory in cognitive sciences, we introduce TKG-LDG, an approach enhancing TKG for future entity prediction with long-term dense graph, to model event evolution in an adaptive manner. We first construct a unified dense graph from historical data to capture long-term dependencies, reflecting cumulative knowledge of entity interactions over time. This unified dense graph is compatible with any graph neural network and facilitates entity interaction learning from a long-term perspective. Then we initialize a TKG encoder from the unified dense graph to enhance short-term event interaction modeling. TKG-LDG effectively marries global context with local adaptability to recent temporal changes through its short-term recurrent encoders, in a way that mirrors human reasoning by integrating both long-term and short-term event dynamics. Extensive experiments conducted on six widely used TKG datasets demonstrate that our model outperforms strong baselines in future event prediction.

Open Access Issue

Seeing and Reasoning: A Simple Deep Learning Approach to Visual Question Answering

Rufai Yusuf Zakari, Jim Wilson Owusu, Ke Qin, Tao He, Guangchun Luo

Big Data Mining and Analytics 2025, 8(2): 458-478

Published: 28 January 2025

Abstract

Visual Question Answering (VQA) is a complex task that requires a deep understanding of both visual content and natural language questions. The challenge lies in enabling models to recognize and interpret visual elements and to reason through questions in a multi-step, compositional manner. We propose a novel Transformer-based model that introduces specialized tokenization techniques to effectively capture intricate relationships between visual and textual features. The model employs an enhanced self-attention mechanism, enabling it to attend to multiple modalities simultaneously, while a co-attention unit dynamically guides focus to the most relevant image regions and question components. Additionally, a multi-step reasoning module supports iterative inference, allowing the model to excel at complex reasoning tasks. Extensive experiments on benchmark datasets demonstrate the model’s superior performance, with accuracies of 98.6% on CLEVR, 63.78% on GQA, and 68.67% on VQA v2.0. Ablation studies confirm the critical contribution of key components, such as the reasoning module and co-attention mechanism, to the model’s effectiveness. Qualitative analysis of the learned attention distributions further illustrates the model’s dynamic reasoning process, adapting to task complexity. Overall, our study advances the adaptation of Transformer architectures for VQA, enhancing both reasoning capabilities and model interpretability in visual reasoning tasks.

Open Access Issue

Exploring the Chameleon Effect of Contextual Dynamics in Temporal Knowledge Graph for Event Prediction

Xin Liu, Yi He, Wenxin Tai, Xovee Xu, Fan Zhou, Guangchun Luo

Tsinghua Science and Technology 2025, 30(1): 433-455

Published: 11 September 2024

Abstract

The ability to forecast future events brings great benefits for society and cyberspace in many public safety domains, such as civil unrest, pandemics, and crimes. The occurrences of new events are often correlated with or dependent on historical and concurrent events. Many existing studies learn event-occurring processes with sequential and structural models, which, however, suffer from inefficient and inaccurate prediction. To better understand the event forecasting task and characterize the occurrence of new events, we exploit human cognitive theory from cognitive neuroscience to find available cues for algorithm design and event prediction. Motivated by the dual process theory, we propose a two-stage learning scheme for event knowledge mining and prediction. First, we screen out event candidates based on historical inherent knowledge. Then we re-rank event candidates by probing into the most recent related events. Our proposed model mimics a sociological phenomenon called “the chameleon effect” and consists of a new target attentive graph collaborative learning mechanism to ensure a better understanding of sophisticated evolution patterns associated with events. In addition, self-supervised contrastive learning is employed to alleviate the over-smoothing problem that exists in graph learning while improving the model’s interpretability. Experiments show the effectiveness of our approach.

Open Access Issue

Inductive Relation Prediction by Disentangled Subgraph Structure

Guiduo Duan, Rui Guo, Wenlong Luo, Guangchun Luo, Tianxi Huang

Tsinghua Science and Technology 2024, 29(5): 1566-1579

Published: 02 May 2024

Abstract

Most existing inductive relation prediction approaches are based on subgraph structures, with subgraph features extracted using graph neural networks to predict relations. However, subgraphs may contain disconnected regions, which usually represent different semantic ranges. Because not all semantic information about the regions is helpful in relation prediction, we propose a relation prediction model based on a disentangled subgraph structure and implement a feature updating approach based on relevant semantic aggregation. To indirectly achieve the disentangled subgraph structure from a semantic perspective, entity features are mapped into different semantic spaces and updated by aggregating related semantics within each space. The disentangled model can focus on features having higher semantic relevance in the prediction, thus addressing a problem with existing approaches, which ignore the semantic differences in different subgraph structures. Furthermore, using a gated recurrent neural network, this model enhances the features of entities by sorting them by distance and extracting the path information in the subgraphs. Experiments show that when there are numerous disconnected regions in the subgraph, our model outperforms existing mainstream models in terms of both Area Under the Precision-Recall Curve (AUC-PR) and Hits@10. The experiments also indicate that semantic differences in the knowledge graph can be effectively distinguished and verify the effectiveness of this method.
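For readers unfamiliar with the Hits@10 metric mentioned above, the following sketch shows the standard definition: the fraction of test queries whose correct answer is ranked within the top 10 candidates. The function name `hits_at_k` and the sample ranks are illustrative assumptions and are not taken from the paper.

```python
def hits_at_k(ranks, k=10):
    """Fraction of queries whose true answer has rank <= k (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Hypothetical ranks of the correct relation for five test triples.
example_ranks = [1, 3, 12, 10, 50]
print(hits_at_k(example_ranks))  # three of the five ranks are <= 10
```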

Open Access Issue

Denoising Graph Inference Network for Document-Level Relation Extraction

Hailin Wang, Ke Qin, Guiduo Duan, Guangchun Luo

Big Data Mining and Analytics 2023, 6(2): 248-262

Published: 26 January 2023

Abstract

Relation Extraction (RE) aims to obtain a predefined relation type between two entities mentioned in a piece of text, e.g., a sentence-level or a document-level text. Most existing studies suffer from noise in the text, so appropriate pruning is of great importance. The conventional sentence-level RE task addresses this issue with a denoising method that uses the shortest dependency path to build a long-range semantic dependency between entity pairs. However, this kind of denoising method is scarce in document-level RE. In this work, we explicitly model a denoised document-level graph based on linguistic knowledge to capture various long-range semantic dependencies among entities. We first formalize a Syntactic Dependency Tree forest (SDT-forest) by introducing the syntax and discourse dependency relation. Then, the Steiner tree algorithm extracts a mention-level denoised graph, Steiner Graph (SG), removing linguistically irrelevant words from the SDT-forest. We then devise a slide residual attention to highlight word-level evidence on text and SG. Finally, the classification is established on the SG to infer the relations of entity pairs. We conduct extensive experiments on three public datasets. The results show that our method helps establish long-range semantic dependencies and can improve classification performance on longer texts.

Open Access Issue

Disseminating Authorized Content via Data Analysis in Opportunistic Social Networks

Chenguang Kong, Guangchun Luo, Ling Tian, Xiaojun Cao

Big Data Mining and Analytics 2019, 2(1): 12-24

Published: 15 October 2018

Abstract

Authorized content is a type of content that can be generated only by a certain Content Provider (CP). The content copies delivered to a user may bring rewards to the CP if the content is adopted by the user. The overall reward obtained by the CP depends on the user’s degree of interest in the content and the user’s role in disseminating the content copies. Thus, to maximize the reward, the content provider is motivated to disseminate the authorized content to the most interested users. In this paper, we study how to effectively disseminate the authorized content in Interest-centric Opportunistic Social Networks (IOSNs) such that the reward is maximized. We first derive Social Connection Pattern (SCP) data to handle the challenging opportunistic connections in IOSNs and statistically analyze the interest distribution of the users contacted or connected. The SCP is used to predict the interests of possible contactors and connectors. Then, we propose our SCP-based Dissemination (SCPD) algorithm to calculate the optimum number of content copies to disseminate when two users meet. Our dataset-based simulation shows that our SCPD algorithm is effective and efficient in disseminating the authorized content in IOSNs.

Open Access Issue

Location Prediction on Trajectory Data: A Review

Ruizhi Wu, Guangchun Luo, Junming Shao, Ling Tian, Chengzong Peng

Big Data Mining and Analytics 2018, 1(2): 108-127

Published: 12 April 2018

Abstract

Location prediction is a key technique in many location-based services, including route navigation, dining location recommendations, and traffic planning and control, to mention a few. This survey provides a comprehensive overview of location prediction, including basic definitions and concepts, algorithms, and applications. First, we introduce the types of trajectory data and related basic concepts. Then, we review existing location-prediction methods, ranging from temporal-pattern-based prediction to spatiotemporal-pattern-based prediction. We also discuss and analyze the advantages and disadvantages of these algorithms and briefly summarize current applications of location prediction in diverse fields. Finally, we identify the potential challenges and future research directions in location prediction.