Traditional logits-based knowledge distillation methods typically apply a temperature-scaled softmax function to the logits to achieve smooth matching. However, few have questioned whether this operation is necessary and reasonable. In this paper, through theoretical analysis and motivational experiments, we show that softmax causes the student’s features to diverge from the teacher’s representation space. To address this issue, we propose PurE logit distillation via multi-grAnularity Knowledge transfer (PEAK), a simple but effective approach to knowledge distillation. Instead of relying on softmax, PEAK directly aligns the original logits (termed pure logits) of the teacher and student models through a scale-invariant normalization module. By leveraging this softmax-free “PEAK” operator, our method achieves pure matching, naturally aligning the mean and standard deviation of the teacher and student logits. Furthermore, our consistency augmentation mechanisms preserve multi-granularity relative scales, including inter-class, intra-class, and global-class relationships. Extensive experiments across various tasks (image classification, object detection, and semantic segmentation) and architectures (Convolutional Neural Networks and Vision Transformers) demonstrate that PEAK achieves leading performance. For instance, on CIFAR-100 and ImageNet classification, our approach achieves average improvements of 0.24 and 0.27 percentage points, respectively, over state-of-the-art methods, highlighting its stability and superiority. Comprehensive ablation and extension studies further validate the effectiveness of the proposed schemes.
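
The abstract contrasts temperature-scaled softmax matching with softmax-free matching of normalized logits, but gives no formulas. The sketch below illustrates that contrast under stated assumptions: the "scale-invariant normalization" is taken to be a per-vector z-score, the matching loss is mean squared error, and the three granularities are read as normalizing per sample across classes (inter-class), per class across the batch (intra-class), and over the whole batch (global), summed with equal weights. All of these choices, and the names `softmax_kd_loss` and `pure_logit_loss`, are hypothetical and are not the authors' PEAK implementation.

```python
import math
from typing import List

# A batch of logits: one row per sample, one column per class.
Logits = List[List[float]]


def softmax(row: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_kd_loss(student: Logits, teacher: Logits, temperature: float = 4.0) -> float:
    """Classic logit KD baseline: batch-mean KL(p_teacher || p_student) scaled by T^2."""
    total = 0.0
    for s_row, t_row in zip(student, teacher):
        p_t = softmax(t_row, temperature)
        p_s = softmax(s_row, temperature)
        total += sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return total / len(student) * temperature ** 2


def standardize(values: List[float], eps: float = 1e-7) -> List[float]:
    """Z-score normalization: invariant (up to eps) to positive scaling and shifting."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / (std + eps) for v in values]


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def transpose(m: Logits) -> Logits:
    return [list(col) for col in zip(*m)]


def pure_logit_loss(student: Logits, teacher: Logits) -> float:
    """Hypothetical softmax-free loss: match z-scored logits at three granularities."""
    # Inter-class view: normalize each sample's logits across its classes.
    inter = sum(mse(standardize(s), standardize(t))
                for s, t in zip(student, teacher)) / len(student)
    # Intra-class view: normalize each class's logits across the batch.
    s_cols, t_cols = transpose(student), transpose(teacher)
    intra = sum(mse(standardize(s), standardize(t))
                for s, t in zip(s_cols, t_cols)) / len(s_cols)
    # Global view: normalize all logits in the batch jointly.
    flat_s = [x for row in student for x in row]
    flat_t = [x for row in teacher for x in row]
    global_term = mse(standardize(flat_s), standardize(flat_t))
    # Equal weighting of the three terms is an assumption, not a published setting.
    return inter + intra + global_term
```

Softmax is invariant to adding a constant to every logit but not to rescaling them, so a student whose logits equal `2 * teacher + 5` still incurs a nonzero `softmax_kd_loss`, while the z-score loss in this sketch treats it as a perfect match (up to the small `eps`).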

Open Access Issue

AInvR: Adaptive Learning Rewards for Knowledge Graph Reasoning Using Agent Trajectories

Hao Zhang, Guoming Lu, Ke Qin, Kai Du

Tsinghua Science and Technology 2023, 28(6): 1101-1114

Published: 28 July 2023

Abstract

Multi-hop reasoning over incomplete Knowledge Graphs (KGs) offers excellent interpretability with decent performance. Reinforcement Learning (RL)-based approaches formulate multi-hop reasoning as a sequential decision problem. A key difficulty in RL-based multi-hop reasoning is that sparse reward signals make performance unstable. Current mainstream methods counter this challenge with heuristic reward functions. However, the inaccurate rewards produced by these heuristics guide the agent toward improper inference paths and unrelated object entities. To address this, we propose AInvR, a novel adaptive Inverse Reinforcement Learning (IRL) framework for multi-hop reasoning. (1) To counter missing and spurious paths, we replace heuristic rule rewards with an adaptive rule reward learning mechanism based on the agent’s inference trajectories; (2) to alleviate the impact of object entities that are over-rewarded because of inaccurate reward shaping and rules, we propose an adaptive negative hit reward learning mechanism based on the agent’s sampling strategy; (3) to further explore diverse paths and mitigate the influence of missing facts, we design a reward dropout mechanism that randomly masks and perturbs reward parameters during reward learning. Experimental results on several benchmark knowledge graphs demonstrate that our method is more effective than existing multi-hop approaches.
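
The abstract names its mechanisms without giving formulas, so the following is only a minimal sketch of the general idea under loudly stated assumptions: the agent receives the usual sparse binary hit reward (1 if it stops on the target entity) plus a learned, path-level rule reward, and "reward dropout" is modeled as masking each learned reward weight with some probability and adding Gaussian noise to the survivors. The moving-average update in `update_rule_weights` is a simple stand-in for reward learning from trajectories, not AInvR's IRL objective; all function names, relation names, and parameter values are hypothetical.

```python
import random
from typing import Dict, Iterable, Tuple

# A reasoning path is the sequence of relations the agent followed, e.g. ("born_in", "located_in").
Path = Tuple[str, ...]


def hit_reward(reached: str, target: str) -> float:
    """Sparse terminal reward: 1 only if the agent stops on the target entity."""
    return 1.0 if reached == target else 0.0


def update_rule_weights(weights: Dict[Path, float],
                        trajectories: Iterable[Tuple[Path, bool]],
                        lr: float = 0.1) -> Dict[Path, float]:
    """Hypothetical trajectory-driven rule reward: move each path's weight toward its observed hit rate."""
    updated = dict(weights)
    for path, hit in trajectories:
        w = updated.get(path, 0.0)
        updated[path] = w + lr * (float(hit) - w)  # paths that miss the target are pushed down
    return updated


def reward_dropout(weights: Dict[Path, float], drop_prob: float,
                   noise_std: float, rng: random.Random) -> Dict[Path, float]:
    """Hypothetical reward dropout: zero each weight with prob drop_prob, add Gaussian noise to the rest."""
    perturbed = {}
    for path, w in weights.items():
        if rng.random() < drop_prob:
            perturbed[path] = 0.0  # masked: this rule gives no shaping reward in this episode
        else:
            perturbed[path] = w + rng.gauss(0.0, noise_std)
    return perturbed


def trajectory_reward(path: Path, reached: str, target: str,
                      rule_weights: Dict[Path, float]) -> float:
    """Dense reward = sparse hit reward + learned (possibly dropped-out) rule reward for the path."""
    return hit_reward(reached, target) + rule_weights.get(path, 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    weights = update_rule_weights({}, [(("born_in", "located_in"), True),
                                       (("works_for",), False)], lr=0.5)
    noisy = reward_dropout(weights, drop_prob=0.3, noise_std=0.05, rng=rng)
    print(trajectory_reward(("born_in", "located_in"), "France", "France", noisy))
```

Masking and perturbing the learned weights each episode keeps the agent from locking onto whichever paths currently carry the largest shaping reward, which is one plausible reading of how reward dropout encourages more diverse path exploration.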
