Abstract
Multi-hop reasoning over incomplete Knowledge Graphs (KGs) offers strong interpretability with competitive performance. Reinforcement Learning (RL) based approaches formulate multi-hop reasoning as a sequential decision problem. A persistent shortcoming of RL-based multi-hop reasoning is that sparse reward signals make training unstable. Mainstream methods counter this challenge with heuristic reward functions. However, the inaccurate rewards produced by such heuristics guide the agent toward improper inference paths and unrelated object entities. To address these issues, we propose AInvR, a novel adaptive Inverse Reinforcement Learning (IRL) framework for multi-hop reasoning. (1) To counter missing and spurious paths, we replace heuristic rule rewards with an adaptive rule reward learning mechanism based on the agent's inference trajectories; (2) to alleviate the impact of object entities over-rewarded by inaccurate reward shaping and rules, we propose an adaptive negative hit reward learning mechanism based on the agent's sampling strategy; (3) to further explore diverse paths and mitigate the influence of missing facts, we design a reward dropout mechanism that randomly masks and perturbs reward parameters during reward learning. Experimental results on several benchmark knowledge graphs demonstrate that our method is more effective than existing multi-hop approaches.