AInvR: Adaptive Learning Rewards for Knowledge Graph Reasoning Using Agent Trajectories

Hao Zhang; Guoming Lu; Ke Qin; Kai Du

doi:10.26599/TST.2022.9010063

Open Access

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

Abstract

Multi-hop reasoning over incomplete Knowledge Graphs (KGs) offers excellent interpretability with decent performance. Reinforcement Learning (RL) based approaches formulate multi-hop reasoning as a sequential decision problem. A persistent shortcoming of RL-based multi-hop reasoning is that sparse reward signals make performance unstable. Current mainstream methods apply heuristic reward functions to counter this challenge. However, the inaccurate rewards produced by heuristic functions guide the agent to improper inference paths and unrelated object entities. To this end, we propose a novel adaptive Inverse Reinforcement Learning (IRL) framework for multi-hop reasoning, called AInvR. (1) To counter missing and spurious paths, we replace heuristic rule rewards with an adaptive rule reward learning mechanism based on the agent’s inference trajectories; (2) to alleviate the impact of over-rewarded object entities misled by inaccurate reward shaping and rules, we propose an adaptive negative hit reward learning mechanism based on the agent’s sampling strategy; (3) to further explore diverse paths and mitigate the influence of missing facts, we design a reward dropout mechanism that randomly masks and perturbs reward parameters during the reward learning process. Experimental results on several benchmark knowledge graphs demonstrate that our method is more effective than existing multi-hop approaches.
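As a rough illustration of the reward dropout idea in item (3), the sketch below randomly masks some entries of a reward parameter vector and perturbs the rest. The abstract does not specify the mechanism, so this is not the authors' formulation: the function name `reward_dropout`, the parameters `drop_prob` and `noise_std`, zero-masking, and Gaussian noise are all assumptions made for this example.

```python
import random
from typing import List, Optional


def reward_dropout(
    reward_params: List[float],
    drop_prob: float = 0.1,
    noise_std: float = 0.01,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return a masked and perturbed copy of reward_params.

    Hypothetical illustration: drop_prob, noise_std, zero-masking, and
    Gaussian noise are assumptions, not details taken from the paper.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    rng = rng or random.Random()
    perturbed = []
    for weight in reward_params:
        if rng.random() < drop_prob:
            # Mask: this reward parameter contributes nothing this step.
            perturbed.append(0.0)
        else:
            # Perturb: add small zero-mean Gaussian noise to the parameter.
            perturbed.append(weight + rng.gauss(0.0, noise_std))
    return perturbed


if __name__ == "__main__":
    params = [0.5, -0.2, 1.0, 0.3]
    print(reward_dropout(params, drop_prob=0.25, noise_std=0.05,
                         rng=random.Random(0)))
```

The function returns a new list instead of modifying its input, so a training loop could draw a fresh masked copy at each reward update while keeping the underlying parameters intact.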

Keywords

Knowledge Graph Reasoning (KGR); Inverse Reinforcement Learning (IRL); multi-hop reasoning

Tsinghua Science and Technology

Volume 28, Issue 6, December 2023

Pages 1101-1114

DOI: 10.26599/TST.2022.9010063

Cite this article:

Zhang H, Lu G, Qin K, et al. AInvR: Adaptive Learning Rewards for Knowledge Graph Reasoning Using Agent Trajectories. Tsinghua Science and Technology, 2023, 28(6): 1101-1114. https://doi.org/10.26599/TST.2022.9010063