Open Access

AInvR: Adaptive Learning Rewards for Knowledge Graph Reasoning Using Agent Trajectories

School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

Abstract

Multi-hop reasoning over incomplete Knowledge Graphs (KGs) offers excellent interpretability with decent performance. Reinforcement Learning (RL) based approaches formulate multi-hop reasoning as a sequential decision problem. A persistent shortcoming of RL-based multi-hop reasoning is that sparse reward signals make performance unstable. Current mainstream methods apply heuristic reward functions to counter this challenge. However, the inaccurate rewards produced by heuristic functions guide the agent toward improper inference paths and unrelated object entities. To this end, we propose a novel adaptive Inverse Reinforcement Learning (IRL) framework for multi-hop reasoning, called AInvR. (1) To counter missing and spurious paths, we replace the heuristic rule rewards with an adaptive rule reward learning mechanism based on the agent’s inference trajectories; (2) to alleviate the impact of over-rewarded object entities misled by inaccurate reward shaping and rules, we propose an adaptive negative hit reward learning mechanism based on the agent’s sampling strategy; (3) to further explore diverse paths and mitigate the influence of missing facts, we design a reward dropout mechanism that randomly masks and perturbs reward parameters during the reward learning process. Experimental results on several benchmark knowledge graphs demonstrate that our method is more effective than existing multi-hop approaches.
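
As a rough illustration of the reward dropout idea in item (3), the sketch below randomly masks some reward parameters and adds small noise to the rest. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the function name `reward_dropout`, the flat list of parameters, the `drop_prob` and `noise_scale` arguments, and the use of zero-masking with Gaussian noise. It is not the authors' implementation.

```python
import random


def reward_dropout(params, drop_prob=0.1, noise_scale=0.01, rng=None):
    """Return a masked and perturbed copy of a list of reward parameters.

    Hypothetical sketch: each parameter is zeroed with probability
    `drop_prob`; every surviving parameter receives Gaussian noise with
    standard deviation `noise_scale`. The input list is not modified.
    """
    if rng is None:
        rng = random.Random()
    perturbed = []
    for value in params:
        if rng.random() < drop_prob:
            # Mask: this reward parameter contributes nothing on this step.
            perturbed.append(0.0)
        else:
            # Perturb: jitter the parameter to encourage diverse paths.
            perturbed.append(value + rng.gauss(0.0, noise_scale))
    return perturbed


if __name__ == "__main__":
    # Assumed toy reward parameters, one per (hypothetical) rule.
    rule_rewards = [0.8, 0.5, 0.3, 0.9]
    print(reward_dropout(rule_rewards, drop_prob=0.25, rng=random.Random(0)))
```

Under these assumptions, masking would be applied afresh on every reward-learning update, so no single reward parameter dominates the agent's path choices for long.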

Tsinghua Science and Technology
Pages 1101-1114
Cite this article:
Zhang H, Lu G, Qin K, et al. AInvR: Adaptive Learning Rewards for Knowledge Graph Reasoning Using Agent Trajectories. Tsinghua Science and Technology, 2023, 28(6): 1101-1114. https://doi.org/10.26599/TST.2022.9010063


Received: 17 July 2022
Revised: 02 December 2022
Accepted: 07 December 2022
Published: 28 July 2023
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
