Volume 6, Issue 2




Denoising Graph Inference Network for Document-Level Relation Extraction

Hailin Wang1,2, Ke Qin1, Guiduo Duan1, Guangchun Luo1
1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 School of Computing and Artificial Intelligence, Southwestern University of Finance and Economics, Chengdu 611130, China

Abstract

Relation Extraction (RE) aims to identify a predefined relation type between two entities mentioned in a piece of text, e.g., a sentence or a document. Most existing studies suffer from noise in the text, so appropriate pruning is of great importance. Conventional sentence-level RE addresses this issue with a denoising method that uses the shortest dependency path to build long-range semantic dependencies between entity pairs. However, such denoising methods are scarce in document-level RE. In this work, we explicitly model a denoised document-level graph based on linguistic knowledge to capture various long-range semantic dependencies among entities. We first formalize a Syntactic Dependency Tree forest (SDT-forest) by introducing syntactic and discourse dependency relations. Then, the Steiner tree algorithm extracts a mention-level denoised graph, the Steiner Graph (SG), by removing linguistically irrelevant words from the SDT-forest. We further devise a slide residual attention to highlight word-level evidence on the text and the SG. Finally, classification is performed on the SG to infer the relations of entity pairs. Extensive experiments on three public datasets show that our method effectively establishes long-range semantic dependencies and improves classification performance, especially on longer texts.
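The denoising idea described above can be illustrated with a minimal sketch (not the paper's implementation): given a word-level dependency graph, keep only the union of shortest paths connecting entity-mention nodes, which is the classic shortest-path approximation of a Steiner tree. The toy edges and mention indices below are invented for the demo.

```python
from collections import deque
from itertools import combinations

def shortest_path(adj, src, dst):
    """BFS shortest path between two nodes of an unweighted, undirected graph."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return []

def steiner_like_subgraph(edges, terminals):
    """Keep the union of pairwise shortest paths among terminal (mention) nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    kept = set(terminals)
    for a, b in combinations(terminals, 2):
        kept.update(shortest_path(adj, a, b))
    return sorted(kept)

# Toy dependency edges over token indices; entity mentions at tokens 0 and 6.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5), (5, 6)]
print(steiner_like_subgraph(edges, [0, 6]))  # tokens 3 and 4 are pruned as noise
```

In the paper, the input graph is the SDT-forest (syntactic plus discourse dependency edges) and the true Steiner tree algorithm is used; this sketch only conveys why pruning off-path words yields a compact mention-level graph.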

Keywords: denoising, attention mechanism, Relation Extraction (RE), document-level, linguistic knowledge


Publication history

Received: 07 October 2022
Revised: 05 December 2022
Accepted: 12 December 2022
Published: 26 January 2023
Issue date: June 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. U19A2059 and 62176046).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
