Decoupled Two-Phase Framework for Class-Incremental Few-Shot Named Entity Recognition

Yifan Chen1, Zhen Huang1(✉), Minghao Hu2(✉), Dongsheng Li1, Changjian Wang1, Feng Liu1, and Xicheng Lu1
1 College of Computer, National University of Defense Technology, Changsha 410073, China
2 Information Research Center of Military Science, PLA Academy of Military Science, Beijing 100091, China

Abstract

Class-Incremental Few-Shot Named Entity Recognition (CIFNER) aims to identify all entity categories observed so far, given only a few labeled examples of the newly added (novel) classes. However, existing class-incremental methods typically introduce new parameters to adapt to new classes and treat all information equally, resulting in poor generalization. Meanwhile, few-shot methods require samples of all observed classes, making them difficult to transfer to a class-incremental setting. To address these issues, a decoupled two-phase framework for the CIFNER task is proposed. The whole task is divided into two separate subtasks, Entity Span Detection (ESD) and Entity Class Discrimination (ECD), which leverage parameter-cloning and label-fusion to learn class-generic and class-specific knowledge separately. Moreover, different variants, such as the Conditional Random Field-based (CRF-based) and word-pair-based methods in the ESD module, and the add-based, Natural Language Inference-based (NLI-based), and prompt-based methods in the ECD module, are investigated to demonstrate the generalizability of the decoupled framework. Extensive experiments on three Named Entity Recognition (NER) datasets show that our method achieves state-of-the-art performance in the CIFNER setting.

Keywords: deep learning, few-shot learning, named entity recognition, class-incremental learning
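
To make the decoupling concrete, below is a minimal PyTorch sketch of the two-phase pipeline the abstract describes: a class-agnostic ESD module first tags entity spans, and a separately parameterized ECD module, initialized by cloning the ESD encoder, then assigns each detected span a class. Every name here (SpanDetector, SpanClassifier, the O/B-ENT/I-ENT tag set, the bert-base-cased checkpoint) is an illustrative assumption, not the authors' released implementation; the paper's CRF- and word-pair-based variants would replace the ESD head, and the add-, NLI-, and prompt-based variants the ECD head.

    import copy

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer


    class SpanDetector(nn.Module):
        """ESD phase: class-agnostic span detection. A CRF- or word-pair-based
        head would replace this simple token tagger."""

        def __init__(self, encoder_name="bert-base-cased"):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(encoder_name)
            # Three class-generic tags: O / B-ENT / I-ENT (no entity classes).
            self.head = nn.Linear(self.encoder.config.hidden_size, 3)

        def forward(self, **inputs):
            hidden = self.encoder(**inputs).last_hidden_state
            return self.head(hidden)  # (batch, seq_len, 3)


    class SpanClassifier(nn.Module):
        """ECD phase: names the class of a detected span. The add-based,
        NLI-based, or prompt-based variants would replace this linear head."""

        def __init__(self, esd_encoder, num_classes):
            super().__init__()
            # Parameter-cloning: start from a copy of the ESD encoder so the
            # class-generic span knowledge survives when novel classes arrive.
            self.encoder = copy.deepcopy(esd_encoder)
            # Label-fusion would extend this head's rows as new classes appear.
            self.head = nn.Linear(self.encoder.config.hidden_size, num_classes)

        def forward(self, span_mask, **inputs):
            hidden = self.encoder(**inputs).last_hidden_state
            mask = span_mask.unsqueeze(-1).float()
            # Mean-pool the token states inside the detected span.
            span_repr = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
            return self.head(span_repr)  # (batch, num_classes)


    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    inputs = tokenizer("Barack Obama visited Paris .", return_tensors="pt")

    esd = SpanDetector()
    ecd = SpanClassifier(esd.encoder, num_classes=4)  # e.g., PER/LOC/ORG/MISC

    tag_logits = esd(**inputs)                # phase 1: where are the entities?
    span_mask = tag_logits.argmax(-1).ne(0)   # tokens tagged B-ENT / I-ENT
    class_logits = ecd(span_mask, **inputs)   # phase 2: which class is it?

In this setup the span detector holds only class-generic knowledge, so accommodating a novel class touches the ECD side alone; label-fusion would then merge the new class's rows into the existing classification head rather than retraining from scratch.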


Publication history

Received: 23 August 2022
Accepted: 30 September 2022
Published: 19 May 2023
Issue date: October 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62006243).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
