Decoupled Two-Phase Framework for Class-Incremental Few-Shot Named Entity Recognition

Yifan Chen1, Zhen Huang1(✉), Minghao Hu2(✉), Dongsheng Li1, Changjian Wang1, Feng Liu1, and Xicheng Lu1
1 College of Computer, National University of Defense Technology, Changsha 410073, China
2 Information Research Center of Military Science, PLA Academy of Military Science, Beijing 100091, China

Abstract

Class-Incremental Few-Shot Named Entity Recognition (CIFNER) aims to identify all entity categories observed so far, given only a few labeled examples of the newly added (novel) classes. However, existing class-incremental methods typically introduce new parameters to adapt to new classes and treat all information equally, resulting in poor generalization. Meanwhile, few-shot methods require samples of all observed classes, making them difficult to transfer to a class-incremental setting. To address these issues, a decoupled two-phase framework for the CIFNER task is proposed. The whole task is divided into two separate subtasks, Entity Span Detection (ESD) and Entity Class Discrimination (ECD), which leverage parameter-cloning and label-fusion to learn class-generic and class-specific knowledge separately. Moreover, different variants, such as the Conditional Random Field-based (CRF-based) and word-pair-based methods in the ESD module, and the add-based, Natural Language Inference-based (NLI-based), and prompt-based methods in the ECD module, are investigated to demonstrate the generalizability of the decoupled framework. Extensive experiments on three Named Entity Recognition (NER) datasets show that our method achieves state-of-the-art performance in the CIFNER setting.

Keywords: deep learning, few-shot learning, named entity recognition, class-incremental learning
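
To make the decoupling concrete, below is a minimal PyTorch sketch of the two-phase pipeline the abstract describes: a class-agnostic ESD module first tags entity spans, and a separately parameterized ECD module, initialized by cloning the ESD encoder, then assigns each detected span a class. Every name here (SpanDetector, SpanClassifier, the O/B-ENT/I-ENT tag set, the bert-base-cased checkpoint) is an illustrative assumption, not the authors' released implementation; the paper's CRF- and word-pair-based variants would replace the ESD head, and the add-, NLI-, and prompt-based variants the ECD head.

    import copy

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer


    class SpanDetector(nn.Module):
        """ESD phase: class-agnostic span detection. A CRF- or word-pair-based
        head would replace this simple token tagger."""

        def __init__(self, encoder_name="bert-base-cased"):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(encoder_name)
            # Three class-generic tags: O / B-ENT / I-ENT (no entity classes).
            self.head = nn.Linear(self.encoder.config.hidden_size, 3)

        def forward(self, **inputs):
            hidden = self.encoder(**inputs).last_hidden_state
            return self.head(hidden)  # (batch, seq_len, 3)


    class SpanClassifier(nn.Module):
        """ECD phase: names the class of a detected span. The add-based,
        NLI-based, or prompt-based variants would replace this linear head."""

        def __init__(self, esd_encoder, num_classes):
            super().__init__()
            # Parameter-cloning: start from a copy of the ESD encoder so the
            # class-generic span knowledge survives when novel classes arrive.
            self.encoder = copy.deepcopy(esd_encoder)
            # Label-fusion would extend this head's rows as new classes appear.
            self.head = nn.Linear(self.encoder.config.hidden_size, num_classes)

        def forward(self, span_mask, **inputs):
            hidden = self.encoder(**inputs).last_hidden_state
            mask = span_mask.unsqueeze(-1).float()
            # Mean-pool the token states inside the detected span.
            span_repr = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
            return self.head(span_repr)  # (batch, num_classes)


    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    inputs = tokenizer("Barack Obama visited Paris .", return_tensors="pt")

    esd = SpanDetector()
    ecd = SpanClassifier(esd.encoder, num_classes=4)  # e.g., PER/LOC/ORG/MISC

    tag_logits = esd(**inputs)                # phase 1: where are the entities?
    span_mask = tag_logits.argmax(-1).ne(0)   # tokens tagged B-ENT / I-ENT
    class_logits = ecd(span_mask, **inputs)   # phase 2: which class is it?

In this setup the span detector holds only class-generic knowledge, so accommodating a novel class touches the ECD side alone; label-fusion would then merge the new class's rows into the existing classification head rather than retraining from scratch.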


Publication history

Received: 23 August 2022
Accepted: 30 September 2022
Published: 19 May 2023
Issue date: October 2023

Copyright

© The author(s) 2023.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62006243).

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
