Multi-Granularity Feature Fusion for Named Entity Recognition of Classical Chinese Texts from the Perspective of Digital Humanities

Meng Jiana, Xu Yingao, Zhao Dandan, Li Fengyi, Zhao Di

Knowledge Management Forum, 2024, Vol. 9, Issue 6: 533-546.

DOI: 10.13266/j.issn.2095-5472.2024.039  CSTR: 32306.14.CN11-6036.2024.039

Abstract

[Purpose/Significance] Applying named entity recognition (NER), a foundational task in natural language processing, to the in-depth mining of ancient documents advances the digitization of ancient Chinese texts, which is important for historical research, for strengthening cultural confidence, and for promoting traditional Chinese culture. [Method/Process] A method for named entity recognition in classical Chinese texts based on multi-granularity feature fusion was proposed. With "Zuo Zhuan" as the research corpus, recognition tasks were defined for personal names, geographical names, temporal entities, and other entity types. First, ancient character information, part-of-speech (POS) information, and glyph features were integrated to enrich the input feature representation. Next, an auxiliary task that predicts entity boundaries was introduced, together with a Transfer Interactor heuristic that learns the formation rules of classical Chinese entities, and contextual information was extracted jointly with BiLSTM and IDCNN (Iterated Dilated Convolutional Neural Network). Finally, the learned features were weighted, merged, and passed to a CRF (Conditional Random Field) for entity prediction. [Result/Conclusion] Experimental results show that the proposed method improves precision, recall, and F1 score by 5.09%, 13.45%, and 9.87%, respectively, over the mainstream BERT-BiLSTM-CRF method. Multi-granularity feature fusion therefore contributes substantially to the accurate identification of named entities in ancient texts.
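
The precision, recall, and F1 figures reported above are the standard NER evaluation metrics. As a minimal sketch of how such scores are commonly computed at the entity level, the Python example below extracts entity spans from BIO tag sequences and counts exact matches. The BIO scheme, the tag names PER, LOC, and TIME, the helper names, and the toy tag sequences are all assumptions made for illustration; the abstract does not state the paper's tagging scheme or evaluation script.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); hypothetical representation
Span = Tuple[int, int, str]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (one tag per character)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))  # the open entity ends before position i
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]  # a new entity begins here
    return spans


def precision_recall_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction counts only if span and type both match."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy tag sequences (made up for illustration, not taken from the corpus).
gold = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-TIME", "I-TIME"]
pred = ["B-PER", "I-PER", "O", "O", "O", "B-TIME", "O"]

p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case the prediction recovers the personal name exactly, misses the place name, and truncates the temporal entity, so only one of two predicted spans and one of three gold spans match. Exact-match scoring of this kind penalizes boundary errors heavily, which is one reason an auxiliary boundary-prediction task can pay off.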

Key words

digital humanities / classical Chinese / entity recognition / multi-granularity feature fusion

Cite this article

Meng Jiana, Xu Yingao, Zhao Dandan, et al. Multi-Granularity Feature Fusion for Named Entity Recognition of Classical Chinese Texts from the Perspective of Digital Humanities[J]. Knowledge Management Forum, 2024, 9(6): 533-546. https://doi.org/10.13266/j.issn.2095-5472.2024.039

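Author contributions

Meng Jiana: designed the research plan and revised the paper;

Xu Yingao: proposed the research ideas and wrote the paper;

Zhao Dandan: collected, cleaned, and analyzed the data;

Li Fengyi: designed the experiments and processed the data;

Zhao Di: revised and finalized the paper.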

Funding

Humanities and Social Sciences Research Planning Fund project “The Research on the Internet Smart Dissemination of Chinese Culture Based on Knowledge Graphs” (23YJA860010)
Fundamental Research Funds for the Central Universities project “Research on Sentiment Analysis Based on Large Models and Knowledge-Driven Approaches” (140250)