Research on Unsupervised Patent Entity Extraction Method Assisted by Technology Classification Codes

Liang Chen, Weijiao Shang, Chi Yu, Lin Mou, Chunzi Xia, Chuan Ge

Knowledge Management Forum ›› 2024, Vol. 9 ›› Issue (4) : 422-436. DOI: 10.13266/j.issn.2095-5472.2024.031

Abstract

[Purpose/Significance] Unsupervised patent entity extraction removes the heavy dependence of previous methods on labeled resources, which promotes the wider adoption of artificial intelligence technology in the intellectual property field and improves patent information services. [Method/Process] By combining the technology classification codes inherent in patent documents with topic modeling, this study proposes a new method that uses patent classification codes to guide topic allocation in patent text, and thereby extracts entities without an annotated dataset. [Result/Conclusion] To demonstrate the advantages of the method, an empirical analysis was conducted on a patent dataset from the field of thin-film magnetic heads in hard disk drives, using the IPC technology classification system. The experimental results show that entity extraction performance differs significantly across levels of the technology classification. Moreover, entity extraction based on fifth-level IPC technology classification codes far outperforms the conventional Subject-Action-Object (SAO) method.
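
The idea of letting classification codes guide topic allocation can be illustrated with a minimal sketch. It is not the authors' model: it assumes a Labeled-LDA-style restriction in which each token may only be assigned to a topic named by one of its document's IPC codes, and it assumes the common five-level reading of an IPC symbol (section, class, subclass, main group, subgroup). The function names `ipc_at_level` and `guided_topic_assignment`, the hyperparameters, and the toy documents are all hypothetical.

```python
import random
import re
from collections import defaultdict

# Assumed IPC symbol layout: section letter, 2-digit class, subclass letter,
# main group number, "/", subgroup number (e.g. "G11B 5/39").
IPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,})$")


def ipc_at_level(code, level):
    """Truncate an IPC symbol to hierarchy level 1 (section) .. 5 (subgroup)."""
    match = IPC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized IPC symbol: {code!r}")
    section, cls, subclass, group, subgroup = match.groups()
    levels = {
        1: section,
        2: section + cls,
        3: section + cls + subclass,
        4: f"{section}{cls}{subclass} {group}/00",
        5: f"{section}{cls}{subclass} {group}/{subgroup}",
    }
    return levels[level]


def guided_topic_assignment(docs, doc_codes, level, iters=50,
                            alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling where a token's topic must be one of the
    IPC codes (truncated to `level`) attached to its document."""
    rng = random.Random(seed)
    labels = [sorted({ipc_at_level(c, level) for c in codes})
              for codes in doc_codes]
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [defaultdict(int) for _ in docs]
    topic_word = defaultdict(lambda: defaultdict(int))
    topic_total = defaultdict(int)

    def update(d, k, w, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Random initialization, restricted to each document's own codes.
    assign = []
    for d, doc in enumerate(docs):
        row = [rng.choice(labels[d]) for _ in doc]
        for w, k in zip(doc, row):
            update(d, k, w, +1)
        assign.append(row)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, assign[d][i], w, -1)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in labels[d]
                ]
                assign[d][i] = rng.choices(labels[d], weights=weights)[0]
                update(d, assign[d][i], w, +1)
    return assign, topic_word


# Toy documents and codes, invented for illustration only.
docs = [["magnetic", "head", "coil", "pole"],
        ["head", "slider", "suspension", "head"]]
doc_codes = [["G11B 5/31", "G11B 5/39"], ["G11B 5/60", "G11B 5/48"]]
assign, topic_word = guided_topic_assignment(docs, doc_codes, level=5)
```

In a sketch like this, the most frequent words under each code-labeled topic would serve as candidate entities for that code. Changing `level` changes how many distinct topics exist and how tightly each document is constrained, which is one way to picture why performance could vary with the classification level.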

Key words

entity extraction / topic model / patent mining / patent classification code

Cite this article

Liang Chen, Weijiao Shang, Chi Yu, et al. Research on Unsupervised Patent Entity Extraction Method Assisted by Technology Classification Codes[J]. Knowledge Management Forum, 2024, 9(4): 422-436. https://doi.org/10.13266/j.issn.2095-5472.2024.031


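Author contributions

Liang Chen: paper conception and method design, literature review, coding, empirical analysis, and manuscript writing;

Weijiao Shang: structuring of the paper's line of argument, organization and analysis of experimental data, and manuscript writing;

Chi Yu: literature review, organization of materials, and manuscript writing;

Lin Mou: literature review, organization and statistics of the patent dataset, and manuscript writing;

Chunzi Xia: review of the article, suggestions for revision, and manuscript revision;

Chuan Ge: survey and review of algorithms related to entity extraction.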

Funding

Shanxi Province Science and Technology Cooperation and Communication Special Project titled “Research and Development of Shanxi Province Research Project Similarity Monitoring Technology Based on Big Data and its Application Demonstration” (202204041101034).