"Problem-method" Joint Extraction Model in Scientific Literature Based on Syntax and Semantic Association

Kan Liu, Ye Li, Kaiwen Shi

Knowledge Management Forum, 2024, Vol. 9, Issue 4: 353-366. DOI: 10.13266/j.issn.2095-5472.2024.026
Research article

Abstract

[Purpose/Significance] Identifying the research problems addressed in the vast body of scientific literature, together with the methods used to solve them, helps uncover research hotspots, promotes innovation in technical methods, and reveals how knowledge evolves and spreads. [Method/Process] This paper proposes a joint "problem-method" extraction model for scientific literature that fuses syntactic structure with semantic association information. The model uses an encoder-decoder architecture. In the encoding layer, it takes the abstract of a paper as input and extracts Subject-Action-Object (SAO) triples, a syntactic structure that expresses the relationship between research methods and research problems (that is: research method, acts on, research problem). A semantic association graph is built from the SAO triples and encoded with a graph attention network (GAT); this graph encoding is then fused with the encoding of the abstract text to form the input features of the decoder. In the decoding layer, a pointer network first extracts the research method and then, conditioned on it, extracts the research problem, which achieves joint "problem-method" extraction. [Results/Conclusion] Experimental results show that the model performs well on both automatic evaluation metrics and human evaluation, and that it improves the extraction of core problems and core methods from scientific literature.
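
The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the graph construction rule (subject and object phrases become nodes, and each triple adds an undirected edge), the toy triples, the node features, and the names build_sao_graph and gat_layer are all assumptions made for illustration. The attention computation follows the standard single-head graph attention formulation, written in plain Python so that it runs without extra libraries.

```python
import math


def build_sao_graph(triples):
    """Build an undirected graph from (subject, action, object) triples.

    Assumption: subject and object phrases are nodes, and each triple links
    its subject to its object. Self-loops are added so that every node also
    attends to itself.
    """
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    index = {name: i for i, name in enumerate(nodes)}
    neighbors = {i: {i} for i in range(len(nodes))}
    for s, _, o in triples:
        neighbors[index[s]].add(index[o])
        neighbors[index[o]].add(index[s])
    return nodes, {i: sorted(ns) for i, ns in neighbors.items()}


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, neighbors, W, a_src, a_dst):
    """Single-head graph attention layer.

    score(i, j) = LeakyReLU(a_src . W h_i + a_dst . W h_j), normalized with a
    softmax over the neighbors of i; the output for node i is the
    attention-weighted sum of the projected neighbor features.
    """
    projected = [matvec(W, h) for h in features]
    outputs, attention = [], {}
    for i, ns in neighbors.items():
        scores = [
            leaky_relu(dot(a_src, projected[i]) + dot(a_dst, projected[j]))
            for j in ns
        ]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention[i] = dict(zip(ns, alphas))
        outputs.append([
            sum(a * projected[j][d] for a, j in zip(alphas, ns))
            for d in range(len(W))
        ])
    return outputs, attention


# Hypothetical SAO triples in the "method, acts on, problem" pattern.
triples = [
    ("graph attention network", "encodes", "semantic association graph"),
    ("pointer network", "extracts", "research method"),
    ("pointer network", "extracts", "research problem"),
]
nodes, neighbors = build_sao_graph(triples)

# Toy 2-dimensional node features and parameters. In a real model the features
# would come from a text encoder and the parameters would be learned.
features = [[1.0, float(i)] for i in range(len(nodes))]
W = [[0.5, 0.0], [0.0, 0.5], [0.25, 0.25]]
a_src = [0.1, 0.2, 0.3]
a_dst = [0.3, 0.2, 0.1]

encoded, attention = gat_layer(features, neighbors, W, a_src, a_dst)
for name, vector in zip(nodes, encoded):
    print(name, [round(x, 3) for x in vector])
```

In the model described above, node encodings of this kind would be fused with the encoding of the abstract text before decoding; the fusion step and the pointer-network decoder are outside the scope of this sketch.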

Keywords

problem-method extraction / GAT / SAO triples

Cite this article

Kan Liu, Ye Li, Kaiwen Shi. "Problem-method" Joint Extraction Model in Scientific Literature Based on Syntax and Semantic Association[J]. Knowledge Management Forum. 2024, 9(4): 353-366. https://doi.org/10.13266/j.issn.2095-5472.2024.026
CLC number: G255; TP391.1

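Author contributions:

Kan Liu: proposed the research question and finalized the paper;

Ye Li: designed the research plan and wrote the paper;

Kaiwen Shi: revised the research plan and implemented the code.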

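Funding

National Natural Science Foundation of China project "Construction of Knowledge Systems of Outstanding Scholars for Academic Innovation" (72174156)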
