
Research on the Construction of Keyword Hierarchies Based on Co-occurrence Relationships
[Purpose/Significance] Keywords are the most widely used knowledge units in the literature, and mining their semantic relationships in depth can provide underlying support for tasks such as knowledge association and resource recommendation. [Method/Process] The correlation between keywords is mined from their direct and indirect co-occurrence relationships. On this basis, the distribution of keywords is analyzed and combined with the size of each keyword's conceptual scope to construct a hierarchical structure among the keywords. [Result/Conclusion] Taking "knowledge graph" as the root node, the paper demonstrates the steps for constructing the keyword hierarchy. The results show that the method is feasible and effective to a certain degree and constructs the keyword hierarchy well.
keywords of scientific and technical literature / keyword hierarchy / keyword characteristics
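The abstract describes the method only in outline, so the following is a minimal sketch of how such a pipeline could look, not the authors' implementation. It assumes a Jaccard-style measure for direct co-occurrence, cosine similarity between co-occurrence profiles for indirect co-occurrence, an equal weighting of the two, and document frequency as a stand-in for the size of a keyword's conceptual scope. The names `build_hierarchy` and `w_direct` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def build_hierarchy(docs, root, w_direct=0.5):
    """Return a {child: parent} map over the keywords found in `docs`.

    docs: list of keyword lists, one per paper.
    root: keyword forced to the top of the hierarchy.
    w_direct: assumed weight of direct vs. indirect co-occurrence.
    """
    freq = Counter()              # number of papers containing each keyword
    cooc = defaultdict(Counter)   # symmetric pairwise co-occurrence counts
    for kws in docs:
        unique = sorted(set(kws))
        freq.update(unique)
        for a, b in combinations(unique, 2):
            cooc[a][b] += 1
            cooc[b][a] += 1

    def direct(a, b):
        # Share of papers containing either keyword that contain both.
        both = cooc[a][b]
        return both / (freq[a] + freq[b] - both)

    def indirect(a, b):
        # Cosine similarity of the two co-occurrence profiles,
        # ignoring the entries for a and b themselves.
        shared = (set(cooc[a]) & set(cooc[b])) - {a, b}
        dot = sum(cooc[a][k] * cooc[b][k] for k in shared)
        na = math.sqrt(sum(v * v for k, v in cooc[a].items() if k != b))
        nb = math.sqrt(sum(v * v for k, v in cooc[b].items() if k != a))
        return dot / (na * nb) if na and nb else 0.0

    def relatedness(a, b):
        return w_direct * direct(a, b) + (1 - w_direct) * indirect(a, b)

    # Broader keywords first: root, then by descending document frequency.
    ordered = [root] + sorted(
        (k for k in freq if k != root), key=lambda k: (-freq[k], k)
    )
    parents = {}
    for i, child in enumerate(ordered[1:], start=1):
        broader = ordered[:i]  # only earlier (broader) keywords may be parents
        best = max(broader, key=lambda p: relatedness(child, p))
        # Fall back to the root when nothing broader is related at all.
        parents[child] = best if relatedness(child, best) > 0 else root
    return parents
```

Because a keyword can only attach to one that precedes it in the frequency ordering, the result is always a tree rooted at `root`. Calling `build_hierarchy(docs, "knowledge graph")` on a list of per-paper keyword lists returns the parent of every other keyword.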
Xiong Huixiang (熊回香): paper supervision
Chen Ziwei (陈子薇): data collection, paper writing and revision
Ye Jiaxin (叶佳鑫): paper revision