Research on Identification and Diffusion of Emerging Topics Based on Citation Network and RAG

Tang Xinyu, Chen Wei

Knowledge Management Forum, 2026, Vol. 11, Issue (1): 76-88.

DOI: 10.13266/j.issn.2095-5472.2026.008  CSTR: 32306.14.CN11-6036.2026.008
Research article


Abstract

[Purpose/Significance] This study designs topic modeling methods that combine traditional topic models with large language models, evaluates them with a RAG-based large language model, and analyzes how emerging and non-emerging topics diffuse, to provide a reference for later work on identifying emerging topics and tracing their diffusion. [Methods/Process] First, topics are mined from patents by combining traditional topic models with large language models, and emerging topics are identified by novelty, continuity, growth, and influence. Second, the best topic modeling method is selected using scores from a large language model fine-tuned with RAG. Finally, a citation network is built to calculate diffusion speed, breadth, intensity, depth, and region, and the changes in these indicators over time are compared for emerging and non-emerging topics. [Result/Conclusion] With nuclear fusion as the empirical field, the BERTopic+Doubao topic modeling method performs best. The diffusion of emerging topics mostly shows a fluctuating upward trend, while that of non-emerging topics rises steadily and then declines slightly. The proposed combination of traditional topic models and large language models effectively identifies emerging topics, the RAG fine-tuned large language model improves the evaluation of the results, and the approach enriches the perspectives available for measuring topic diffusion.
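To make the diffusion indicators named in the abstract more concrete, the following is a minimal sketch of how such measures might be derived from a patent citation list. The abstract does not give the paper's formulas, so every definition here is an assumption chosen for illustration: intensity as the number of direct citations received, breadth as the number of distinct classification codes among citing patents, region as the number of distinct countries among citing patents, mean citation lag as an inverse proxy for speed, and depth as the number of citation generations reached. The function name, field names, and sample records are hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean


def diffusion_indicators(patents, citations, topic_ids):
    """Compute illustrative diffusion indicators for one topic.

    patents:   dict of patent id -> {"year", "ipc", "country"} (assumed fields)
    citations: list of (citing_id, cited_id) pairs
    topic_ids: set of patent ids assigned to the topic
    All indicator definitions are assumptions, not the paper's formulas.
    """
    # Map each cited patent to the patents that cite it.
    forward = defaultdict(list)
    for citing, cited in citations:
        forward[cited].append(citing)

    # Direct citations received by patents in the topic.
    direct = [(citing, cited) for citing, cited in citations if cited in topic_ids]
    lags = [patents[c]["year"] - patents[t]["year"] for c, t in direct]
    classes = {patents[c]["ipc"] for c, _ in direct}
    countries = {patents[c]["country"] for c, _ in direct}

    # Depth: breadth-first search over forward citations, counting generations.
    depth = 0
    seen = set(topic_ids)
    queue = deque((pid, 0) for pid in topic_ids)
    while queue:
        pid, level = queue.popleft()
        depth = max(depth, level)
        for citing in forward[pid]:
            if citing not in seen:
                seen.add(citing)
                queue.append((citing, level + 1))

    return {
        "intensity": len(direct),
        "breadth": len(classes),
        "regions": len(countries),
        "mean_lag_years": mean(lags) if lags else None,  # lower lag = faster diffusion
        "depth": depth,
    }


# Hypothetical sample data, invented for this example only.
patents = {
    "A": {"year": 2000, "ipc": "G21B", "country": "CN"},
    "B": {"year": 2002, "ipc": "G21B", "country": "US"},
    "C": {"year": 2004, "ipc": "H01F", "country": "JP"},
    "D": {"year": 2006, "ipc": "H05H", "country": "CN"},
}
citations = [("B", "A"), ("C", "A"), ("D", "C")]
result = diffusion_indicators(patents, citations, {"A"})
print(result)
```

Computing the same indicators per year, rather than once over the whole period, would give the kind of time series the abstract compares between emerging and non-emerging topics.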

Key words

retrieval-augmented generation / large language model / emerging topics / technology diffusion / citation network

Cite this article

Tang Xinyu, Chen Wei. Research on Identification and Diffusion of Emerging Topics Based on Citation Network and RAG[J]. Knowledge Management Forum, 2026, 11(1): 76-88. https://doi.org/10.13266/j.issn.2095-5472.2026.008
CLC number: G250.2


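Funding

Chinese Academy of Sciences special project on strategic research and decision support system construction, "Research on Scientific and Technological High Ground in the Energy Field" (GHJ-ZLZX-2025-46)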
