Research on Identification and Diffusion of Emerging Topics Based on Citation Network and RAG
Tang Xinyu, Chen Wei
Knowledge Management Forum, 2026, Vol. 11, Issue 1: 76-88.
[Purpose/Significance] This study designs a topic modeling method that combines traditional topic models with large language models, uses a RAG-based large language model to evaluate the results, and compares the diffusion of emerging and non-emerging topics, providing a reference for future work on identifying emerging topics and measuring their diffusion. [Methods/Process] First, patent topics were mined by combining traditional topic models with large language models, and emerging topics were identified according to four criteria: novelty, continuity, growth, and influence. Second, the best topic modeling method was selected using scores from a RAG-enhanced large language model. Finally, a citation network was constructed to calculate five diffusion indicators (speed, breadth, intensity, depth, and region) and to analyze how these indicators change over time for emerging and non-emerging topics. [Result/Conclusion] With nuclear fusion as the empirical field, the topic modeling method based on BERTopic+Doubao performed best. The diffusion of emerging topics mostly shows a fluctuating upward trend, whereas the diffusion of non-emerging topics first rises steadily and then declines slightly. The proposed combination of a traditional topic model and a large language model can effectively identify emerging topics; the RAG-enhanced large language model improves the evaluation of the results, and the study enriches the perspectives available for measuring topic diffusion.
retrieval-augmented generation / large language model / emerging themes / technology diffusion / citation network
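The abstract names five diffusion indicators computed on a citation network but does not define them. The following is a minimal sketch of how such indicators could be computed from a patent citation edge list. Every definition in it is an assumption made for illustration, not the authors' formula: intensity is taken as out-of-topic citations per topic patent, breadth as the number of distinct citing topics, region as the number of distinct citing regions, speed as the mean citation lag in years (smaller means faster), and depth as the number of citation generations reached. The function name `diffusion_indicators`, the record fields, and the toy data are likewise hypothetical.

```python
from collections import defaultdict, deque


def diffusion_indicators(patents, citations, topic):
    """Toy diffusion indicators for one topic (all definitions are assumptions).

    patents:   dict mapping patent id -> {"topic": str, "year": int, "region": str}
    citations: iterable of (citing_id, cited_id) pairs
    """
    # Forward-citation index: cited patent -> patents that cite it.
    cited_by = defaultdict(list)
    for citing, cited in citations:
        cited_by[cited].append(citing)

    seeds = {pid for pid, meta in patents.items() if meta["topic"] == topic}
    if not seeds:
        raise ValueError(f"no patents found for topic {topic!r}")

    # Direct diffusion: citations received from patents outside the topic.
    lags, topics, regions = [], set(), set()
    for seed in seeds:
        for citing in cited_by.get(seed, []):
            if citing in seeds:
                continue  # in-topic citations are not counted as diffusion here
            lags.append(patents[citing]["year"] - patents[seed]["year"])
            topics.add(patents[citing]["topic"])
            regions.add(patents[citing]["region"])

    # Depth: breadth-first search over forward citations, counting generations.
    level = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for citing in cited_by.get(node, []):
            if citing not in level:
                level[citing] = level[node] + 1
                queue.append(citing)

    return {
        "intensity": len(lags) / len(seeds),
        "breadth": len(topics),
        "region": len(regions),
        "mean_lag_years": sum(lags) / len(lags) if lags else None,
        "depth": max(level.values()),
    }


# Hypothetical toy data: three patents in the topic, three outside it.
patents = {
    "A": {"topic": "fusion_magnet", "year": 2010, "region": "CN"},
    "B": {"topic": "fusion_magnet", "year": 2012, "region": "CN"},
    "F": {"topic": "fusion_magnet", "year": 2015, "region": "CN"},
    "C": {"topic": "plasma", "year": 2013, "region": "US"},
    "D": {"topic": "materials", "year": 2014, "region": "JP"},
    "E": {"topic": "plasma", "year": 2016, "region": "US"},
}
citations = [("C", "A"), ("D", "A"), ("D", "C"), ("E", "D"), ("F", "B")]

result = diffusion_indicators(patents, citations, "fusion_magnet")
print(result)
```

Computing the same indicators per publication-year window, rather than once over the whole network, would give the kind of time series the abstract compares for emerging and non-emerging topics.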