Research on Emerging Topic Recognition and Feature Association Based on Topic Model and Time Series Analysis

Li Yaqian, Sun Yuling, Zhao Wanyu

Knowledge Management Forum, 2022, Vol. 7, Issue 3: 229-247.

DOI: 10.13266/j.issn.2095-5472.2022.020
Section: Academic Exploration


Abstract

[Purpose/Significance] Identifying emerging research topics (ERTs) and rigorously uncovering how their features are associated can better serve practical needs and strengthen the role of sci-tech information research in supporting disciplinary development. [Method/Process] Starting from a definition of the features of ERTs, and drawing on the theory and practice of emerging topic research and scientific impact assessment, this paper builds a methodological framework for ERT identification that combines natural language processing, global principal component analysis, and time series analysis. The framework quantifies four features of each topic (consistency, novelty, influence, and growth) and combines them with trend forecasting to extract, analyze, and identify ERTs. Building on the identified ERTs, the paper then examines how emerging topics develop in the target field: Granger causality tests and cointegration analysis are used to test for long-run equilibrium among the features and to infer causal relationships between them, in order to analyze the long-term factors that affect ERT development and how these factors act on one another. [Result/Conclusion] The paper proposes a method for identifying ERTs and analyzing their associated features. To verify its feasibility and effectiveness, an empirical study is conducted in the field of wetland research. Combining topic identification with the analysis of feature association effects, the study traces the dynamic development path of the scientific impact of topics in this field and, from the perspective of associated features, offers suggestions for developing emerging topics.
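The identification step combines four per-topic features (consistency, novelty, influence, growth) through global principal component analysis. This section does not give the paper's indicator formulas or weighting, so the following is only a minimal NumPy sketch of one common form of time-series global PCA, under stated assumptions: a hypothetical array of shape (periods, topics, indicators), z-score standardization over the stacked table, an 85% cumulative-variance cutoff, and component weights proportional to explained variance. The function name `global_pca_composite`, the sign-flip heuristic, and the data are all illustrative, not the authors' implementation.

```python
import numpy as np

def global_pca_composite(panels, var_threshold=0.85):
    """Composite score per (period, topic) from time-series global PCA.

    panels: array of shape (T, n, p) = periods x topics x indicators
    (hypothetically: consistency, novelty, influence, growth).
    Returns a (T, n) array of composite scores and the number of components kept.
    """
    T, n, p = panels.shape
    # Stack every period into one global table so all periods share one
    # set of loadings and the scores stay comparable over time.
    X = panels.reshape(T * n, p)
    # Standardize each indicator over the whole global table.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Eigenvector signs are arbitrary; flip each so its loadings sum to a
    # positive value (assumes "higher indicator = more emerging").
    eigvecs = eigvecs * np.where(eigvecs.sum(axis=0) < 0, -1.0, 1.0)
    ratio = eigvals / eigvals.sum()
    # Keep the fewest components whose cumulative variance share reaches the cutoff.
    k = min(int(np.searchsorted(np.cumsum(ratio), var_threshold)) + 1, p)
    scores = Z @ eigvecs[:, :k]                    # principal component scores
    weights = ratio[:k] / ratio[:k].sum()          # weight by variance share
    return (scores @ weights).reshape(T, n), k

# Hypothetical data: 3 periods, 4 topics, 4 indicators; topic 0 leads on
# every indicator and all topics drift upward over time.
rng = np.random.default_rng(0)
T, n, p = 3, 4, 4
base = np.array([4.0, 3.0, 2.0, 1.0])
panels = (base[None, :, None]
          + 0.5 * np.arange(T)[:, None, None]
          + rng.uniform(-0.05, 0.05, size=(T, n, p)))
scores, k = global_pca_composite(panels)
print(k, scores.argmax(axis=1), scores.argmin(axis=1))  # expected: 1 [0 0 0] [3 3 3]
```

Stacking all periods before the decomposition is what distinguishes the global variant: one set of loadings is applied to every year, so a topic's composite score can be compared across periods, which separate per-year PCAs would not allow.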

Keywords

trend forecasting / emerging research topic identification / characteristic correlation effect / cointegration analysis / panel data analysis
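For the feature-association stage, the abstract names cointegration analysis (long-run equilibrium) and Granger causality (direction of influence), and the keywords point to panel data. Below is a minimal NumPy sketch, not the authors' implementation: an Engle-Granger style residual Dickey-Fuller statistic for one pair of series, and the Dumitrescu-Hurlin (2012) Z-bar statistic for Granger non-causality across a panel of topics. The function names, lag order, and synthetic data-generating process are hypothetical. The residual statistic must be compared with Engle-Granger critical values rather than ordinary Dickey-Fuller tables, only the asymptotic Z-bar form is computed (Dumitrescu and Hurlin also give a small-T variant), and no p-values are produced.

```python
import numpy as np

def ols_resid(X, y):
    """Residuals of an ordinary least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def lag_matrix(s, K):
    """Columns s[t-1], ..., s[t-K] for t = K, ..., len(s)-1."""
    T = len(s)
    return np.column_stack([s[K - k:T - k] for k in range(1, K + 1)])

def eg_residual_tstat(y, x):
    """Regress y on (1, x), then return the Dickey-Fuller t-statistic of the
    residuals (no augmentation lags, no constant in the DF regression)."""
    e = ols_resid(np.column_stack([np.ones_like(x), x]), y)
    de, e_lag = np.diff(e), e[:-1]
    rho = (e_lag @ de) / (e_lag @ e_lag)
    u = de - rho * e_lag
    se = np.sqrt((u @ u) / (len(de) - 1) / (e_lag @ e_lag))
    return rho / se

def granger_wald(y, x, K):
    """Wald statistic (= K * F) for H0: K lags of x add nothing to predicting y."""
    target = y[K:]
    restricted = np.column_stack([np.ones(len(target)), lag_matrix(y, K)])
    unrestricted = np.column_stack([restricted, lag_matrix(x, K)])
    r_r = ols_resid(restricted, target)
    r_u = ols_resid(unrestricted, target)
    df = len(target) - unrestricted.shape[1]
    return (r_r @ r_r - r_u @ r_u) / (r_u @ r_u / df)

def dh_zbar(Y, X, K):
    """Dumitrescu-Hurlin Z-bar: average the per-unit Wald statistics (one unit
    per row) and standardize; compare with standard normal critical values."""
    N = Y.shape[0]
    w_bar = np.mean([granger_wald(Y[i], X[i], K) for i in range(N)])
    return np.sqrt(N / (2 * K)) * (w_bar - K)

rng = np.random.default_rng(1)

# Cointegration check on one hypothetical pair of indicator series.
x = np.cumsum(rng.standard_normal(200))          # random walk
y_coint = 2.0 * x + rng.standard_normal(200)     # shares x's stochastic trend
y_indep = np.cumsum(rng.standard_normal(200))    # unrelated random walk
t_coint = eg_residual_tstat(y_coint, x)
t_indep = eg_residual_tstat(y_indep, x)

# Panel Granger check: N hypothetical topics over T periods; x drives y with one lag.
N, T = 10, 50
Xp, Yp = np.zeros((N, T)), np.zeros((N, T))
for t in range(1, T):
    Xp[:, t] = 0.5 * Xp[:, t - 1] + rng.standard_normal(N)
    Yp[:, t] = 0.3 * Yp[:, t - 1] + 0.8 * Xp[:, t - 1] + rng.standard_normal(N)
z_xy = dh_zbar(Yp, Xp, K=2)   # H0: x does not Granger-cause y
z_yx = dh_zbar(Xp, Yp, K=2)   # H0: y does not Granger-cause x
print(t_coint, t_indep, z_xy, z_yx)
```

In this synthetic setup `t_coint` should be far more negative than `t_indep`, and `z_xy` should exceed the 1.96 normal critical value while `z_yx` is expected to be much smaller. Testing for a long-run relationship first and then for causal direction mirrors the order described in the abstract; panel cointegration procedures such as those of Kao and Chiang (2000) pool information across units in ways this single-pair sketch does not reproduce.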

Cite this article

Li Yaqian, Sun Yuling, Zhao Wanyu. Research on Emerging Topic Recognition and Feature Association Based on Topic Model and Time Series Analysis[J]. Knowledge Management Forum, 2022, 7(3): 229-247. https://doi.org/10.13266/j.issn.2095-5472.2022.020
Chinese Library Classification (CLC) number: G20


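Author contributions:

Li Yaqian: research framework design, data analysis, manuscript writing

Sun Yuling: supervision, manuscript revision

Zhao Wanyu: data collection and preprocessing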

