
Research on hotline text data crime clue screening method based on keyword mining
Zhen Muhua, Chen Peng, Wang Kun, Fan Ziyang, Wang Zhe
Knowledge Management Forum ›› 2022, Vol. 7 ›› Issue (5) : 539-548.
Research on hotline text data crime clue screening method based on keyword mining
[Purpose/Significance] Aiming at the problem of insufficient information analysis ability in the current public security business about identification and screening of crime clues in hotline texts, a method of hotline text data crime clue screening based on keyword mining is proposed to help business departments improve relevant intelligence and judgment [Method/Process] Considering that algorithms such as automatic text classification are subject to the problem of sample size, this paper firstly identified the key information of the known attribute data and established a seed lexicon, and then used Word2Vec to expand the seed vocabulary from the perspectives of similar words and synonym words to form a professional thesaurus, and finally used a semantics-based integral screening model to screen criminal clues in the hotline text data. [Result/Conclusion] This paper conducted a crime clue screening experiment on 1 050 priori hotline text data in Jinan City. After actual comparison and index analysis, the recall rate reached 86%. The specific identification of crime information in the text data of the city hotline achieved the expected effect and realized the effective screening of crime clues.
hotline text / professional thesaurus / text similarity / crime clue screening
[1] |
王勇.大数据在我国食药智慧监管中的应用[J].中国食品药品监管,2018(5):44-47.
|
[2] |
袁猛,刘文杰,胡建华,等.“昆仑2020”:全方位构筑食药环安全防线[J].人民公安,2020(16):30-33.
|
[3] |
徐建民,王金花,马伟瑜.利用本体关联度改进的TF-IDF特征词提取方法[J].情报科学,2011,29(2):279-283.
|
[4] |
彭云,万常选,江腾蛟,等.基于语义约束LDA的商品特征和情感词提取[J].软件学报,2017,28(3):676-693.
|
[5] |
刘耕,方勇,刘嘉勇.基于关联词和扩展规则的敏感词库设计[J].四川大学学报(自然科学版),2009,46(3):667-671.
|
[6] |
刘亚桥,陆向艳,邓凯凯,等.摄影领域评论情感词典构建方法[J].计算机工程与设计,2019,40(10):3037-3042.
|
[7] |
谭敏博.基于知识图谱的谷类作物病害识别及个性化推送研究[D].长沙:湖南农业大学,2018.
|
[8] |
夏松,林荣蓉,刘勘.网络谣言敏感词库的构建研究——以新浪微博谣言为例[J].知识管理论坛,2019,4(5):267-275.
|
[9] |
唐晓波,高和璇.基于关键词词向量特征扩展的健康问句分类研究[J].数据分析与知识发现,2020,4(7):66-75.
|
[10] |
姜天宇,王苏,徐伟.基于朴素贝叶斯的中文文本分类[J].电脑知识与技术,2019,15(23):253-254,263.
|
[11] |
吴绍忠.重点人员积分预警模型建设基础问题研究[J].中国人民公安大学学报(自然科学版),2012,18(2):76-79.
|
[12] |
涂铭,刘祥,刘树春.Python自然语言处理实战核心技术与算法[M].北京:机械工业出版社,2021:120,129.
|
[13] |
严红.词向量发展综述[J].现代计算机(专业版),2019(8):50-52.
|
[14] |
CHEN K J, MA W Y. Unknown word extraction for Chinese documents[C]// Proceedings of international conference on DBLP. Taipei: Morgan Kaufmann Publishers, 2002:169-175.
|
[15] |
PEDERSEN T, KULKARNI A. Identifying similar words and contexts in natural language with sense clusters[C]//Proceedings of the 20th national conference on artificial intelligence. Pittsburgh: AAAI Press, 2010:1694-1695.
|
[16] |
NEVIAROUSKAYA A, PRENDINGER H, ISHIZUKAM. SentiFul: a lexicon for sentiment analysis[J].IEEE transactions on affective computing,2011,2(1):22-36.
|
甄沐华:设计研究方法,完成实验,起草论文,修改论文与定稿
陈鹏:提出研究思路,修改论文
王坤:提供数据,提出研究问题
范子杨:采集数据,进行实验
王者:集数据,进行实验
/
〈 |
|
〉 |