Research on hotline text data crime clue screening method based on keyword mining

Zhen Muhua; Chen Peng; Wang Kun; Fan Ziyang; Wang Zhe

doi:10.13266/j.issn.2095-5472.2022.044

PDF(2075 KB)

Knowledge Management Forum ›› 2022, Vol. 7 ›› Issue (5) : 539-548. DOI: 10.13266/j.issn.2095-5472.2022.044

Research on hotline text data crime clue screening method based on keyword mining

Zhen Muhua ¹ ,
Chen Peng ¹ ,
Wang Kun ² ,
Fan Ziyang ¹ ,
Wang Zhe ¹

Author information +

History +

Abstract

[Purpose/Significance] Aiming at the problem of insufficient information analysis ability in the current public security business about identification and screening of crime clues in hotline texts, a method of hotline text data crime clue screening based on keyword mining is proposed to help business departments improve relevant intelligence and judgment [Method/Process] Considering that algorithms such as automatic text classification are subject to the problem of sample size, this paper firstly identified the key information of the known attribute data and established a seed lexicon, and then used Word2Vec to expand the seed vocabulary from the perspectives of similar words and synonym words to form a professional thesaurus, and finally used a semantics-based integral screening model to screen criminal clues in the hotline text data. [Result/Conclusion] This paper conducted a crime clue screening experiment on 1 050 priori hotline text data in Jinan City. After actual comparison and index analysis, the recall rate reached 86%. The specific identification of crime information in the text data of the city hotline achieved the expected effect and realized the effective screening of crime clues.

Key words

hotline text / professional thesaurus / text similarity / crime clue screening

Cite this article

EndNote

Ris (Procite)

Bibtex

Download Citations

Zhen Muhua , Chen Peng , Wang Kun , et al . Research on hotline text data crime clue screening method based on keyword mining[J]. Knowledge Management Forum. 2022, 7(5): 539-548 https://doi.org/10.13266/j.issn.2095-5472.2022.044

References

List( Publishing order | Descend order by publishing year | Descend order by cited within ) Chart analysis

[1]	王勇.大数据在我国食药智慧监管中的应用[J].中国食品药品监管,2018(5):44-47. Cited in this article [1]

[2]	袁猛,刘文杰,胡建华,等.“昆仑2020”:全方位构筑食药环安全防线[J].人民公安,2020(16):30-33. Cited in this article [1]

[3]	徐建民,王金花,马伟瑜.利用本体关联度改进的TF-IDF特征词提取方法[J].情报科学,2011,29(2):279-283. Cited in this article [1]

[4]	彭云,万常选,江腾蛟,等.基于语义约束LDA的商品特征和情感词提取[J].软件学报,2017,28(3):676-693. Cited in this article [1]

[5]	刘耕,方勇,刘嘉勇.基于关联词和扩展规则的敏感词库设计[J].四川大学学报(自然科学版),2009,46(3):667-671. Cited in this article [1]

[6]	刘亚桥,陆向艳,邓凯凯,等.摄影领域评论情感词典构建方法[J].计算机工程与设计,2019,40(10):3037-3042. Cited in this article [1]

[7]	谭敏博.基于知识图谱的谷类作物病害识别及个性化推送研究[D].长沙:湖南农业大学,2018. Cited in this article [1]

[8]	夏松,林荣蓉,刘勘.网络谣言敏感词库的构建研究——以新浪微博谣言为例[J].知识管理论坛,2019,4(5):267-275. Cited in this article [1]

[9]	唐晓波,高和璇.基于关键词词向量特征扩展的健康问句分类研究[J].数据分析与知识发现,2020,4(7):66-75. Cited in this article [1]

[10]	姜天宇,王苏,徐伟.基于朴素贝叶斯的中文文本分类[J].电脑知识与技术,2019,15(23):253-254,263. Cited in this article [1]

[11]	吴绍忠.重点人员积分预警模型建设基础问题研究[J].中国人民公安大学学报(自然科学版),2012,18(2):76-79. Cited in this article [2]

[12]	涂铭,刘祥,刘树春.Python自然语言处理实战核心技术与算法[M].北京:机械工业出版社,2021:120,129. Cited in this article [1]

[13]	严红.词向量发展综述[J].现代计算机(专业版),2019(8):50-52.

[14]	CHEN K J, MA W Y. Unknown word extraction for Chinese documents[C]// Proceedings of international conference on DBLP. Taipei: Morgan Kaufmann Publishers, 2002:169-175.

[15]	PEDERSEN T, KULKARNI A. Identifying similar words and contexts in natural language with sense clusters[C]//Proceedings of the 20th national conference on artificial intelligence. Pittsburgh: AAAI Press, 2010:1694-1695.

[16]	NEVIAROUSKAYA A, PRENDINGER H, ISHIZUKAM. SentiFul: a lexicon for sentiment analysis[J].IEEE transactions on affective computing,2011,2(1):22-36. Cited in this article [1]