Research on hotline text data crime clue screening method based on keyword mining

Zhen Muhua, Chen Peng, Wang Kun, Fan Ziyang, Wang Zhe

Knowledge Management Forum ›› 2022, Vol. 7 ›› Issue (5) : 539-548.

PDF(2075 KB)
PDF(2075 KB)
Knowledge Management Forum ›› 2022, Vol. 7 ›› Issue (5) : 539-548. DOI: 10.13266/j.issn.2095-5472.2022.044

Research on hotline text data crime clue screening method based on keyword mining

Author information +
History +

Abstract

[Purpose/Significance] Aiming at the problem of insufficient information analysis ability in the current public security business about identification and screening of crime clues in hotline texts, a method of hotline text data crime clue screening based on keyword mining is proposed to help business departments improve relevant intelligence and judgment [Method/Process] Considering that algorithms such as automatic text classification are subject to the problem of sample size, this paper firstly identified the key information of the known attribute data and established a seed lexicon, and then used Word2Vec to expand the seed vocabulary from the perspectives of similar words and synonym words to form a professional thesaurus, and finally used a semantics-based integral screening model to screen criminal clues in the hotline text data. [Result/Conclusion] This paper conducted a crime clue screening experiment on 1 050 priori hotline text data in Jinan City. After actual comparison and index analysis, the recall rate reached 86%. The specific identification of crime information in the text data of the city hotline achieved the expected effect and realized the effective screening of crime clues.

Key words

hotline text / professional thesaurus / text similarity / crime clue screening

Cite this article

Download Citations
Zhen Muhua , Chen Peng , Wang Kun , et al . Research on hotline text data crime clue screening method based on keyword mining[J]. Knowledge Management Forum. 2022, 7(5): 539-548 https://doi.org/10.13266/j.issn.2095-5472.2022.044

References

[1]
王勇.大数据在我国食药智慧监管中的应用[J].中国食品药品监管,2018(5):44-47.
[2]
袁猛,刘文杰,胡建华,等.“昆仑2020”:全方位构筑食药环安全防线[J].人民公安,2020(16):30-33.
[3]
徐建民,王金花,马伟瑜.利用本体关联度改进的TF-IDF特征词提取方法[J].情报科学,2011,29(2):279-283.
[4]
彭云,万常选,江腾蛟,等.基于语义约束LDA的商品特征和情感词提取[J].软件学报,2017,28(3):676-693.
[5]
刘耕,方勇,刘嘉勇.基于关联词和扩展规则的敏感词库设计[J].四川大学学报(自然科学版),2009,46(3):667-671.
[6]
刘亚桥,陆向艳,邓凯凯,等.摄影领域评论情感词典构建方法[J].计算机工程与设计,2019,40(10):3037-3042.
[7]
谭敏博.基于知识图谱的谷类作物病害识别及个性化推送研究[D].长沙:湖南农业大学,2018.
[8]
夏松,林荣蓉,刘勘.网络谣言敏感词库的构建研究——以新浪微博谣言为例[J].知识管理论坛,2019,4(5):267-275.
[9]
唐晓波,高和璇.基于关键词词向量特征扩展的健康问句分类研究[J].数据分析与知识发现,2020,4(7):66-75.
[10]
姜天宇,王苏,徐伟.基于朴素贝叶斯的中文文本分类[J].电脑知识与技术,2019,15(23):253-254,263.
[11]
吴绍忠.重点人员积分预警模型建设基础问题研究[J].中国人民公安大学学报(自然科学版),2012,18(2):76-79.
[12]
涂铭,刘祥,刘树春.Python自然语言处理实战核心技术与算法[M].北京:机械工业出版社,2021:120,129.
[13]
严红.词向量发展综述[J].现代计算机(专业版),2019(8):50-52.
[14]
CHEN K J, MA W Y. Unknown word extraction for Chinese documents[C]// Proceedings of international conference on DBLP. Taipei: Morgan Kaufmann Publishers, 2002:169-175.
[15]
PEDERSEN T, KULKARNI A. Identifying similar words and contexts in natural language with sense clusters[C]//Proceedings of the 20th national conference on artificial intelligence. Pittsburgh: AAAI Press, 2010:1694-1695.
[16]
NEVIAROUSKAYA A, PRENDINGER H, ISHIZUKAM. SentiFul: a lexicon for sentiment analysis[J].IEEE transactions on affective computing,2011,2(1):22-36.

甄沐华:设计研究方法,完成实验,起草论文,修改论文与定稿

陈鹏:提出研究思路,修改论文

王坤:提供数据,提出研究问题

范子杨:采集数据,进行实验

王者:集数据,进行实验

PDF(2075 KB)

Accesses

Citation

Detail

Sections
Recommended

/