Research on Automatic Classification Model of Massive Academic Resources in Library

yangya; yiyuanhong

doi:10.13266/j.issn.2095-5472.2018.017

PDF(2335 KB)

Knowledge Management Forum ›› 2018, Vol. 3 ›› Issue (3) : 172-180. DOI: 10.13266/j.issn.2095-5472.2018.017

Research on Automatic Classification Model of Massive Academic Resources in Library

yangya ,
yiyuanhong

Author information +

History +

Abstract

[Purpose/significance] In order to solve the problem that users often have difficulty in obtaining information in massive digital resources of library, this paper construct a personalized knowledge service system, which is the inevitable choice of library to help users to get rid of the information overload predicament and improve the quality of knowledge service. [Method/process] Firstly, this paper built a mapping model of Chinese Library Classification(CLC) and subject classification. Then, based on Hadoop distributed processing platform, it proposed to build automatic classification model of massive academic resources in libraries by improving TF-IDF+ Bayesian algorithm, the model can help to construct the personalized knowledge service systems in library. [Result/conclusion] In the experimental part，we collected more than 6 million documents from CNKI as the original training corpus (corpus covers 75 disciplines) to test the effectiveness of the classification model, the experimental result shows that the classification efficiency and effectiveness of the model are achieved.

Key words

automatic classification / Hadoop / TF-IDF / Bayes

Cite this article

EndNote

Ris (Procite)

Bibtex

Download Citations

yangya , yiyuanhong. Research on Automatic Classification Model of Massive Academic Resources in Library[J]. Knowledge Management Forum. 2018, 3(3): 172-180 https://doi.org/10.13266/j.issn.2095-5472.2018.017

References

List( Publishing order | Descend order by publishing year | Descend order by cited within ) Chart analysis

[1]	VIKAS K, VIJAYAN K, LATHA P. A comprehensive study of text classification algorithms[C]// Proceedings of 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI).Udupi:IEEE press,2017:1109-1113. Cited in this article [1]

[2]	高元. 面向个性化推荐的海量学术资源分类研究[D].宁波：宁波大学,2017. Cited in this article [1]

[3]	贺鸣,孙建军,成颖.基于朴素贝叶斯的文本分类研究综述[J].情报科学,2016,34(7):147-154. Cited in this article [1]

[4]	KUPERVASSER O. The mysterious optimality of naive bayes: estimation of the probability in the system of “classifiers”[J].Pattern recognition and image analysis,2014,24(1):1-10. Cited in this article [1]

[5]	LEWIS D. Naive (Bayes) at forty: The independence assumption in information retrieval[C]//Proceedings of 10th European Conference on Machine Learning Chemnitz. Berlin: Springer,1998:4-15 Cited in this article [1]

[6]	LI Y J, LUO C N, CHUNG S M. Weighted naive bayes for text classification using positive term-class dependency[J].International journal on artificial intelligence tools, 2012,21(1)：1250008-1250015. Cited in this article [1]

[7]	邸鹏,段利国.一种新型朴素贝叶斯文本分类算法[J].数据采集与处理,2014,29(1):71-75. Cited in this article [1]

[8]	杜选.基于加权补集的朴素贝叶斯文本分类算法研究[J].计算机应用与软件,2014,31(9):253-255. Cited in this article [1]

[9]	张杰,陈怀新.基于归一化词频贝叶斯模型的文本分类方法[J].计算机工程与设计,2016,37(3):799-802 Cited in this article [1]

[10]	艾雰.2010—2016年《中国图书馆分类法》(第5版)研究现状分析[J].图书馆建设,2017(5):39-44，72. Cited in this article [1]

[11]	LI Q, CHEN L. Study on multi-class text classification based on improved SVM[C] //Proceedings of the Eighth International Conference on Intelligent Systems and Knowledge Engineering, Shenzhen:Springer Berlin Heidelberg,2014:519-526. Cited in this article [1]

[12]	ZHANG Y T, WANG GL. An improved TF-IDF approach for text classification[J].Journal of zhejiang university-science a,2005,6(1):49-55. Cited in this article [1]

[13]	KIM S B, RIM H C. Effective Methods for improving naive bayes text classifiers[C] //Proceedings of 7th Pacific Rim international conference on artificial intelligence. Berlin:Springer, 2002: 414-423. Cited in this article [1]