Research on Automatic Classification Model of Massive Academic Resources in Library

yangya, yiyuanhong

Knowledge Management Forum ›› 2018, Vol. 3 ›› Issue (3) : 172-180.

PDF(2335 KB)
PDF(2335 KB)
Knowledge Management Forum ›› 2018, Vol. 3 ›› Issue (3) : 172-180. DOI: 10.13266/j.issn.2095-5472.2018.017

Research on Automatic Classification Model of Massive Academic Resources in Library

Author information +
History +

Abstract

[Purpose/significance] In order to solve the problem that users often have difficulty in obtaining information in massive digital resources of library, this paper construct a personalized knowledge service system, which is the inevitable choice of library to help users to get rid of the information overload predicament and improve the quality of knowledge service. [Method/process] Firstly, this paper built a mapping model of Chinese Library Classification(CLC) and subject classification. Then, based on Hadoop distributed processing platform, it proposed to build automatic classification model of massive academic resources in libraries by improving TF-IDF+ Bayesian algorithm, the model can help to construct the personalized knowledge service systems in library. [Result/conclusion] In the experimental part,we collected more than 6 million documents from CNKI as the original training corpus (corpus covers 75 disciplines) to test the effectiveness of the classification model, the experimental result shows that the classification efficiency and effectiveness of the model are achieved.

Key words

automatic classification / Hadoop / TF-IDF / Bayes

Cite this article

Download Citations
yangya , yiyuanhong. Research on Automatic Classification Model of Massive Academic Resources in Library[J]. Knowledge Management Forum. 2018, 3(3): 172-180 https://doi.org/10.13266/j.issn.2095-5472.2018.017

References

[1]
VIKAS K, VIJAYAN K, LATHA P. A comprehensive study of text classification algorithms[C]// Proceedings of 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI).Udupi:IEEE press,2017:1109-1113.
[2]
高元. 面向个性化推荐的海量学术资源分类研究[D].宁波:宁波大学,2017.
[3]
贺鸣,孙建军,成颖.基于朴素贝叶斯的文本分类研究综述[J].情报科学,2016,34(7):147-154.
[4]
KUPERVASSER O. The mysterious optimality of naive bayes: estimation of the probability in the system of “classifiers”[J].Pattern recognition and image analysis,2014,24(1):1-10.
[5]
LEWIS D. Naive (Bayes) at forty: The independence assumption in information retrieval[C]//Proceedings of 10th European Conference on Machine Learning Chemnitz. Berlin: Springer,1998:4-15
[6]
LI Y J, LUO C N, CHUNG S M. Weighted naive bayes for text classification using positive term-class dependency[J].International journal on artificial intelligence tools, 2012,21(1):1250008-1250015.
[7]
邸鹏,段利国.一种新型朴素贝叶斯文本分类算法[J].数据采集与处理,2014,29(1):71-75.
[8]
杜选.基于加权补集的朴素贝叶斯文本分类算法研究[J].计算机应用与软件,2014,31(9):253-255.
[9]
张杰,陈怀新.基于归一化词频贝叶斯模型的文本分类方法[J].计算机工程与设计,2016,37(3):799-802
[10]
艾雰.2010—2016年《中国图书馆分类法》(第5版)研究现状分析[J].图书馆建设,2017(5):39-44,72.
[11]
LI Q, CHEN L. Study on multi-class text classification based on improved SVM[C] //Proceedings of the Eighth International Conference on Intelligent Systems and Knowledge Engineering, Shenzhen:Springer Berlin Heidelberg,2014:519-526.
[12]
ZHANG Y T, WANG GL. An improved TF-IDF approach for text classification[J].Journal of zhejiang university-science a,2005,6(1):49-55.
[13]
KIM S B, RIM H C. Effective Methods for improving naive bayes text classifiers[C] //Proceedings of 7th Pacific Rim international conference on artificial intelligence. Berlin:Springer, 2002: 414-423.
[14]
张玉芳,彭时名,吕佳.基于文本分类TFIDF方法的改进与应用[J].计算机工程,2006(19):76-78.
[15]
苏金树,张博锋,徐昕.基于机器学习的文本分类技术研究进展[J].软件学报,2006(9):1848-1859.

杨亚: 数据处理,撰写并修正论文;

易远弘: 学术资源整理,提出论文的修改意见。

PDF(2335 KB)

Accesses

Citation

Detail

Sections
Recommended

/