Research on Automatic Classification Model of Massive Academic Resources in Library

Yang Ya; Yi Yuanhong

doi:10.13266/j.issn.2095-5472.2018.017

Knowledge Management Forum, 2018, Vol. 3, Issue 3: 172-180. DOI: 10.13266/j.issn.2095-5472.2018.017

Abstract

[Purpose/significance] Users often have difficulty finding information in the massive digital resources of libraries. This paper addresses the problem by constructing a personalized knowledge service system, and argues that such a system is the inevitable choice for libraries seeking to relieve users of information overload and to improve the quality of knowledge services. [Method/process] The paper first builds a mapping model between two knowledge organization systems, the Chinese Library Classification (CLC) and the subject classification. Then, on the Hadoop distributed processing platform, it proposes an improved TF-IDF + Bayesian algorithm to build an automatic classification model for massive academic resources in libraries, which supports the construction of the library's personalized knowledge service system. [Result/conclusion] More than 6 million documents crawled from CNKI, covering 75 disciplines, served as the original training corpus for testing the effectiveness of the classification model. The experimental results show that both the classification efficiency and the classification quality of the model met expectations.

Cite this article

Yang Ya, Yi Yuanhong. Research on Automatic Classification Model of Massive Academic Resources in Library[J]. Knowledge Management Forum, 2018, 3(3): 172-180. https://doi.org/10.13266/j.issn.2095-5472.2018.017

CLC number: G250
