Research on the Theoretical Framework of Web Archiving Data Quality Assurance Strategies

Wang Wenling (王文玲), Qu Yunpeng (曲云鹏)

Knowledge Management Forum (知识管理论坛), 2018, Vol. 3, Issue 2: 106-115. DOI: 10.13266/j.issn.2095-5472.2018.011

Abstract

[Purpose/significance] Data quality assurance is one of the most important tasks in web archiving: it runs through the entire web archiving workflow and determines whether the archiving effort succeeds. [Method/process] We analyze and compare the quality assurance strategies and methods of web archiving institutions in China and abroad, and propose a theoretical framework of strategies for data quality assurance. [Result/conclusion] The framework is data-centered. It defines a series of business standards and operating specifications, and uses existing semi-automatic software tools to carry out data quality inspection throughout the whole workflow. Team building, maintenance of the running environment, and authorized acquisition of website backups serve as supplementary means to ensure that high-quality archived data is obtained.

Key words

web archiving / quality assurance / quality inspection

Cite this article

Wang Wenling, Qu Yunpeng. Research on the Theoretical Framework of Web Archiving Data Quality Assurance Strategies[J]. Knowledge Management Forum, 2018, 3(2): 106-115. https://doi.org/10.13266/j.issn.2095-5472.2018.011
Chinese Library Classification (CLC) number: G251

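Author contributions

Wang Wenling: collected and analyzed the source materials and wrote the paper;

Qu Yunpeng: proposed the approach for the paper, and revised and refined it.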


Editor: Liu Yuanying (刘远颖)