
Research on the Theoretical Framework of Web Archiving Data Quality Assurance Strategies
[Purpose/significance] Data quality assurance is one of the most important tasks in web archiving. It runs through the entire archiving workflow and determines whether web resource archiving succeeds. [Method/process] We analyze and compare the quality assurance strategies and methods of web archiving institutions in China and abroad, and on that basis propose a theoretical framework of data quality assurance strategies. [Result/conclusion] The framework is data-centered. It establishes a set of operational standards and working specifications, and uses existing semi-automatic software tools to carry out data quality checks across the whole workflow. Team building, maintenance of the operating environment, and authorized acquisition of website backups serve as supplementary measures to ensure that high-quality archived data are obtained.
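Wang Wenling (王文玲): responsible for collecting and analyzing materials and for writing the paper;
Qu Yunpeng (曲云鹏): proposed the writing approach for the paper, and revised and refined it.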