Research on the Theoretical Framework of Web Archiving Data Quality Assurance Strategies

Wang Wenling, Qu Yunpeng

Knowledge Management Forum ›› 2018, Vol. 3 ›› Issue (2) : 106-115.

DOI: 10.13266/j.issn.2095-5472.2018.011



Abstract

[Purpose/significance] Quality assurance is one of the most important procedures in web archiving: it runs through the entire archiving workflow and affects whether an archiving effort succeeds. [Method/process] This article analyzes and compares the quality assurance strategies of domestic and foreign web archiving organizations and proposes a strategic theoretical framework for data quality assurance. [Result/conclusion] The proposed framework is data-centered. It comprises a series of criteria and operating specifications, and it carries out data quality inspection throughout the collection procedure using semi-automatic auxiliary tools. To ensure access to high-quality archived data, the framework also adopts team building, maintenance of the running environment, and authorized backup of websites as supplementary measures.

Key words

web archiving / quality assurance / quality inspection

Cite this article

Wang Wenling, Qu Yunpeng. Research on the Theoretical Framework of Web Archiving Data Quality Assurance Strategies[J]. Knowledge Management Forum, 2018, 3(2): 106-115. https://doi.org/10.13266/j.issn.2095-5472.2018.011


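Wang Wenling: responsible for collecting and analyzing materials and for writing the paper;

Qu Yunpeng: proposed the approach for the paper and revised and refined it.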
