Research on CORE Paper Association Discovery and Semantic Services Based on Semantic Similarity

Bai Linlin, Wan Ni

Knowledge Management Forum, 2021, Vol. 6, Issue 5: 271-281. DOI: 10.13266/j.issn.2095-5472.2021.026

Academic Exploration


Abstract

[Purpose/significance] This paper examines in detail how CORE (Connecting Repositories) discovers associations between papers and what services it builds on them, with the aim of providing a reference for Chinese open access repositories in paper recommendation and semantic linking. [Method/process] The analysis covers two aspects: the discovery of paper associations based on semantic similarity, and the semantic services built on those associations. The discovery process comprises the harvesting of metadata and full-text content and the calculation of semantic similarity between papers. The semantic services comprise the paper recommendation service and the linked open data service. Finally, the paper summarizes suggestions for applying CORE's approach in Chinese institutional repositories. [Result/conclusion] The CORE system automatically harvests metadata from open access repositories through the existing OAI-PMH protocol, then extracts the URI fields from the metadata and downloads the full text over HTTP. Based on the discovered semantic associations between papers, it provides a paper recommendation service and a linked data service, which allow third-party systems to use the CORE dataset. These practices offer a useful reference for open access repositories in China (such as institutional repositories and open access journals) in paper recommendation and semantic linking.
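The two steps named in the abstract, extracting full-text URIs from harvested metadata and scoring the similarity of two papers, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample record, the function names, and the choice of cosine similarity over raw term counts are not taken from the paper or from CORE's code, and the abstract does not say which similarity measure CORE uses.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical OAI-PMH record with Dublin Core (oai_dc) metadata.
# The identifiers and URL are made up; they do not come from CORE.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/"
        xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata>
    <oai_dc:dc>
      <dc:title>Example paper</dc:title>
      <dc:identifier>oai:example.org:123</dc:identifier>
      <dc:identifier>http://repository.example.org/123/paper.pdf</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""

DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def extract_fulltext_urls(record_xml):
    """Return the dc:identifier values that look like HTTP(S) URLs."""
    root = ET.fromstring(record_xml)
    urls = []
    for element in root.iterfind(".//dc:identifier", DC_NS):
        value = (element.text or "").strip()
        if value.lower().startswith(("http://", "https://")):
            urls.append(value)
    return urls


def term_counts(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine similarity of raw term-count vectors (assumed measure)."""
    counts_a, counts_b = term_counts(text_a), term_counts(text_b)
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(extract_fulltext_urls(SAMPLE_RECORD))
    print(cosine_similarity("open access repositories",
                            "open access journals"))
```

A real harvester would fetch such records over the network with OAI-PMH requests and then download each extracted URL over HTTP; the sketch skips both so that it runs offline.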

Keywords

Connecting Repositories / semantic similarity / article association / recommendation system / linked data

Cite this article

Bai Linlin, Wan Ni. Research on CORE Paper Association Discovery and Semantic Services Based on Semantic Similarity[J]. Knowledge Management Forum, 2021, 6(5): 271-281. https://doi.org/10.13266/j.issn.2095-5472.2021.026
Chinese Library Classification (CLC) number: G254


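Author contributions:

Bai Linlin: data acquisition, determination of the research outline, and writing of the paper;

Wan Ni: revision of the paper.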

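Funding

National Social Science Fund of China Youth Project "Research on Construction Methods of Domain Knowledge Structure Based on Knowledge Graphs" (20CTQ007)
Beijing Information Science and Technology University General Project in Higher Education Research "Research on the Utilization of Library Resources at Beijing Information Science and Technology University in the Big Data Environment" (2019GJYB09)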
