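Comparative Research on the Abstracts of Chinese Papers Generated by Large Language Models at Home and Abroad: Taking the Field of Library and Information as an Example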

Miao Xing; Li Tian

doi:10.13266/j.issn.2095-5472.2024.032


Knowledge Management Forum, 2024, Vol. 9, Issue 5: 437-447. DOI: 10.13266/j.issn.2095-5472.2024.032 CSTR: 32306.14.CN11-6036.2024.032

Research Article


Comparative Research on the Abstracts of Chinese Papers Generated by Large Language Models at Home and Abroad: Taking the Field of Library and Information as an Example

Miao Xing ¹,
Li Tian ¹,²


Abstract

[Purpose/Significance] This study compares Chinese paper abstracts generated by representative large language models (LLMs) from China and abroad, summarizes their similarities and differences, and offers a reference for the further development of, and research on, LLMs. [Method/Process] The 121 projects in the "Library, Information and Documentation Science" discipline of the 2023 annual program of the National Social Science Fund of China were used as titles, and ChatGPT4.0 and ERNIE 4.0 each generated a Chinese abstract for every title. After data preprocessing and text analysis, the output of the two models was compared in terms of high-frequency words, part-of-speech distribution, number of sentences, and abstract length. The generated abstracts were then compared with abstracts published in the Chinese journal Library and Information Service (图书情报工作) to judge whether they conform to the norms of Chinese academic writing. [Result/Conclusion] Abstracts generated by ERNIE Bot are shorter, with fewer characters, and conform more closely to Chinese paper-writing standards, whereas abstracts generated by GPT have higher average character and sentence counts. The differences and characteristics identified in the output of these two representative models provide a reference for improving and further developing LLMs.
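The length and sentence-count comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' code: the sentence-splitting rule, the function names (`count_sentences`, `char_length`, `summarize`, `top_words`), and the toy input strings are all assumptions made for the example. Word segmentation and part-of-speech tagging are left out, so `top_words` expects abstracts that have already been segmented by some tokenizer.

```python
import re
from collections import Counter
from statistics import mean

# Assumed rule: a sentence ends at one of these punctuation marks.
# The paper does not state how sentences were split.
SENTENCE_END = re.compile(r"[。！？!?]+")


def count_sentences(text: str) -> int:
    """Count non-empty segments between sentence-ending punctuation marks."""
    return sum(1 for part in SENTENCE_END.split(text) if part.strip())


def char_length(text: str) -> int:
    """Count characters, ignoring whitespace."""
    return len(re.sub(r"\s+", "", text))


def summarize(abstracts: list[str]) -> dict[str, float]:
    """Average character count and sentence count over a list of abstracts."""
    return {
        "mean_chars": mean(char_length(a) for a in abstracts),
        "mean_sentences": mean(count_sentences(a) for a in abstracts),
    }


def top_words(tokenized: list[list[str]], k: int = 10) -> list[tuple[str, int]]:
    """Most frequent tokens across abstracts that are already segmented."""
    counts = Counter(tok for doc in tokenized for tok in doc)
    return counts.most_common(k)


# Hypothetical toy inputs, not data from the study.
model_a = ["本研究探讨图书馆服务。结果表明效果显著。"]
model_b = ["本研究首先梳理文献。其次构建模型。最后提出建议。"]

stats_a = summarize(model_a)
stats_b = summarize(model_b)
print(stats_a, stats_b)
```

Running `summarize` once per model on the full set of generated abstracts would give the per-model averages that the comparison relies on; the same functions could be applied to a sample of journal abstracts as a baseline.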

Cite this article

Miao Xing , Li Tian. Comparative Research on the Abstracts of Chinese Papers Generating Large Language Models at Home and Abroad： Taking the Field of Library and Information as an Example[J]. Knowledge Management Forum. 2024, 9(5): 437-447 https://doi.org/10.13266/j.issn.2095-5472.2024.032

Chinese Library Classification (CLC) number: G25
