
Joint Mining and Visualization of Character Relationships in Multiple Diaries from the Perspective of Digital Humanities: A Case Study of Diaries Related to Southwest Associated University
[Purpose/Significance] This study jointly mines several diaries of notable figures associated with National South-west Associated University (NSAU) and builds a social network graph of NSAU that integrates information from all of them. The aim is to uncover more latent social relationships than a single diary can reveal, and so to overcome the limitations of single-diary social network mining. [Method/Process] Using several NSAU-related diaries from 1938 to 1941 as the corpus, a Python program counts co-occurrence relationships between people, and Gephi is used to build the multi-diary social network graph. Social network analysis is then applied to examine the network's topological features, the centrality of individual people, and group features based on modularity and K-core. [Result/Conclusion] Compared with mining each diary independently, joint mining of multiple diaries yields a network with more pronounced structural features, a more decentralized shape, and richer social relationship information. It can reveal relatively hidden social ties and has good application value in the digital humanities.
digital humanities / social network / text mining / National South-west Associated University
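The co-occurrence counting step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' program: it assumes that a list of person names is already available, that each diary is a list of entry strings, and that two people co-occur when both names appear in the same entry. The helper names (`cooccurrence_edges`, `merge_diaries`, `to_edge_csv`) and the toy entries are hypothetical. The output uses `Source`, `Target` and `Weight` columns, which Gephi's spreadsheet import recognizes for edge tables.

```python
from collections import Counter
from itertools import combinations
import csv
import io


def cooccurrence_edges(entries, names):
    """Count how often each pair of names appears in the same diary entry."""
    edges = Counter()
    for text in entries:
        # Names present in this entry, sorted so each pair has one canonical order.
        present = sorted({name for name in names if name in text})
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def merge_diaries(diaries, names):
    """Sum the co-occurrence counts of several diaries into one edge table."""
    total = Counter()
    for entries in diaries.values():
        total.update(cooccurrence_edges(entries, names))
    return total


def to_edge_csv(edges):
    """Serialize the merged edges as an undirected, weighted edge list."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()


# Toy data, invented for illustration only (not taken from any diary).
names = ["Person A", "Person B", "Person C"]
diaries = {
    "diary_1": ["Lunch with Person A and Person B.", "Person C called."],
    "diary_2": ["Person B and Person A came by.", "Wrote to Person C and Person A."],
}

merged = merge_diaries(diaries, names)
edge_csv = to_edge_csv(merged)
print(edge_csv)
```

Summing the per-diary counters is what makes the mining joint: a pair that is only weakly attested in each diary accumulates weight across sources. Plain substring matching is a simplification, and a real pipeline would also need name recognition and alias resolution before counting.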
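Zhang Jinsheng (张锦胜): selected the topic, proposed the research approach, analyzed and processed the data, and wrote and revised the paper;
Lin Zefei (林泽斐): revised and finalized the paper.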