Construction of Intelligent Q&A System for Medicinal Plant Based on Multimodal Knowledge Graph

Doudou Zhao, Yujun Wang, Rui Liu, Chang Liu

Knowledge Management Forum, 2024, Vol. 9, Issue (5): 487-504. DOI: 10.13266/j.issn.2095-5472.2024.036  CSTR: 32306.14.CN11-6036.2024.036
Research article


Abstract

[Purpose/Significance] Medicinal plants are one of the core resources of traditional Chinese medicine (TCM). Strengthening the organization and electronic use of medicinal plant information is of great significance to the inheritance and development of TCM. [Method/Process] The schema layer of a medicinal plant knowledge graph was constructed first. Then, by comparing multiple databases, including the Chinese Pharmacopoeia (Volume I), TCMID, PPBC, and CTD, 265 medicinal plants were selected, multi-source heterogeneous data were integrated, and Neo4j was used to build a multimodal medicinal plant knowledge graph. On this basis, an AC automaton was used to recognize entities in user questions and TextCNN was used to recognize question intent, realizing text-based intelligent question answering. After comparing six image recognition models, including VGG, ResNet, DenseNet, MobileNet, and EfficientNet, the EfficientNet-B3 model was selected to realize image-based intelligent question answering, and data augmentation and label smoothing were introduced to improve image recognition efficiency. Finally, the medicinal plant question answering system was implemented in Python with the PyQt library. [Result/Conclusion] A multimodal knowledge graph covering medicinal plants and their prescriptions, medicinal materials, compounds, and images was constructed, containing 340 772 entities and 2 530 067 relationships. Based on this graph, an intelligent question answering system for medicinal plants was built that returns query results in response to users' natural language questions and image questions. Experimental results show that the system's image recognition accuracy reaches 83.53%.
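The entity recognition step mentioned in the abstract (matching known entity names in a user question with an AC automaton) can be illustrated with a minimal sketch. This is not the authors' implementation, which is not shown here; the class `AhoCorasick`, the helper `recognize_entities`, and the tiny `ENTITY_TYPES` dictionary are hypothetical names and data chosen only for illustration, and English strings stand in for the Chinese entity names the real system would use.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton for multi-pattern string matching."""

    def __init__(self, patterns):
        # Per node: outgoing edges, failure link, and patterns ending there.
        self.edges = [{}]
        self.fail = [0]
        self.out = [[]]
        for pattern in patterns:
            self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern):
        node = 0
        for ch in pattern:
            if ch not in self.edges[node]:
                self.edges.append({})
                self.fail.append(0)
                self.out.append([])
                self.edges[node][ch] = len(self.edges) - 1
            node = self.edges[node][ch]
        self.out[node].append(pattern)

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 nodes keep the root as failure target.
        queue = deque(self.edges[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.edges[node].items():
                f = self.fail[node]
                while f and ch not in self.edges[f]:
                    f = self.fail[f]
                self.fail[child] = self.edges[f].get(ch, 0)
                # Inherit patterns that end at the failure target.
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def find(self, text):
        """Return (start_index, pattern) for every occurrence in text."""
        node, hits = 0, []
        for i, ch in enumerate(text):
            while node and ch not in self.edges[node]:
                node = self.fail[node]
            node = self.edges[node].get(ch, 0)
            for pattern in self.out[node]:
                hits.append((i - len(pattern) + 1, pattern))
        return hits


# Hypothetical entity dictionary; a real system would load names from the graph.
ENTITY_TYPES = {
    "ginseng": "Plant",
    "licorice": "Plant",
    "ginsenoside": "Compound",
}
AUTOMATON = AhoCorasick(ENTITY_TYPES)


def recognize_entities(question):
    """Return (entity_name, entity_type) pairs found in the question."""
    return [(name, ENTITY_TYPES[name]) for _, name in AUTOMATON.find(question)]


print(recognize_entities("what compounds does ginseng contain"))
```

In a pipeline like the one described, the recognized entity and the intent predicted by the classifier would then be combined into a graph query; that step is omitted here because the query templates are not given in the abstract.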

Key words

multimodality / knowledge graph / intelligent Q&A / medicinal plants

Cite this article

Doudou Zhao, Yujun Wang, Rui Liu, et al. Construction of Intelligent Q&A System for Medicinal Plant Based on Multimodal Knowledge Graph[J]. Knowledge Management Forum. 2024, 9(5): 487-504 https://doi.org/10.13266/j.issn.2095-5472.2024.036
CLC number: TP391; R284.1


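Author contributions:

Doudou Zhao: data collection and alignment, knowledge graph construction, paper writing and revision;

Yujun Wang: Q&A system construction and experiments, paper writing;

Rui Liu: research design, revision of research content and structure;

Chang Liu: correction of research data, proposal of the research ideas and framework.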

