Current Biotechnology ›› 2024, Vol. 14 ›› Issue (2): 323-330. DOI: 10.19586/j.2095-2341.2023.0145
• Research Article •
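Ting XU1, Jiahao SHEN2, Kang ZHAO1, Lu HUANG1, Enhui DONG1, Kexin ZENG3, Xinwei BIAN3, Minghui JI1, Qin XU1
Received: 2023-11-11
Accepted: 2023-12-21
Online: 2024-03-25
Published: 2024-04-17
Corresponding authors: Minghui JI, Qin XU
About the author: Ting XU, E-mail: tingxu1229@stu.njmu.edu.cn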
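Abstract:
To explore the value of the gut microbiota in predicting disease type, a non-invasive disease assessment model was built with machine learning from the abundance of Ruminococcus. Data were taken from the ExperimentHub R repository: human fecal Ruminococcus abundance from different studies was downloaded together with metadata including experimental protocol, disease status, age, sex, antibiotic use, region, and smoking status. Random forest, decision tree, and AdaBoost models were trained as disease-screening models, parameters were tuned with GridSearchCV (grid search), and the external validation results were evaluated with a confusion matrix. After data processing, 12 Ruminococcus taxa and 7 diseases were extracted under standardized names, and 25 variables were dummy-coded. Three assessment models were built from the abundances of multiple Ruminococcus taxa and general sample information such as sex and age. The random forest model had the highest accuracy (0.884); with n_estimators set to 220 its score reached 0.892, making it the best model. External validation likewise showed relatively few misclassifications, indicating good model performance. Based on metagenomic data from fecal samples, a random forest algorithm using Ruminococcus abundance can effectively predict disease type.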
Cite this article: Ting XU, Jiahao SHEN, Kang ZHAO, Lu HUANG, Enhui DONG, Kexin ZENG, Xinwei BIAN, Minghui JI, Qin XU. Bacterial Signature for Prediction of Disease Type Based on Abundance of Ruminococcus[J]. Current Biotechnology, 2024, 14(2): 323-330.
No. | Name |
---|---|
1 | Ruminococcus_gnavus |
2 | Ruminococcus_torques |
3 | Ruminococcus_albus |
4 | Ruminococcus_bromii |
5 | Ruminococcus_callidus |
6 | Ruminococcus_champanellensis |
7 | Ruminococcus_flavefaciens |
8 | Ruminococcus_lactaris |
9 | Ruminococcaceae_Faecalibacterium_prausnitzii |
10 | Ruminococcaceae_bacterium_D16 |
11 | Ruminococcaceae |
12 | Ruminococcus |
Table 1 Standardized nomenclature for the microbiota of Ruminococcus
No. | Category | Code |
---|---|---|
1 | Healthy | 1 |
2 | Inflammatory disease (bronchitis, cystitis, otitis, pneumonia) | 2 |
3 | Atherosclerosis | 3 |
4 | Tumor | 4 |
5 | Hypertension | 5 |
6 | Diabetes | 6 |
7 | Infectious disease | 7 |
8 | Other | 0 |
Table 2 Disease classification table
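No. | Variable | Coding | Number of dummy variables |
---|---|---|---|
1 | Sex | NA: 0; male: 1; female: 2 | 3 |
2 | Age group | Newborn: 1; child: 2; adult: 3; elderly: 4 | 4 |
3 | Antibiotic use | NA: 0; used: 1; not used: 2 | 3 |
4 | Current smoking | NA: 0; smoker: 1; non-smoker: 2 | 3 |
5 | Former smoking | NA: 0; former smoker: 1; non-smoker: 2 | 3 |
6 | Country | NA: 0; Canada: 1; China: 2; Finland: 3; Israel: 4; Netherlands: 5; Russia: 6; Sweden: 7; USA: 8 | 9 |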
Table 3 Dummy variable treatment table
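The coding in Table 3 can be illustrated with a short sketch. It assumes pandas and uses hypothetical column names (`sex`, `age_group`, `antibiotics`, `smoker`, `ever_smoker`, `country`) and three made-up samples; the article does not say which tool performed the transformation, so this is one plausible way to arrive at the 25 indicator variables, not the authors' code.

```python
import pandas as pd

# Category levels taken from Table 3; code 0 stands for a missing value (NA).
# The column names are hypothetical and chosen only for this sketch.
LEVELS = {
    "sex": [0, 1, 2],
    "age_group": [1, 2, 3, 4],
    "antibiotics": [0, 1, 2],
    "smoker": [0, 1, 2],
    "ever_smoker": [0, 1, 2],
    "country": [0, 1, 2, 3, 4, 5, 6, 7, 8],
}


def dummy_code(samples: pd.DataFrame) -> pd.DataFrame:
    """Expand each coded column into one 0/1 indicator column per level."""
    typed = samples.copy()
    for column, levels in LEVELS.items():
        # Declaring the full level set keeps every indicator column, even
        # for levels that do not occur in the data at hand.
        typed[column] = pd.Categorical(typed[column], categories=levels)
    return pd.get_dummies(typed, columns=list(LEVELS), dtype=int)


# Three made-up samples, for illustration only.
samples = pd.DataFrame({
    "sex": [1, 2, 0],
    "age_group": [3, 4, 2],
    "antibiotics": [2, 1, 0],
    "smoker": [2, 1, 0],
    "ever_smoker": [1, 2, 0],
    "country": [2, 8, 5],
})
encoded = dummy_code(samples)
print(encoded.shape)  # (3, 25): 3 + 4 + 3 + 3 + 3 + 9 indicator columns
```

Each row of `encoded` contains exactly six ones, one per original variable, which is a quick sanity check on the encoding.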
Model | Mean | Standard error | 95% CI lower bound | 95% CI upper bound |
---|---|---|---|---|
Random forest | 0.884 | 0.013 | 0.876 | 0.891 |
Decision tree | 0.850 | 0.017 | 0.839 | 0.861 |
AdaBoost | 0.719 | 0.018 | 0.707 | 0.723 |
Table 4 Accuracy of the three models
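A comparison of this kind could be set up roughly as follows. The sketch assumes scikit-learn, a synthetic dataset from `make_classification` standing in for the abundance and dummy-variable features, default hyperparameters, and 5-fold cross-validation; the article does not state its validation scheme, so none of these choices should be read as the authors' protocol, and the printed scores have no relation to Table 4.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real feature matrix (assumption of this sketch).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8,
    n_classes=3, random_state=0,
)

# The three model families named in the abstract, with default settings.
models = {
    "random_forest": RandomForestClassifier(random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
}

summary = {}
for name, model in models.items():
    # Accuracy on each of the 5 held-out folds.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    summary[name] = (scores.mean(), scores.std(ddof=1))
    print(f"{name}: mean={summary[name][0]:.3f} sd={summary[name][1]:.3f}")
```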
n_estimators | Mean score | n_estimators | Mean score |
---|---|---|---|
1 | 0.820 | 150 | 0.886 |
10 | 0.870 | 160 | 0.889 |
20 | 0.886 | 170 | 0.886 |
30 | 0.877 | 180 | 0.887 |
40 | 0.878 | 190 | 0.886 |
50 | 0.873 | 200 | 0.889 |
60 | 0.884 | 210 | 0.887 |
70 | 0.883 | 220 | 0.892 |
80 | 0.884 | 230 | 0.892 |
90 | 0.887 | 240 | 0.887 |
100 | 0.883 | 250 | 0.892 |
110 | 0.884 | 260 | 0.890 |
120 | 0.881 | 270 | 0.887 |
130 | 0.884 | 280 | 0.887 |
140 | 0.886 | 290 | 0.889 |
Table 5 Grid search results
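A search over `n_estimators` like the one summarized in Table 5 might look like the sketch below. It uses scikit-learn's `GridSearchCV` with a random forest, as named in the abstract, but the synthetic data, the shortened grid, the accuracy scoring, and the 5-fold split are all assumptions made here to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real feature matrix (assumption of this sketch).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=8,
    n_classes=3, random_state=0,
)

# Table 5 covers n_estimators = 1, 10, 20, ..., 290; a shorter grid runs faster.
param_grid = {"n_estimators": [1, 10, 50, 100]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=5,
)
search.fit(X, y)

# One mean cross-validated score per grid point, as in Table 5.
for n, score in zip(
    search.cv_results_["param_n_estimators"],
    search.cv_results_["mean_test_score"],
):
    print(f"n_estimators={n}: mean score={score:.3f}")
print("best:", search.best_params_)
```

In Table 5 the settings 220, 230, and 250 all show 0.892 after rounding, and the abstract reports 220 as the best value.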
Class | Precision | Recall | Accuracy | F1 score |
---|---|---|---|---|
0 | 0.752 | 0.790 | 0.885 | 0.882 |
1 | 0.819 | 0.860 | | |
2 | 0.912 | 0.930 | | |
3 | 0.943 | 1.000 | | |
4 | 0.838 | 0.880 | | |
5 | 1.000 | 0.970 | | |
6 | 0.921 | 0.700 | | |
7 | 0.913 | 0.950 | | |
Mean | 0.887 | 0.885 | | |
Table 6 Precision, recall, accuracy and F1 score for external validation
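Per-class precision and recall of the kind reported in Table 6 are read off a confusion matrix. The sketch below shows the calculation on a made-up 3-class matrix (the article's actual confusion matrix is not reproduced here), then checks that the "Mean" row of Table 6 equals the unweighted average of the eight per-class values. The function name `per_class_metrics` is hypothetical.

```python
def per_class_metrics(confusion):
    """Return (precision, recall) lists from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    precision, recall = [], []
    for k in range(n):
        true_positive = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                           # row sum
        precision.append(true_positive / predicted_k if predicted_k else 0.0)
        recall.append(true_positive / actual_k if actual_k else 0.0)
    return precision, recall


# Made-up 3-class confusion matrix, for illustration only.
toy = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
toy_precision, toy_recall = per_class_metrics(toy)
print(toy_precision, toy_recall)  # [0.8, 0.7, 0.8] [0.8, 0.7, 0.8]

# Per-class values copied from Table 6 (classes 0 to 7).
table6_precision = [0.752, 0.819, 0.912, 0.943, 0.838, 1.000, 0.921, 0.913]
table6_recall = [0.790, 0.860, 0.930, 1.000, 0.880, 0.970, 0.700, 0.950]
macro_precision = sum(table6_precision) / len(table6_precision)
macro_recall = sum(table6_recall) / len(table6_recall)
print(f"{macro_precision:.3f} {macro_recall:.3f}")  # 0.887 0.885
```

The lowest recall in Table 6 is 0.700 for class 6, so that class accounts for a larger share of the missed cases than the others.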