Current Biotechnology ›› 2023, Vol. 13 ›› Issue (5): 671-680. DOI: 10.19586/j.2095-2341.2021.0201
About the author:
Haitao CAO, E-mail: 2232060551@qq.com
Haitao CAO, Jing ZHU, Yunpeng MA, Xinghua CUI
Received:
2021-12-29
Accepted:
2022-05-16
Online:
2023-09-25
Published:
2023-10-10
Contact:
Jing ZHU
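Abstract:
With the development of second-generation DNA sequencing technology, researchers have accumulated large amounts of gut microbiota data. Studies have shown that the gut microbiota is closely linked to host health, so modeling and analyzing complex, high-dimensional gut microbiota data is a major challenge in current bioinformatics research. The rise of artificial intelligence has made it possible to process gut microbiota data and to reveal the complex relationships between the gut microbiota and host phenotypes. This review summarizes current research on the relationship between the gut microbiota and host phenotypes. It focuses on the theoretical principles of five commonly used machine learning algorithms (linear regression, support vector machine, K-nearest neighbors, random forest, and artificial neural network) and on their applications in related studies, offers suggestions for selecting machine learning algorithms to predict host phenotypes, and discusses future developments in the field, with the aim of providing a reference for using machine learning to predict host phenotypes from gut microbiota data.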
Haitao CAO, Jing ZHU, Yunpeng MA, Xinghua CUI. Application of Machine Learning in Phenotypic Prediction of Gut Microbiota[J]. Current Biotechnology, 2023, 13(5): 671-680.
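Disease | Total samples | Negative samples | Positive samples | Algorithm | Evaluation metric | Prediction accuracy |
---|---|---|---|---|---|---|
Type 2 diabetes | 344 | 170 | 174 | Random forest | AUC | 0.74 |
Type 2 diabetes | 344 | 170 | 174 | Support vector machine | AUC | 0.66 |
Type 2 diabetes | 344 | 170 | 174 | Elastic net | AUC | 0.70 |
Type 2 diabetes | 344 | 170 | 174 | Lasso | AUC | 0.71 |
Type 2 diabetes | 806 | 423 | 383 | Logistic regression | F1 score | 0.91 |
Type 2 diabetes | 806 | 423 | 383 | Support vector machine | F1 score | 0.91 |
Type 2 diabetes | 806 | 423 | 383 | AdaBoost | F1 score | 0.90 |
Type 2 diabetes | 806 | 423 | 383 | Gradient boosting decision tree | F1 score | 0.87 |
Type 2 diabetes | 806 | 423 | 383 | K-nearest neighbors | F1 score | 0.86 |
Type 2 diabetes | 806 | 423 | 383 | Stochastic gradient descent | F1 score | 0.84 |
Type 2 diabetes | 806 | 423 | 383 | Random forest | F1 score | 0.83 |
Liver cirrhosis | 232 | 118 | 114 | Random forest | AUC | 0.95 |
Liver cirrhosis | 232 | 118 | 114 | Support vector machine | AUC | 0.92 |
Liver cirrhosis | 232 | 118 | 114 | Elastic net | AUC | 0.91 |
Liver cirrhosis | 232 | 118 | 114 | Lasso | AUC | 0.88 |
Colorectal cancer | 121 | 48 | 73 | Random forest | AUC | 0.87 |
Colorectal cancer | 121 | 48 | 73 | Support vector machine | AUC | 0.81 |
Colorectal cancer | 121 | 48 | 73 | Elastic net | AUC | 0.79 |
Colorectal cancer | 121 | 48 | 73 | Lasso | AUC | 0.73 |
Obesity | 253 | 164 | 89 | Random forest | AUC | 0.66 |
Obesity | 253 | 164 | 89 | Support vector machine | AUC | 0.65 |
Obesity | 253 | 164 | 89 | Elastic net | AUC | 0.64 |
Obesity | 253 | 164 | 89 | Lasso | AUC | 0.60 |
Inflammatory bowel disease | 110 | 25 | 85 | Random forest | AUC | 0.89 |
Inflammatory bowel disease | 110 | 25 | 85 | Support vector machine | AUC | 0.86 |
Inflammatory bowel disease | 110 | 25 | 85 | Elastic net | AUC | 0.83 |
Inflammatory bowel disease | 110 | 25 | 85 | Lasso | AUC | 0.81 |
Cholangitis | 48 | 24 | 24 | Random forest | AUC | 0.74 |
Halitosis | 90 | 45 | 45 | Deep learning | AUC | 0.97 |
Halitosis | 90 | 45 | 45 | Support vector machine | AUC | 0.79 |
Intestinal polyps | 552 | 316 | 236 | Naive Bayes | AUC | 0.86 |
Intestinal polyps | 552 | 316 | 236 | Artificial neural network | AUC | 0.87 |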
Table 1 Examples of the algorithms used and the prediction accuracy achieved in machine learning-based prediction of different diseases
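The two evaluation metrics reported in Table 1, AUC and F1 score, are both computed from true labels and model outputs. The following is a minimal sketch in plain Python of how each could be calculated. The toy labels, the scores, and the 0.5 decision threshold are hypothetical and are not taken from any study in the table.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = diseased (positive sample), 0 = healthy control (negative sample).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]           # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed threshold of 0.5

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
print(f"AUC = {auc:.2f}, F1 = {f1:.2f}")
```

AUC depends only on how the scores rank positives against negatives, whereas F1 depends on the chosen decision threshold, so values of the two metrics in the table are not directly comparable with each other.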