Ecology or genetics? Adapting machine learning approaches to understand determinants of cross-species transmission and virulence in RNA viruses
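Basic information
- Grant number: MR/T027355/1
- Principal investigator:
- Amount: $300,300
- Host institution:
- Host institution country: United Kingdom
- Project type: Fellowship
- Fiscal year: 2019
- Funding country: United Kingdom
- Start and end dates: 2019 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract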
Emerging infectious diseases from animal sources continue to threaten human health, as exemplified by the spread and severity of the recent Ebola virus, Zika virus and MERS coronavirus outbreaks. The WHO has noted the serious possibility that a newly emerging pathogen could cause a public health crisis, denoting this 'Disease X'.
'Disease X' is most likely to be caused by an RNA virus, as RNA viruses evolve faster and are more likely to emerge and infect humans than other pathogens. Zoonotic viruses (i.e. those that transmit across species from non-human animals to humans) are also known to have higher emergence risks.
Although some zoonotic viruses cause severe and life-threatening illness upon infecting humans, others appear to cause mild disease or no disease at all. To produce early predictions of the public health impacts of 'Disease X', it is essential to identify which factors drive this variation in 'virulence' (i.e. how severe disease outcomes are). However, we currently have only a poor understanding of which factors drive infection and virulence risk in cross-species transmissions, partly because of the lack of available risk factor information. The traditional approach is to identify ecological risk factors using classical statistical models, though these models are often too reductionist to capture the complex evolutionary patterns behind emergence.
Additionally, the ease of modern RNA sequencing has led to a much wider availability of large genetic data resources for viruses. Genetic patterns or 'motifs' recur throughout virus sequences, with certain motifs recurring more often within infections of certain hosts, which may aid virus replication or evasion of the immune system. Genetic sequences could therefore hold important signals for predicting infection or virulence within a new host after cross-species transmission.
However, finding practical ways of capturing motifs for predictive modelling has proven challenging due to the large volumes of potential information within sequence data. The central goal of this research is to combine ecology and genetics to improve predictions of which animal viruses pose the greatest risks of emergence and severe disease in humans. To unlock the potential power of RNA virus genetic sequences, new analytical approaches are needed. I will apply machine learning as a state-of-the-art modelling method. Machine learning models can predict outcomes based on large sets of highly diverse predictors and complex interactions. These models will allow me to identify key genetic motifs influencing cross-species transmission and directly compare genetic and ecological traits. To improve predictive performance, I will compare a range of machine learning algorithms (e.g. classification and regression trees, support vector machines) and approaches (e.g. 'bagging', which aggregates over many individual models, and 'boosting', which allows models to learn gradually).
This research will identify patterns across all known mammal and bird RNA viruses by using the exceptional breadth of data within the Enhanced Infectious Disease Database (EID2), developed at the University of Liverpool. EID2 contains infection data from 29,500 host-pathogen pairs, automatically collected from genetic records (GenBank) and scientific literature texts (PubMed). Despite virulence being a key virus trait, no comparably sized resources exist describing disease outcomes. Building on the EID2 platform, I will develop automated text mining tools to capture data on disease outcomes in different hosts from scientific texts describing experimental infections.
This research will test evolutionary theory across a large diversity of RNA viruses.
The proposed machine learning models will inform public health risk assessment by improving our capacity to predict emergence and suggesting strategic target viruses or hosts for preventing future disease outbreaks from cross-species transmission.
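As a rough illustration of how sequence motifs might be turned into model predictors, the sketch below counts overlapping k-mers (short subsequences of length k) and converts them to relative frequencies. This is only one simple, assumed way of capturing motifs and is not confirmed as the method used in the project; the function names `kmer_counts` and `kmer_feature_vector`, the choice of k, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers, skipping any window with ambiguous bases."""
    # Treat RNA (U) and cDNA (T) notation alike; this is an assumption.
    seq = sequence.upper().replace("U", "T")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_feature_vector(sequence: str, k: int = 3) -> list[float]:
    """Return relative k-mer frequencies in a fixed lexicographic order."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


# Toy sequences invented purely for illustration (not real viral genomes).
toy_sequences = {
    "virus_a": "AUGGCGAUGGCGNAUG",
    "virus_b": "ACGUACGUACGU",
}
features = {
    name: kmer_feature_vector(seq, k=2) for name, seq in toy_sequences.items()
}
```

Vectors of this kind could sit alongside ecological traits as inputs to the tree-based or support vector models mentioned in the abstract. A larger k captures longer motifs, but the number of features grows as 4^k.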
Project outputs
Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Tracking changes between preprint posting and journal publication during a pandemic.
- DOI: 10.1371/journal.pbio.3001285
- Publication date: 2022-03
- Journal:
- Impact factor: 9.8
- Authors: Brierley L; Nanni F; Polka JK; Dey G; Pálfy M; Fraser N; Coates JA
- Corresponding author: Coates JA
Past and future uses of text mining in ecology and evolution.
- DOI: 10.1098/rspb.2021.2721
- Publication date: 2022-05-25
- Journal:
- Impact factor: 4.7
- Authors: Farrell, Maxwell J.; Brierley, Liam; Willoughby, Anna; Yates, Andrew; Mideo, Nicole
- Corresponding author: Mideo, Nicole
The Global Virome in One Network (VIRION): an atlas of vertebrate-virus associations
- DOI: 10.1101/2021.08.06.455442
- Publication date: 2021
- Journal:
- Impact factor: 0
- Authors: Carlson C
- Corresponding author: Carlson C
The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape.
- DOI: 10.1371/journal.pbio.3000959
- Publication date: 2021-04
- Journal:
- Impact factor: 9.8
- Authors: Fraser N; Brierley L; Dey G; Polka JK; Pálfy M; Nanni F; Coates JA
- Corresponding author: Coates JA
The changing landscape of text mining - a review of approaches for ecology and evolution
- DOI: 10.32942/x2vg87
- Publication date: 2024
- Journal:
- Impact factor: 0
- Authors: Farrell M
- Corresponding author: Farrell M
Other publications by Liam Brierley
Ecology of emerging diseases: virulence and transmissibility of human RNA viruses
- DOI:
- Publication date: 2017-07
- Journal:
- Impact factor: 0
- Authors: Liam Brierley
- Corresponding author: Liam Brierley
The science of the host–virus network
- DOI: 10.1038/s41564-021-00999-5
- Publication date: 2021-11-24
- Journal:
- Impact factor: 19.400
- Authors: Gregory F. Albery; Daniel J. Becker; Liam Brierley; Cara E. Brook; Rebecca C. Christofferson; Lily E. Cohen; Tad A. Dallas; Evan A. Eskew; Anna Fagre; Maxwell J. Farrell; Emma Glennon; Sarah Guth; Maxwell B. Joseph; Nardus Mollentze; Benjamin A. Neely; Timothée Poisot; Angela L. Rasmussen; Sadie J. Ryan; Stephanie Seifert; Anna R. Sjodin; Erin M. Sorrell; Colin J. Carlson
- Corresponding author: Colin J. Carlson
The role of research preprints in the academic response to the COVID-19 epidemic
- DOI:
- Publication date: 2020
- Journal:
- Impact factor: 0
- Authors: Liam Brierley
- Corresponding author: Liam Brierley
Preprinting the COVID-19 pandemic
- DOI:
- Publication date: 2020
- Journal:
- Impact factor: 0
- Authors: Nicholas Fraser; Liam Brierley; Gautam Dey; Jessica K. Polka; M. Pálfy; F. Nanni; J. A. Coates
- Corresponding author: J. A. Coates
Predicting high confidence ctDNA somatic variants with ensemble machine learning models
- DOI: 10.1038/s41598-025-01326-2
- Publication date: 2025-05-26
- Journal:
- Impact factor: 3.900
- Authors: Rugare Maruzani; Liam Brierley; Andrea Jorgensen; Anna Fowler
- Corresponding author: Anna Fowler
Other grants held by Liam Brierley
Predicting emergence risk of future zoonotic viruses through computational learning
- Grant number: MR/X019616/1
- Fiscal year: 2024
- Funding amount: $300,300
- Project type: Fellowship
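Similar grants from the National Natural Science Foundation of China (NSFC)
Journal of Genetics and Genomics
- Grant number: 31224803
- Year approved: 2012
- Funding amount: CNY 240,000
- Project type: Special Fund Project
Association study of gene polymorphisms in bipolar disorder
- Grant number: 81101008
- Year approved: 2011
- Funding amount: CNY 220,000
- Project type: Young Scientists Fund
Effects of SNPs in the 3'UTR of target genes of candidate miRNAs regulating the TLR signalling pathway on the development of oral squamous cell carcinoma, and follow-up functional analysis
- Grant number: 81001208
- Year approved: 2010
- Funding amount: CNY 200,000
- Project type: Young Scientists Fund
Imaging genetics study of brain network abnormalities in schizophrenia
- Grant number: 81000582
- Year approved: 2010
- Funding amount: CNY 200,000
- Project type: Young Scientists Fund
Molecular genetic mechanisms underlying the association between schizophrenia and smoking
- Grant number: 81000579
- Year approved: 2010
- Funding amount: CNY 200,000
- Project type: Young Scientists Fund
Molecular systematics and morphological evolution of the Chinese green pit viper genus Viridovipera
- Grant number: 30970334
- Year approved: 2009
- Funding amount: CNY 80,000
- Project type: General Program
Study of the association between FcγR gene copy number and lupus nephritis
- Grant number: 30801022
- Year approved: 2008
- Funding amount: CNY 200,000
- Project type: Young Scientists Fund
A preliminary study of genotyping in intellectually gifted children
- Grant number: 30670716
- Year approved: 2006
- Funding amount: CNY 300,000
- Project type: General Program
Molecular genetic basis of adipose tissue growth and development in chickens
- Grant number: 30430510
- Year approved: 2004
- Funding amount: CNY 1,300,000
- Project type: Key Program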
Similar overseas grants
Adapting machine learning methods to detect genetic loci specific to strictly defined MDD
- Grant number: 10196078
- Fiscal year: 2021
- Funding amount: $300,300
- Project type:
Adapting machine learning methods to detect genetic loci specific to strictly defined MDD
- Grant number: 10378100
- Fiscal year: 2021
- Funding amount: $300,300
- Project type:
Adapting Secretory Proteostasis through Pharmacologic IRE1 Activation
- Grant number: 9760934
- Fiscal year: 2019
- Funding amount: $300,300
- Project type:
NeuroLab: Adapting an authentic ISE experience for high school course integration and positive STEM outcomes
- Grant number: 10456118
- Fiscal year: 2019
- Funding amount: $300,300
- Project type:
NeuroLab: Adapting an authentic ISE experience for high school course integration and positive STEM outcomes
- Grant number: 10208907
- Fiscal year: 2019
- Funding amount: $300,300
- Project type:
NeuroLab: Adapting an authentic ISE experience for high school course integration and positive STEM outcomes
- Grant number: 10668424
- Fiscal year: 2019
- Funding amount: $300,300
- Project type:
Deaf ACCESS: Adapting Consent through Community Engagement and State-of-the-art Simulation
- Grant number: 9318498
- Fiscal year: 2016
- Funding amount: $300,300
- Project type:
Causes and consequences of gene copy number change in adapting yeast populations
- Grant number: 8726426
- Fiscal year: 2011
- Funding amount: $300,300
- Project type:
Causes and consequences of gene copy number change in adapting yeast populations
- Grant number: 8526477
- Fiscal year: 2011
- Funding amount: $300,300
- Project type:
Causes and consequences of gene copy number change in adapting yeast populations
- Grant number: 8183586
- Fiscal year: 2011
- Funding amount: $300,300
- Project type: