Tractable statistical inference from genomic data using diffusion models

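Basic information

  • Grant number:
    EP/L018497/1
  • Principal investigator:
  • Amount:
    $118,400
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project category:
    Research Grant
  • Fiscal year:
    2014
  • Funding country:
    United Kingdom
  • Duration:
    2014 to (no data)
  • Project status:
    Completed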

Project abstract

If we were to obtain the sequence of my genome, we'd see a string of three billion letters from the DNA alphabet. Now let's sequence yours and compare the two. At most positions they'd be identical - we are both human - but at a small number, less than 1%, we'd see some variation. As the costs of DNA sequencing fall, obtaining your own genome will soon no longer be a hypothetical prospect. We are presently on the cusp of obtaining the genomes of thousands of people, providing a glimpse into the complex pattern of genetic variation across all humans. These data encode a great deal of biological information, such as the rate of mutations, and they also contain information about human demographic history, such as recent historical population size changes and migrations. Can we infer these things just from the genetic data?

Given such a rich and complex source of data, this question has occupied statisticians, probabilists, and geneticists for many years. The key to this type of statistical inference is a suitable stochastic model: one important model is known as the Wright-Fisher diffusion. It describes the random fluctuations through time of the frequency of a variant in a large population - that is, it traces a trajectory for how prevalent the variant was at each point in time. Performing inference with diffusion models can be difficult. The purpose of this research is to contribute to making such inference tractable.

The approach here is to use a computationally intensive, simulation-based statistical technique: rather than work exhaustively through every possibility, we simulate some random, representative samples from the model and average over them. A computer can provide us with a large number of samples, so that the error is expected to be small provided we wait long enough. So successful is this idea that it is used throughout science and engineering. Here, we must simulate paths from the Wright-Fisher diffusion - the random, unobserved trajectories of historical frequencies of genetic variants. Ensuring such simulation can be carried out efficiently on this and related diffusions is a first task of the research. Because of the generality of the models and the techniques involved, this has the potential to aid researchers in many fields outside genetics too.

Given a method for sampling from the model, our next task is to embed it into an inference algorithm. However, this approach has been little applied to the framework of the Wright-Fisher diffusion, and there are open questions on the design of such an algorithm that this research will address, including some important specific issues. For example, we might simulate our diffusion path by many small, local increments, building up its trajectory in very small time steps based on what the data looks like at that time. We should hope that these trajectories will be consistent with the observed data overall, but ensuring such consistency is a global, not local, problem. The project will also address this issue.

Finally, we must specialize the algorithms for the analysis of genetic data. So that the work can be made accessible, convenient software will also be developed. Analysis of genetic data has the potential to provide a range of benefits: among other things, we can learn about human origins from ancient DNA, the evolution of pathogens, the progression of a tumour, the importance of natural selection, and the recent demographic history of humans. The last of these is important as a vital first step in predicting the nature of human genetic variation, which in turn is fundamental in our understanding of the genetic basis of the risk of many complex diseases.
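
The simulate-and-average idea described in the abstract can be illustrated with a minimal Python sketch. It is not the project's method: it assumes a neutral Wright-Fisher diffusion, dX = sqrt(X(1 - X)) dW, with no mutation or selection, and it uses a simple Euler-Maruyama time discretization, which is only approximate (the project's stated aim is efficient simulation, and one listed output concerns exact simulation). The function names `simulate_wf_path` and `monte_carlo_mean_frequency`, and all parameter values, are hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_wf_path(x0, t_end, n_steps, rng):
    """Approximate one neutral Wright-Fisher diffusion path by Euler-Maruyama.

    Assumption: the model is dX = sqrt(X(1 - X)) dW on [0, t_end].
    Values are clipped to [0, 1] because discretized increments can
    overshoot the boundaries; 0 and 1 then act as absorbing states.
    """
    dt = t_end / n_steps
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over one small time step.
        noise = rng.gauss(0.0, math.sqrt(dt))
        x = x + math.sqrt(max(x * (1.0 - x), 0.0)) * noise
        x = min(1.0, max(0.0, x))
        path.append(x)
    return path


def monte_carlo_mean_frequency(x0, t_end, n_steps, n_paths, seed=0):
    """Average the final frequency over many independently simulated paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += simulate_wf_path(x0, t_end, n_steps, rng)[-1]
    return total / n_paths


if __name__ == "__main__":
    # Hypothetical settings: start at frequency 0.3, run to time 0.5.
    estimate = monte_carlo_mean_frequency(0.3, 0.5, n_steps=100, n_paths=2000, seed=1)
    print(f"Monte Carlo estimate of E[X_t]: {estimate:.3f}")
```

Under this neutral model the expected frequency stays at its starting value, so the average should land close to 0.3, with Monte Carlo error shrinking as `n_paths` grows. The small time steps here are purely local; they do nothing to make a path consistent with observed data, which is the global problem the abstract raises.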

Project outputs

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Exact simulation of the Wright-Fisher diffusion
  • DOI:
    10.1214/16-aap1236
  • Publication date:
    2017-06-01
  • Journal:
  • Impact factor:
    1.8
  • Authors:
    Jenkins, Paul A.;Spano, Dario
  • Corresponding author:
    Spano, Dario
Tractable diffusion and coalescent processes for weakly correlated loci
  • DOI:
    10.1214/ejp.v20-3564
  • Publication date:
    2014-05
  • Journal:
  • Impact factor:
    1.4
  • Authors:
    P. A. Jenkins;P. Fearnhead;Yun S. Song
  • Corresponding authors:
    P. A. Jenkins;P. Fearnhead;Yun S. Song
Bayesian nonparametric analysis of Kingman's coalescent
Inference and rare event simulation for stopped Markov processes via reverse-time sequential Monte Carlo
  • DOI:
    10.1007/s11222-017-9722-1
  • Publication date:
    2018-01-01
  • Journal:
  • Impact factor:
    2.2
  • Authors:
    Koskela, Jere;Spano, Dario;Jenkins, Paul A.
  • Corresponding author:
    Jenkins, Paul A.
Asymptotic genealogies of interacting particle systems with an application to sequential Monte Carlo
  • DOI:
    10.48550/arxiv.1804.01811
  • Publication date:
    2018
  • Journal:
  • Impact factor:
    0
  • Authors:
    Koskela J
  • Corresponding author:
    Koskela J

Other publications by Paul Jenkins

Validation of Single-Item Screening Measures for Provider Burnout in a Rural Health Care Network
  • DOI:
    10.1177/0163278715573866
  • Publication date:
    2016
  • Journal:
  • Impact factor:
    2.9
  • Authors:
    Anthony C. Waddimba;M. Scribani;Melinda A. Nieves;N. Krupa;J. May;Paul Jenkins
  • Corresponding author:
    Paul Jenkins
Learning Agile Scrum Methodology Using the Groupware Tool Trello® Through Collaborative Working
  • DOI:
    10.1007/978-3-030-22354-0_31
  • Publication date:
    2019
  • Journal:
  • Impact factor:
    0
  • Authors:
    N. Naik;Paul Jenkins;D. Newell
  • Corresponding author:
    D. Newell
Cough syncope: a complication of adult whooping cough.
Your Identity is Yours: Take Back Control of Your Identity Using GDPR Compatible Self-Sovereign Identity
Difficulties encountered in community involvement in delivery under the new South African housing policy
  • DOI:
    10.1016/s0197-3975(99)00017-x
  • Publication date:
    1999
  • Journal:
  • Impact factor:
    6.8
  • Authors:
    Paul Jenkins
  • Corresponding author:
    Paul Jenkins

Other grants held by Paul Jenkins

Evaluating the Potential of Community College Guided Pathways Reforms to Increase Undergraduate STEM Student Success
  • Grant number:
    1915191
  • Fiscal year:
    2019
  • Funding amount:
    $118,400
  • Project category:
    Standard Grant
Automorphic Forms Workshop 2014, May 12-16 2014
  • Grant number:
    1404066
  • Fiscal year:
    2014
  • Funding amount:
    $118,400
  • Project category:
    Standard Grant
PostDoctoral Research Fellowship
  • Grant number:
    0603271
  • Fiscal year:
    2006
  • Funding amount:
    $118,400
  • Project category:
    Fellowship Award

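Similar NSFC (National Natural Science Foundation of China) grants

Research on wireless opportunistic scheduling algorithms based on stochastic network calculus
  • Grant number:
    60702009
  • Year approved:
    2007
  • Funding amount:
    CNY 240,000
  • Project category:
    Young Scientists Fund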

Similar overseas grants

CAREER: Statistical foundations of particle tracking and trajectory inference
  • Grant number:
    2339829
  • Fiscal year:
    2024
  • Funding amount:
    $118,400
  • Project category:
    Continuing Grant
CAREER: Statistical Inference in Observational Studies -- Theory, Methods, and Beyond
  • Grant number:
    2338760
  • Fiscal year:
    2024
  • Funding amount:
    $118,400
  • Project category:
    Continuing Grant
Statistical and Computational Thresholds in Spin Glasses and Graph Inference Problems
  • Grant number:
    2347177
  • Fiscal year:
    2024
  • Funding amount:
    $118,400
  • Project category:
    Standard Grant
Collaborative Research: Urban Vector-Borne Disease Transmission Demands Advances in Spatiotemporal Statistical Inference
  • Grant number:
    2414688
  • Fiscal year:
    2024
  • Funding amount:
    $118,400
  • Project category:
    Continuing Grant
CAREER: Distribution-Free and Adaptive Statistical Inference
  • Grant number:
    2338464
  • Fiscal year:
    2024
  • Funding amount:
    $118,400
  • Project category:
    Continuing Grant
Poly-Matching Causal Inference for Assessing Multiple Acute Medical Managements of Pediatric Traumatic Brain Injuries
  • Grant number:
    10586785
  • Fiscal year:
    2023
  • Funding amount:
    $118,400
  • Project category:
CAREER: Statistical Inference in High Dimensions using Variational Approximations
  • Grant number:
    2239234
  • Fiscal year:
    2023
  • Funding amount:
    $118,400
  • Project category:
    Continuing Grant
CAREER: Towards Tight Guarantees of Markov Chain Sampling Algorithms in High Dimensional Statistical Inference
  • Grant number:
    2237322
  • Fiscal year:
    2023
  • Funding amount:
    $118,400
  • Project category:
    Continuing Grant
Unravel machine learning blackboxes -- A general, effective and performance-guaranteed statistical framework for complex and irregular inference problems in data science
  • Grant number:
    2311064
  • Fiscal year:
    2023
  • Funding amount:
    $118,400
  • Project category:
    Standard Grant
Development of statistical inference of extended Hawkes processes including missing data problem
  • Grant number:
    23H03358
  • Fiscal year:
    2023
  • Funding amount:
    $118,400
  • Project category:
    Grant-in-Aid for Scientific Research (B)