Non-uniform sampling of permutations and large scale hypothesis testing

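Basic information

  • Award number:
    1521145
  • Principal investigator:
  • Amount:
    $399,700
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2015
  • Funding country:
    United States
  • Period:
    2015-08-01 to 2019-07-31
  • Status:
    Completed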

Project abstract

Modern scientific tools are delivering very large data sets. This is especially true in biology, where expression levels for thousands of genes, or even the specific DNA information at millions of locations on the genome, can be measured. Scientists would like to correlate these variables with other measured quantities, especially the presence or absence of a disease. When millions of hypotheses are investigated, some of the measured variables can appear correlated with the outcome just by chance. It is common to insist that the observed correlation for one test be so strong that it would happen by chance at most once in 20 million tries. The usual way to measure chance correlations is to shuffle the data at random and see how often a strong effect appears. If the event of interest is a one-in-20-million outcome, we usually need about ten times that many random shuffles to be sure. This proposal is about finding more efficient random shuffling strategies to get a desired answer with fewer shuffles. The goal is to find important biological variables with much less computation and greater reliability. Finding the important genes is a first step for follow-up work that includes mining the literature and running experiments to understand the role of those genes and determine whether their relationship is useful or not. Part of the work will also involve adjusting for other factors, measured or otherwise, that could make the observed correlations misleading. New mathematical methods for finding and measuring rare and unusual outcomes can also be used in industrial problems where the rare phenomenon is an unusually effective product design as measured by computer simulations.
The usual way to test whether a gene or a gene set is associated with a phenotype (disease, height, etc.) or a treatment (diet, medicines, etc.) is to run a permutation test. From n data points, there are as many as n! permutations to run. Usually this number of permutations is beyond our budget, so we sample from the permutations instead. If we compute our test statistic M times, once on the original data and once for each of M-1 sampled permutations, then the smallest p-value we can possibly get is 1/M. That is, to attain a target p-value p we have to compute our statistic at least 1/p times. The standard threshold for genome-wide association studies (p = 5 × 10^-8) translates into a bare minimum of 20,000,000 computations. To have adequate power, a permutation test requires more like 10/p computations. When the phenotype/treatment is binary, the permutation test reduces to sampling without replacement, because a relabelling is fully determined by which subjects are assigned to the first group. This project uses non-uniform sampling of permutations or combinations. The main method is importance sampling from mixtures of proposals, using the mixture component probabilities as control variates. Markov chain Monte Carlo methods will also be investigated.
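The 1/M floor on the p-value can be seen in a small sketch of the ordinary, uniformly sampled permutation test for a binary phenotype. This is only the baseline that the abstract describes, not the project's non-uniform importance sampling method; the function names, the mean-difference statistic, and the toy data are illustrative assumptions.

```python
import random


def mean_difference(values, case_idx):
    """Test statistic: mean of the case group minus mean of the control group."""
    case_set = set(case_idx)
    cases = [v for i, v in enumerate(values) if i in case_set]
    controls = [v for i, v in enumerate(values) if i not in case_set]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_p_value(values, case_idx, M, seed=0):
    """One-sided Monte Carlo permutation p-value from M statistic evaluations.

    The statistic is computed once on the observed labelling and once for
    each of M - 1 uniformly sampled relabellings. With a binary phenotype a
    relabelling is just a choice of which n1 of the n subjects are cases,
    so each one is drawn by sampling indices without replacement.
    """
    rng = random.Random(seed)
    n, n1 = len(values), len(case_idx)
    observed = mean_difference(values, case_idx)
    count = 1  # the observed labelling counts as one of the M evaluations
    for _ in range(M - 1):
        shuffled_cases = rng.sample(range(n), n1)
        if mean_difference(values, shuffled_cases) >= observed:
            count += 1
    return count / M


# Toy data (assumed): 20 cases with clearly larger values than 20 controls.
toy_values = [10.0 + 0.1 * i for i in range(20)] + [0.1 * i for i in range(20)]
toy_cases = list(range(20))

for M in (100, 1000):
    p = permutation_p_value(toy_values, toy_cases, M)
    # However strong the effect, p can never fall below 1/M.
    print(f"M = {M}: p = {p}, floor 1/M = {1 / M}")
```

Because the reported p-value is bounded below by 1/M, reaching a threshold of 5 × 10^-8 with uniform sampling needs at least 20,000,000 evaluations of the statistic, which is the cost the non-uniform sampling schemes aim to reduce.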

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Art Owen

Randomized quasi-Monte Carlo sampling for scientific computing
  • Award number:
    2152780
  • Fiscal year:
    2022
  • Amount:
    $399,700
  • Award type:
    Standard Grant
BIGDATA: F: Computationally Efficient Algorithms for Large-Scale Crossed Random Effects Models
  • Award number:
    1837931
  • Fiscal year:
    2018
  • Amount:
    $399,700
  • Award type:
    Standard Grant
Monte Carlo and Quasi-Monte Carlo Methods for Statistics
  • Award number:
    1407397
  • Fiscal year:
    2014
  • Amount:
    $399,700
  • Award type:
    Continuing Grant
MCQMC 2014 Travel Support
  • Award number:
    1357690
  • Fiscal year:
    2014
  • Amount:
    $399,700
  • Award type:
    Standard Grant
MCQMC 2012
  • Award number:
    1135257
  • Fiscal year:
    2011
  • Amount:
    $399,700
  • Award type:
    Standard Grant
Monte Carlo and Quasi-Monte Carlo Methods for Statistics
  • Award number:
    0906056
  • Fiscal year:
    2009
  • Amount:
    $399,700
  • Award type:
    Continuing Grant
Travel support for MCQMC July 2008, Montreal, Canada
  • Award number:
    0805890
  • Fiscal year:
    2008
  • Amount:
    $399,700
  • Award type:
    Standard Grant
Monte Carlo and Quasi-Monte Carlo Methods for Statistics
  • Award number:
    0604939
  • Fiscal year:
    2006
  • Amount:
    $399,700
  • Award type:
    Continuing Grant
Statistical Integration and Approximation
  • Award number:
    0306612
  • Fiscal year:
    2003
  • Amount:
    $399,700
  • Award type:
    Continuing Grant
Statistical Numerics
  • Award number:
    0072445
  • Fiscal year:
    2000
  • Amount:
    $399,700
  • Award type:
    Continuing Grant

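Similar NSFC (National Natural Science Foundation of China) grants

Research on related problems based on the Riemann-Hilbert method
  • Award number:
    11026205
  • Approval year:
    2010
  • Amount:
    CNY 30,000
  • Award type:
    Tianyuan Mathematics Fund project
Differentiable ergodic theory and applications of some methods of Liao Shantao
  • Award number:
    10671006
  • Approval year:
    2006
  • Amount:
    CNY 210,000
  • Award type:
    General Program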

Similar overseas grants

Chinese language versions of the National Alzheimer's Coordinating Center's Uniform Data Set version 4: a linguistic and cultural adaptation study
  • Award number:
    10740587
  • Fiscal year:
    2023
  • Amount:
    $399,700
  • Award type:
A micro-dissection platform for generating uniform-sized patient-derived tumor organoids (PDOs) for personalized cancer therapy
  • Award number:
    10697348
  • Fiscal year:
    2022
  • Amount:
    $399,700
  • Award type:
Digital Anti-Aliasing FIR filter for Inverter-based Non-uniform Sampling Flash ADC in Neural Implant
  • Award number:
    562074-2021
  • Fiscal year:
    2021
  • Amount:
    $399,700
  • Award type:
    University Undergraduate Student Research Awards
Evaluating Chinese-speaking community-dwelling elders in the United States with the Uniform Data Set
  • Award number:
    10170514
  • Fiscal year:
    2019
  • Amount:
    $399,700
  • Award type:
EARS: Enabling Opportunistic Environmental Monitoring with Non-Uniform Sampling and Processing Circuits
  • Award number:
    1643004
  • Fiscal year:
    2016
  • Amount:
    $399,700
  • Award type:
    Standard Grant
UNIFORM NATURALISTIC TEENAGE DRIVING DATA SET-3
  • Award number:
    9356763
  • Fiscal year:
    2016
  • Amount:
    $399,700
  • Award type:
Tracking undetected peaks from structural information using the Non-Uniform Sampling method
  • Award number:
    16H00313
  • Fiscal year:
    2016
  • Amount:
    $399,700
  • Award type:
    Grant-in-Aid for Encouragement of Scientists
Development of safe non-uniform sampling processing method and its application into proteins
  • Award number:
    22770106
  • Fiscal year:
    2010
  • Amount:
    $399,700
  • Award type:
    Grant-in-Aid for Young Scientists (B)
Uniform length DNA for paired end nextgen sequencing via in vitro packaging
  • Award number:
    8001230
  • Fiscal year:
    2010
  • Amount:
    $399,700
  • Award type:
Propagating the Uniform Research Integrity Climate Assessment (U-RICA)
  • Award number:
    7540817
  • Fiscal year:
    2008
  • Amount:
    $399,700
  • Award type: