Distribution of Patterns and Statistics in Random Sequences

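Basic information

  • Award number:
    1107084
  • Principal investigator:
  • Amount:
    $100,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2011
  • Funding country:
    United States
  • Project period:
    2011-08-01 to 2015-09-30
  • Project status:
    Completed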

Project abstract

This project targets the development and application of tools for efficient computation of distributions associated with patterns and statistics in random sequences, both realizations of observation sequences as well as hidden state sequences conditional on observed data. Due to the massive size of data sets that are available, computational methods that are not only accurate but also efficient are important. The proposed research seeks to contribute in this area. Minimal deterministic finite automata, probability generating functions, the sum-product algorithm and matrix-vector updates are the primary tools used to form algorithms for efficiently computing distributions of patterns and statistics. The goals of this research are threefold: (i) to further develop efficient methods for quantifying uncertainty in statistics of hidden state sequences of probabilistic graphical models; (ii) to efficiently compute exact probability distributions of complex patterns that have not previously been computed; (iii) to apply the probabilistic tools of (i) and (ii) to statistical tests and data analysis. The quantification of uncertainty in statistics of hidden states is frequently dealt with by determining the state sequence that is optimal for a particular criterion, with the statistic then evaluated from the optimal state sequence in a deterministic fashion. However, that approach does not account for uncertainty in the states. An alternate approach is to sample from the conditional distribution of states given the observed data, and then approximate the distribution of the statistic empirically. However, many samples are needed so that the approximation is accurate, leading to problems with scalability. We give a way to compute exact distributions in an efficient manner. 
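As a rough illustration of the exact, non-sampling approach described above, the Python sketch below computes the conditional distribution of one simple statistic: the number of time steps a hidden Markov chain spends in a chosen state, given the observations. It does so by augmenting the standard forward recursion with a running count. The function name, the toy model parameters, and the choice of statistic are assumptions made for illustration and are not taken from the project itself.

```python
def count_distribution(init, trans, emit, obs, target):
    """Exact P(N = k | obs), where N is the number of time steps
    the hidden chain spends in state `target`.

    init[s]     : initial probability of hidden state s
    trans[r][s] : transition probability from state r to state s
    emit[s][o]  : probability of observing symbol o in state s
    obs         : non-empty list of observed symbol indices
    """
    n_states = len(init)
    T = len(obs)
    # alpha[s][k] = P(obs[0..t], state at time t = s, count so far = k)
    alpha = [[0.0] * (T + 1) for _ in range(n_states)]
    for s in range(n_states):
        k = 1 if s == target else 0
        alpha[s][k] = init[s] * emit[s][obs[0]]

    for t in range(1, T):
        new = [[0.0] * (T + 1) for _ in range(n_states)]
        for s in range(n_states):
            inc = 1 if s == target else 0
            for r in range(n_states):
                w = trans[r][s] * emit[s][obs[t]]
                if w == 0.0:
                    continue
                # Shift the count by `inc` when entering the target state.
                for k in range(T + 1 - inc):
                    new[s][k + inc] += alpha[r][k] * w
        alpha = new

    # Marginalize over the final state, then condition on the observations.
    joint = [sum(alpha[s][k] for s in range(n_states)) for k in range(T + 1)]
    total = sum(joint)
    return [p / total for p in joint]


if __name__ == "__main__":
    # Hypothetical two-state model with binary observations.
    init = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.2, 0.8]]
    emit = [[0.9, 0.1], [0.3, 0.7]]
    print(count_distribution(init, trans, emit, [0, 1, 1], target=1))
```

The cost grows polynomially in the sequence length rather than with the number of samples drawn, which is the scalability point the abstract makes. A single most-probable state path would instead give only one value of the count, with no measure of its uncertainty.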
The goals of the project are integrated, in that distributions of patterns and statistics are needed for statistical inference in applications, and in turn those applications drive the need for computing distributions in increasingly complex situations. Objectives of the work include computing a model-based distribution of prediction error rates for protein-protein interactions, computing exact distributions of coverage of spaced seeds for homology searches in DNA sequences, and computing the exact distribution of the one-dimensional scan statistic for multi-state higher-order Markovian trials. The need for distributional properties associated with patterns and statistics in random sequences arises in many practical fields of study with massive data sets, such as bioinformatics, time series, information theory, economics, data mining, and quality control. In this research computational tools are developed for computing such distributions. Results for distributions of patterns and statistics may be applied to many practical problems, such as detecting genes, promoters, or other functionally significant patterns in DNA sequences, determining probabilities related to classifications of observations in health-related studies, change points that indicate new regimes in economic data, patterns that indicate an intrusion in a computer system, or patterns associated with surveillance work. The theory will be used to compute distributions of statistics that are intractable by combinatorial or other means, and to provide exact probabilities in an efficient manner in situations that are typically handled by simulation of many data sets. Thus this research facilitates new scientific studies that rely on results for distributions associated with patterns or statistics that have not been computed to date.
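To make the combination of finite automata and vector updates more concrete, here is a minimal Python sketch that computes the exact distribution of the number of (possibly overlapping) occurrences of a single word in a sequence of independent, identically distributed symbols. It is a simplified stand-in, not the project's method: the i.i.d. model, the single-word pattern, and the helper names `build_dfa` and `occurrence_count_distribution` are all assumptions chosen for illustration. The project's stated targets (spaced seeds, scan statistics, higher-order Markovian trials) are considerably more general.

```python
def build_dfa(pattern, alphabet):
    """KMP-style automaton for a non-empty pattern. State j means the
    longest suffix of the text read so far that is also a prefix of
    `pattern` has length j."""
    m = len(pattern)
    dfa = [dict() for _ in range(m + 1)]
    for state in range(m + 1):
        for ch in alphabet:
            s = pattern[:state] + ch
            k = min(m, len(s))
            # Find the longest prefix of `pattern` that is a suffix of s.
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            dfa[state][ch] = k
    return dfa


def occurrence_count_distribution(pattern, probs, n):
    """Exact P(N = k), k = 0..max_count, where N counts overlapping
    occurrences of `pattern` in n i.i.d. symbols drawn with `probs`."""
    m = len(pattern)
    dfa = build_dfa(pattern, list(probs))
    max_count = max(n - m + 1, 0)
    # v[state][k] = P(automaton is in `state` and k occurrences so far)
    v = [[0.0] * (max_count + 1) for _ in range(m + 1)]
    v[0][0] = 1.0
    for _ in range(n):
        new = [[0.0] * (max_count + 1) for _ in range(m + 1)]
        for state in range(m + 1):
            for k, mass in enumerate(v[state]):
                if mass == 0.0:
                    continue
                for ch, p in probs.items():
                    nxt = dfa[state][ch]
                    # Reaching the accepting state completes one occurrence.
                    new[nxt][k + int(nxt == m)] += mass * p
        v = new
    return [sum(v[state][k] for state in range(m + 1))
            for k in range(max_count + 1)]


if __name__ == "__main__":
    print(occurrence_count_distribution("aa", {"a": 0.5, "b": 0.5}, 3))
```

Each of the `n` steps is one sparse matrix-vector update over (automaton state, count) pairs, so the result is exact without enumerating all sequences or simulating data sets.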

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Donald Martin

The enigma of the SARS-CoV-2 microcirculation dysfunction: evidence for modified endothelial junctions
  • DOI:
    10.1101/2023.04.24.538100
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Laurence Bouillet;Meryem Benmarce;Chloe Guerin;Laura Bouvet;Olivia Garnier;Donald Martin;Isabelle Vilgrain
  • Corresponding author:
    Isabelle Vilgrain
Photocoagulation treatment of proliferative diabetic retinopathy: the second report of diabetic retinopathy study findings.
  • DOI:
    10.1016/s0161-6420(78)35693-1
  • Published:
    1978
  • Journal:
  • Impact factor:
    13.7
  • Authors:
    A. Patz;S. Fine;D. Finkelstein;T. Prout;L. Aiello;R. Bradley;J. C. Briones;F. Myers;G. Bresnick;G. D. Venecia;T. Stevens;I. Wallow;S. Chandra;E. Norton;G. Blankenship;J. E. Harris;W. Knobloch;F. Goetz;R. Ramsay;J. Mcmeel;Donald Martin;M. Goldberg;F. Huamonte;G. Peyman;B. Straatsma;S. Kopelow;W. Heuven;A. Kassoff;S. Feman;R. Watzke;J. H. Mensher;W. Tasman;W. Annesley;B. Leonard;C. Canny;Leonard Joffe;T. R. Pheasant;F. Riekhof;M. Dahl;W. Bohart;D. Clarke;J. Berrocal;A. Ramos;G. Velázquez;R. Margherio;Delbert Nachazel;E. McLean;S. Guzak;G. Knatterud;C. Klimt;A. Hillis;D. Makuc;M. Davis;A. MacCormick;Y. Magli;Paul Segal;A. Lilienfeld;E. J. Ballintine;J. Cornfield;E. Friedman;Max Miller;M. Sears;E. P. Wiesner;F. Ederer;F. Ferris;B. Becker;Gl Johnston;C. Meinert;J. Bearman;S. Chandra;L. Rand
  • Corresponding author:
    L. Rand
Rheologic reflection in hypertriglyceridemia-induced pancreatitis.
  • DOI:
    10.1097/smj.0b013e3181b4bdde
  • Published:
    2009
  • Journal:
  • Impact factor:
    1.1
  • Authors:
    Donald Martin;E. McCann;P. Glynn
  • Corresponding author:
    P. Glynn
A study of Functional ion Transport Using Tethered Membranes
  • DOI:
    10.1016/j.bpj.2010.12.2050
  • Published:
    2011-02-02
  • Journal:
  • Impact factor:
  • Authors:
    Bruce A. Cornell;Andrew Battle;Louise Brown;Sonia Carnie;Sophia C. Goodchild;Hedayetul Islam;Donald Martin;Boris Martinac;Russell Richards;Stella Valenzuela
  • Corresponding author:
    Stella Valenzuela
Invitation for Nominations for 1982 American Institute of Nutrition Awards
  • DOI:
    10.1093/jn/111.8.1507
  • Published:
    1981-08-01
  • Journal:
  • Impact factor:
  • Authors:
    Joan Martin;Robert Labbe;Donald Martin;Patti Shores
  • Corresponding author:
    Patti Shores

Other grants by Donald Martin

Statistical Analysis of Categorical Time Series through Sparse Markov Models
  • Award number:
    1811933
  • Fiscal year:
    2018
  • Funding amount:
    $100,000
  • Project type:
    Standard Grant
Distributions of patterns and statistics in Markovian sequences
  • Award number:
    0805577
  • Fiscal year:
    2008
  • Funding amount:
    $100,000
  • Project type:
    Standard Grant
Urban Systemic Program in Science, Mathematics, and Technology Education (USP): SciMaX
  • Award number:
    0114949
  • Fiscal year:
    2001
  • Funding amount:
    $100,000
  • Project type:
    Cooperative Agreement
CPMSA: "Comprehensive Partnerships for Minority Student Achievement"
  • Award number:
    9550622
  • Fiscal year:
    1995
  • Funding amount:
    $100,000
  • Project type:
    Cooperative Agreement
Mathematical Sciences: Recursion Theory and Set Theory
  • Award number:
    9505153
  • Fiscal year:
    1995
  • Funding amount:
    $100,000
  • Project type:
    Continuing Grant
Mathematical Sciences: Recursion Theory and Set Theory
  • Award number:
    9206946
  • Fiscal year:
    1992
  • Funding amount:
    $100,000
  • Project type:
    Continuing Grant
Mathematical Sciences: Recursion Theory and Set Theory
  • Award number:
    8902555
  • Fiscal year:
    1989
  • Funding amount:
    $100,000
  • Project type:
    Continuing Grant
Mini-Computer Applications to Undergraduate Meteorology Instruction
  • Award number:
    7813134
  • Fiscal year:
    1978
  • Funding amount:
    $100,000
  • Project type:
    Standard Grant

Similar overseas grants

Spatiotemporal dynamics of acetylcholine activity in adaptive behaviors and response patterns
  • Award number:
    24K10485
  • Fiscal year:
    2024
  • Funding amount:
    $100,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Collaborative Research: Unraveling the phylogenetic and evolutionary patterns of fragmented mitochondrial genomes in parasitic lice
  • Award number:
    2328117
  • Fiscal year:
    2024
  • Funding amount:
    $100,000
  • Project type:
    Standard Grant
Uncovering the evolutionary patterns of the Aculeata stinger
  • Award number:
    24K18174
  • Fiscal year:
    2024
  • Funding amount:
    $100,000
  • Project type:
    Grant-in-Aid for Early-Career Scientists
Illuminating patterns and processes of water quality in U.S. rivers using physics-guided deep learning
  • Award number:
    2346471
  • Fiscal year:
    2024
  • Funding amount:
    $100,000
  • Project type:
    Continuing Grant
Collaborative Research: Can Irregular Structural Patterns Beat Perfect Lattices? Biomimicry for Optimal Acoustic Absorption
  • Award number:
    2341950
  • Fiscal year:
    2024
  • Funding amount:
    $100,000
  • Project type:
    Standard Grant
Collaborative Research: Unraveling the phylogenetic and evolutionary patterns of fragmented mitochondrial genomes in parasitic lice
  • Award number:
    2328119
  • Fiscal year:
    2024
  • Funding amount:
    $100,000
  • Project type:
    Standard Grant
Effects of Environmental Change on Microbial Self-organized Patterns in Antarctic Lakes
  • Award number:
    2333917
  • Fiscal year:
    2024
  • Funding amount:
    $100,000
  • Project type:
    Standard Grant
AGS-PRF: Understanding Historical Trends in Tropical Pacific Sea Surface Temperature Patterns
  • Award number:
    2317224
  • Fiscal year:
    2024
  • Funding amount:
    $100,000
  • Project type:
    Fellowship Award
Asymptotic patterns and singular limits in nonlinear evolution problems
  • Award number:
    EP/Z000394/1
  • Fiscal year:
    2024
  • Funding amount:
    $100,000
  • Project type:
    Research Grant
Collaborative Research: Linking carbon preferences and competition to predict and test patterns of functional diversity in soil microbial communities
  • Award number:
    2312302
  • Fiscal year:
    2024
  • Funding amount:
    $100,000
  • Project type:
    Standard Grant