Distributions of patterns and statistics in Markovian sequences

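Basic information

  • Award number:
    0805577
  • Principal investigator:
  • Amount:
    $150,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2008
  • Funding country:
    United States
  • Project period:
    2008-07-01 to 2013-06-30
  • Project status:
    Completed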

Project abstract

In this project the investigator studies the computation of distributions of patterns and statistics in sequences through auxiliary Markov chains. In the method, Markovian structure in the original sequence is exploited to associate an auxiliary Markov chain with the sequence in such a manner that an event of interest in the original sequence occurs if and only if the auxiliary Markov chain lies in a class of states that corresponds to the event. Once the auxiliary chain is set up, probabilities for the event may be computed by tracking movements through the chain and then extracting the desired probabilities. The goals of this work are threefold: (1) to compute distributions of complex patterns that have not been addressed to date; (2) to apply the probabilistic tools that are developed to statistical testing and data analysis; and (3) to quantify uncertainty in statistics of labeled and segmented data modeled by probabilistic graphical models. These goals are integrated, since probabilistic approaches to computing distributions of patterns and statistics provide the mathematical tools necessary for the statistical applications of goals (2) and (3), and in turn those applications drive the need for computing distributions in increasingly complex situations.

Whereas satisfying the first two goals will provide an important contribution to the literature, the major contribution of the research is represented by goal (3). The computation of sampling distributions of statistics of hidden state sequences provides a method of quantifying uncertainty in labeled and segmented data, an area that has not been adequately addressed. In cases where one is interested in inference on statistics of labeled data, a typical approach is to determine the most likely sequence of states given the observations, and then obtain the value of the statistic of interest from that state sequence. However, whereas the most likely states are optimal if one is interested in the best set of labels, they may not be optimal for inference on statistics of the labels. This work provides a novel approach to compute the exact sampling distribution of statistics of labeled data, providing a means for more accurate inference. Sensitivity of computed distributions to estimated parameters and applications to change points will also be considered.

The need for distributional properties associated with patterns and statistics in sequences, both realizations of data emanating from a model and hidden sequences used to label and segment observed data, arises in many practical fields of study with massive data sets, such as bioinformatics, time series, information theory, economics, data mining, and quality control. In this research computational tools are developed for computing such distributions. Results for distributions of patterns and statistics may be applied to many practical problems, such as detecting genes, promoters, or other functionally significant patterns in DNA sequences, and determining probabilities related to classification of observations in health-related studies, change points that indicate new regimes in economic data, patterns that indicate an intrusion, or patterns associated with surveillance work. The theory may be used to compute distributions of patterns in underlying sequences that are corrupted by noise or missing observations, and also distributions of statistics that are intractable by combinatorial or other means. Thus this research facilitates new scientific studies that rely on results for patterns or statistics that have not been computed to date.
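The auxiliary-chain idea in the abstract can be illustrated with a minimal sketch. It is not the investigator's implementation: it assumes a first-order Markov sequence, a single word as the pattern, and the event "the pattern occurs at least once in the first n symbols". The function names `kmp_automaton` and `prob_pattern_occurs`, and the `init` and `trans` dictionaries, are hypothetical names chosen for this example. The auxiliary state is the pair (length of the pattern prefix currently matched, last symbol), and the event occurs exactly when the auxiliary chain has entered the absorbing "hit" class.

```python
from collections import defaultdict


def kmp_automaton(pattern, alphabet):
    """delta[k][c]: longest prefix of `pattern` that is a suffix of pattern[:k] + c."""
    m = len(pattern)
    delta = []
    for k in range(m + 1):
        row = {}
        for c in alphabet:
            s = pattern[:k] + c
            j = min(m, len(s))
            while j > 0 and s[-j:] != pattern[:j]:
                j -= 1
            row[c] = j
        delta.append(row)
    return delta


def prob_pattern_occurs(pattern, n, init, trans):
    """P(pattern occurs at least once in X_1..X_n), n >= 1, first-order Markov chain.

    init[c]     : P(X_1 = c)                  (assumed input format)
    trans[a][c] : P(X_{t+1} = c | X_t = a)    (assumed input format)
    """
    alphabet = sorted(init)
    m = len(pattern)
    delta = kmp_automaton(pattern, alphabet)

    hit = 0.0                   # mass absorbed in the "pattern has occurred" class
    dist = defaultdict(float)   # mass on transient auxiliary states (k, last symbol)
    for c in alphabet:
        k = delta[0][c]
        if k == m:
            hit += init[c]
        else:
            dist[(k, c)] += init[c]

    # Track movements through the auxiliary chain, one symbol at a time.
    for _ in range(n - 1):
        new = defaultdict(float)
        for (k, a), p in dist.items():
            for c in alphabet:
                q = p * trans[a][c]
                k2 = delta[k][c]
                if k2 == m:
                    hit += q
                else:
                    new[(k2, c)] += q
        dist = new
    return hit


# Usage: fair, independent coin flips written as a (degenerate) Markov chain.
fair_init = {"H": 0.5, "T": 0.5}
fair_trans = {"H": {"H": 0.5, "T": 0.5}, "T": {"H": 0.5, "T": 0.5}}
p_hh = prob_pattern_occurs("HH", 3, fair_init, fair_trans)  # HHH, HHT, THH: 3/8
```

Keeping the last symbol in the auxiliary state is what lets the sketch handle Markov dependence rather than only independent symbols. Extending it to several patterns or to counts of occurrences would need a larger state space, which this sketch does not attempt.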

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Donald Martin

The enigma of the SARS-CoV-2 microcirculation dysfunction: evidence for modified endothelial junctions
  • DOI:
    10.1101/2023.04.24.538100
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Laurence Bouillet;Meryem Benmarce;Chloe Guerin;Laura Bouvet;Olivia Garnier;Donald Martin;Isabelle Vilgrain
  • Corresponding author:
    Isabelle Vilgrain
Photocoagulation treatment of proliferative diabetic retinopathy: the second report of diabetic retinopathy study findings.
  • DOI:
    10.1016/s0161-6420(78)35693-1
  • Published:
    1978
  • Journal:
  • Impact factor:
    13.7
  • Authors:
    A. Patz;S. Fine;D. Finkelstein;T. Prout;L. Aiello;R. Bradley;J. C. Briones;F. Myers;G. Bresnick;G. D. Venecia;T. Stevens;I. Wallow;S. Chandra;E. Norton;G. Blankenship;J. E. Harris;W. Knobloch;F. Goetz;R. Ramsay;J. Mcmeel;Donald Martin;M. Goldberg;F. Huamonte;G. Peyman;B. Straatsma;S. Kopelow;W. Heuven;A. Kassoff;S. Feman;R. Watzke;J. H. Mensher;W. Tasman;W. Annesley;B. Leonard;C. Canny;Leonard Joffe;T. R. Pheasant;F. Riekhof;M. Dahl;W. Bohart;D. Clarke;J. Berrocal;A. Ramos;G. Velázquez;R. Margherio;Delbert Nachazel;E. McLean;S. Guzak;G. Knatterud;C. Klimt;A. Hillis;D. Makuc;M. Davis;A. MacCormick;Y. Magli;Paul Segal;A. Lilienfeld;E. J. Ballintine;J. Cornfield;E. Friedman;Max Miller;M. Sears;E. P. Wiesner;F. Ederer;F. Ferris;B. Becker;Gl Johnston;C. Meinert;J. Bearman;S. Chandra;L. Rand
  • Corresponding author:
    L. Rand
Rheologic reflection in hypertriglyceridemia-induced pancreatitis.
  • DOI:
    10.1097/smj.0b013e3181b4bdde
  • Published:
    2009
  • Journal:
  • Impact factor:
    1.1
  • Authors:
    Donald Martin;E. McCann;P. Glynn
  • Corresponding author:
    P. Glynn
A study of Functional ion Transport Using Tethered Membranes
  • DOI:
    10.1016/j.bpj.2010.12.2050
  • Published:
    2011-02-02
  • Journal:
  • Impact factor:
  • Authors:
    Bruce A. Cornell;Andrew Battle;Louise Brown;Sonia Carnie;Sophia C. Goodchild;Hedayetul Islam;Donald Martin;Boris Martinac;Russell Richards;Stella Valenzuela
  • Corresponding author:
    Stella Valenzuela
Invitation for Nominations for 1982 American Institute of Nutrition Awards
  • DOI:
    10.1093/jn/111.8.1507
  • Published:
    1981-08-01
  • Journal:
  • Impact factor:
  • Authors:
    Joan Martin;Robert Labbe;Donald Martin;Patti Shores
  • Corresponding author:
    Patti Shores

Other grants held by Donald Martin

Statistical Analysis of Categorical Time Series through Sparse Markov Models
  • Award number:
    1811933
  • Fiscal year:
    2018
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Distribution of Patterns and Statistics in Random Sequences
  • Award number:
    1107084
  • Fiscal year:
    2011
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Urban Systemic Program in Science, Mathematics, and Technology Education (USP): SciMaX
  • Award number:
    0114949
  • Fiscal year:
    2001
  • Award amount:
    $150,000
  • Award type:
    Cooperative Agreement
CPMSA: "Comprehensive Partnerships for Minority Student Achievement"
  • Award number:
    9550622
  • Fiscal year:
    1995
  • Award amount:
    $150,000
  • Award type:
    Cooperative Agreement
Mathematical Sciences: Recursion Theory and Set Theory
  • Award number:
    9505153
  • Fiscal year:
    1995
  • Award amount:
    $150,000
  • Award type:
    Continuing Grant
Mathematical Sciences: Recursion Theory and Set Theory
  • Award number:
    9206946
  • Fiscal year:
    1992
  • Award amount:
    $150,000
  • Award type:
    Continuing Grant
Mathematical Sciences: Recursion Theory and Set Theory
  • Award number:
    8902555
  • Fiscal year:
    1989
  • Award amount:
    $150,000
  • Award type:
    Continuing Grant
Mini-Computer Applications to Undergraduate Meteorology Instruction
  • Award number:
    7813134
  • Fiscal year:
    1978
  • Award amount:
    $150,000
  • Award type:
    Standard Grant

Similar overseas grants

Rapid measurement of novel harm reduction housing on HIV risk, treatment uptake, drug use and supply
  • Award number:
    10701309
  • Fiscal year:
    2023
  • Award amount:
    $150,000
  • Award type:
Examining patterns of opioid overdose hotspots and opioid treatment deserts in California
  • Award number:
    10679608
  • Fiscal year:
    2023
  • Award amount:
    $150,000
  • Award type:
Examining Early Life Risk Factors and Patterns of Screening for Early-Onset Colorectal Cancer
  • Award number:
    10680160
  • Fiscal year:
    2023
  • Award amount:
    $150,000
  • Award type:
Effects of Dietary Patterns and Sodium Intake on the Gut Microbiome and Metabolome
  • Award number:
    10888821
  • Fiscal year:
    2023
  • Award amount:
    $150,000
  • Award type:
Does NHANES underestimate true population-based exposures to pesticides? Exploring bias in NHANES human biomonitoring data.
  • Award number:
    10648836
  • Fiscal year:
    2023
  • Award amount:
    $150,000
  • Award type:
Project 2: Causal Relationship Disentangler for Precision Nutrition
  • Award number:
    10386500
  • Fiscal year:
    2022
  • Award amount:
    $150,000
  • Award type:
Project 2: Causal Relationship Disentangler for Precision Nutrition
  • Award number:
    10552678
  • Fiscal year:
    2022
  • Award amount:
    $150,000
  • Award type:
Pursuing patterns in the statistics of utility data to analyze grid resilience
  • Award number:
    2153163
  • Fiscal year:
    2022
  • Award amount:
    $150,000
  • Award type:
    Standard Grant
Patterns of Prosecution: Suspects and Victims of Violent Crime in Historic and Contemporary Statistics
  • Award number:
    ES/X007251/1
  • Fiscal year:
    2022
  • Award amount:
    $150,000
  • Award type:
    Fellowship
Fast Radio Burst statistics in time and space: FRB repetition patterns and the Milky Way's plasma probed by CHIME/FRB
  • Award number:
    569102-2022
  • Fiscal year:
    2022
  • Award amount:
    $150,000
  • Award type:
    Postgraduate Scholarships - Doctoral