Computationally efficient estimation of the error rates of hidden Markov model results
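Basic information
- Award number: 0914739
- Principal investigator:
- Amount: $300,000
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2009
- Funding country: United States
- Project period: 2009-08-01 to 2013-07-31
- Project status: Completed
- Source:
- Keywords: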
Project abstract
NSF proposal: 0914739
PI: Newberg, Lee A.
Computationally efficient estimation of the error rates of hidden Markov model results
Hidden Markov models are employed in a wide variety of fields, including speech recognition, econometrics, computer vision, signal processing, cryptanalysis, and computational biology. In speech recognition, hidden Markov models can be used to distinguish one word from another based upon the time series of certain qualities of a sound. In finance, the models can be used to simulate the unknown transitions between low, medium, and high debt default regimes in time. In computer vision they can be used to decode American Sign Language (ASL). Hidden Markov models are used in computational biology to find similarity between sequences of nucleotides (DNA or RNA) or polypeptides (proteins) and to predict protein structure.
Hidden Markov models are employed because they permit the facile description and implementation of powerful statistical models and algorithms that are used to score a match possibility in sequence data. Perhaps the most common use of hidden Markov models is for the purpose of hypothesis testing or classification. For instance, a speech-recognition model may be used to quantify the belief that a recorded message contains the word "elephant." However, once a score for a belief has been computed, the question is how to interpret that value.
1. Is the score strong enough to indicate a signal, or is it reasonably probable that noise will yield a score this strong?
2. Is the score weak enough to indicate noise, or is it reasonably probable that a signal will yield a score this weak?
The false positive rate (closely related to the type I error or p-value) for a score threshold is the probability that noise data will yield a score at least as strong as the threshold. The false negative rate for a score threshold is the probability that signal data will fail to score at least as strong as the threshold.
In 2008, Newberg designed a method for estimating error rates that is more efficient than other approaches that are applicable to general hidden Markov models. However, the approach is still too slow for computationally intensive applications such as repeated searches of large DNA databases. This proposed research aims to speed the estimation primarily via two approaches: (1) the creative re-use of simulations, and (2) statistically robust elimination of outlier simulation results.
The proposed research is significant because the facile availability of error rates permits researchers in a wide variety of scientific fields to evaluate the statistical significance of their conclusions and the power of their hypothesis tests. Once the technique and software are available, researchers in speech recognition or ASL recognition will be able to use rigorously derived statistical significance values to set their hypothesis test thresholds for word recognition. Financial modelers will have a rigorous standard by which to evaluate their market timing models. Computational biologists will have rigorous statistical significance values for their sequence alignments and for their pattern scans of large sequence databases. More generally, the availability of error rates for hidden Markov model results will significantly enhance the attractiveness of hidden Markov models for use in fields where hidden Markov models are not currently employed.
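The two error-rate definitions in the abstract can be illustrated with a minimal sketch. This is plain (naive) Monte Carlo simulation, not the 2008 method or the speed-ups the proposal describes. The two-state model, its parameters, the i.i.d. noise model, the log-likelihood-ratio score, and all function names (`sample_signal`, `sample_noise`, `score`, `error_rates`) are assumptions chosen for illustration only.

```python
import math
import random

# Hypothetical two-state "signal" HMM over the binary alphabet {0, 1}.
START = [0.5, 0.5]                 # initial state probabilities
TRANS = [[0.9, 0.1], [0.1, 0.9]]   # TRANS[from_state][to_state]
EMIT = [[0.8, 0.2], [0.2, 0.8]]    # EMIT[state][symbol]
NOISE = [0.5, 0.5]                 # assumed i.i.d. background ("noise") model


def sample_signal(length, rng):
    """Draw one observation sequence from the signal HMM."""
    state = rng.choices([0, 1], weights=START)[0]
    seq = []
    for _ in range(length):
        seq.append(rng.choices([0, 1], weights=EMIT[state])[0])
        state = rng.choices([0, 1], weights=TRANS[state])[0]
    return seq


def sample_noise(length, rng):
    """Draw one observation sequence from the i.i.d. noise model."""
    return rng.choices([0, 1], weights=NOISE, k=length)


def score(seq):
    """Log-likelihood ratio: log P(seq | signal HMM) - log P(seq | noise).

    Uses the forward algorithm with per-step rescaling to avoid underflow.
    The sequence must be non-empty.
    """
    alpha = [START[s] * EMIT[s][seq[0]] for s in (0, 1)]
    log_like = 0.0
    for t, sym in enumerate(seq):
        if t > 0:
            alpha = [
                sum(alpha[r] * TRANS[r][s] for r in (0, 1)) * EMIT[s][sym]
                for s in (0, 1)
            ]
        total = sum(alpha)
        log_like += math.log(total)
        alpha = [a / total for a in alpha]
    log_noise = sum(math.log(NOISE[sym]) for sym in seq)
    return log_like - log_noise


def error_rates(threshold, length=50, trials=2000, seed=0):
    """Estimate (false positive rate, false negative rate) by simulation."""
    rng = random.Random(seed)
    noise_scores = [score(sample_noise(length, rng)) for _ in range(trials)]
    signal_scores = [score(sample_signal(length, rng)) for _ in range(trials)]
    # False positive: noise scores at least as strong as the threshold.
    fpr = sum(s >= threshold for s in noise_scores) / trials
    # False negative: signal fails to score at least as strong as the threshold.
    fnr = sum(s < threshold for s in signal_scores) / trials
    return fpr, fnr


if __name__ == "__main__":
    fpr, fnr = error_rates(threshold=2.0)
    print(f"estimated FPR = {fpr:.4f}, estimated FNR = {fnr:.4f}")
```

A limitation of this naive approach is that observing an error rate near p requires on the order of 1/p simulated sequences, so very small false positive rates become expensive to estimate. That cost is the kind of bottleneck that motivates more efficient estimators.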
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Patrick Van Roey
NANOSCALE: An Electronic Component from Protein Self-Assembly
- Award number: 9986431
- Fiscal year: 2000
- Award amount: $300,000
- Project category: Standard Grant
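Similar grants from the National Natural Science Foundation of China (NSFC)
Applications of fixed-parameter tractable algorithms to planar graph problems and their relationship to integer linear programming
- Award number: 60973026
- Approval year: 2009
- Award amount: 320,000 yuan
- Project category: General Program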
Similar overseas grants
Collaborative Research: Inference for Network Models with Covariates: Leveraging Local Information for Statistically and Computationally Efficient Estimation of Global Parameters
- Award number: 1713082
- Fiscal year: 2017
- Award amount: $300,000
- Project category: Standard Grant
Collaborative Research: Inference for Network Models with Covariates: Leveraging Local Information for Statistically and Computationally Efficient Estimation of Global Parameters
- Award number: 1713083
- Fiscal year: 2017
- Award amount: $300,000
- Project category: Standard Grant
Computationally Efficient Adaptive Spline Filters for Nonlinear State Estimation
- Award number: 250256-2012
- Fiscal year: 2016
- Award amount: $300,000
- Project category: Discovery Grants Program - Individual
Computationally Efficient Adaptive Spline Filters for Nonlinear State Estimation
- Award number: 250256-2012
- Fiscal year: 2015
- Award amount: $300,000
- Project category: Discovery Grants Program - Individual
Computationally Efficient Adaptive Spline Filters for Nonlinear State Estimation
- Award number: 429294-2012
- Fiscal year: 2014
- Award amount: $300,000
- Project category: Discovery Grants Program - Accelerator Supplements
Computationally Efficient Adaptive Spline Filters for Nonlinear State Estimation
- Award number: 250256-2012
- Fiscal year: 2014
- Award amount: $300,000
- Project category: Discovery Grants Program - Individual
Computationally Efficient Adaptive Spline Filters for Nonlinear State Estimation
- Award number: 429294-2012
- Fiscal year: 2013
- Award amount: $300,000
- Project category: Discovery Grants Program - Accelerator Supplements
Computationally Efficient Adaptive Spline Filters for Nonlinear State Estimation
- Award number: 250256-2012
- Fiscal year: 2013
- Award amount: $300,000
- Project category: Discovery Grants Program - Individual
Computationally Efficient Adaptive Spline Filters for Nonlinear State Estimation
- Award number: 250256-2012
- Fiscal year: 2012
- Award amount: $300,000
- Project category: Discovery Grants Program - Individual
Computationally Efficient Adaptive Spline Filters for Nonlinear State Estimation
- Award number: 429294-2012
- Fiscal year: 2012
- Award amount: $300,000
- Project category: Discovery Grants Program - Accelerator Supplements


















